Sample records for residues multiple sequence

  1. Embedding strategies for effective use of information from multiple sequence alignments.

    PubMed Central

    Henikoff, S.; Henikoff, J. G.

    1997-01-01

    We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
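    A minimal sketch of the consensus-embedding idea. It assumes the conserved (motif) regions are already known as half-open column ranges of the alignment, that the representative sequence is the first alignment row, and that a simple majority vote gives the consensus. The names `embed_consensus` and `column_consensus` are hypothetical; this is not the authors' procedure.

```python
from collections import Counter

def column_consensus(alignment, col):
    """Most frequent non-gap residue in one alignment column (ties broken alphabetically)."""
    counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
    if not counts:
        return "-"
    best = max(counts.values())
    return min(r for r, n in counts.items() if n == best)

def embed_consensus(alignment, motif_ranges, rep_index=0):
    """Replace the representative sequence's residues inside motif columns
    with per-column consensus residues and keep its own residues elsewhere."""
    rep = list(alignment[rep_index])
    for start, end in motif_ranges:  # half-open column ranges [start, end)
        for col in range(start, end):
            rep[col] = column_consensus(alignment, col)
    # Drop gap characters so the result can serve as an ordinary single-sequence query.
    return "".join(r for r in rep if r != "-")
```

    For example, `embed_consensus(["ACDKL-MN", "ACEKLQMN", "GCEKLQMP"], [(0, 5)])` keeps the first sequence's tail and replaces its first five columns with the consensus, giving `"ACEKLMN"`. Embedding PSSMs instead would replace each motif column with a score vector, which requires a search program that accepts position-specific scores.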

  2. PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.

    PubMed

    Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

    2007-10-11

    By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release, as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
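    To illustrate the column-restricted analysis mentioned in this record (selecting alignment columns, then computing percent identities), here is a hypothetical helper. It skips columns where either sequence has a gap, which is only one of several common conventions; PFAAT's own definition may differ.

```python
def percent_identity(seq_a, seq_b, columns=None):
    """Percent identity between two aligned sequences, optionally restricted
    to a subset of alignment columns. Columns where either sequence has a
    gap are ignored (an assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    cols = range(len(seq_a)) if columns is None else columns
    compared = identical = 0
    for c in cols:
        a, b = seq_a[c], seq_b[c]
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += (a == b)
    return 100.0 * identical / compared if compared else 0.0
```

    Over all columns, `percent_identity("ACDE-G", "ACDFHG")` compares five ungapped columns, four of which match (80.0); restricting to `columns=[0, 3, 4]` leaves two comparable columns and gives 50.0.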

  3. An intuitive graphical webserver for multiple-choice protein sequence search.

    PubMed

    Banky, Daniel; Szalkai, Balazs; Grolmusz, Vince

    2014-04-10

    Every day tens of thousands of sequence searches and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, formulating the query on the most frequently used webservers is difficult. Some servers support queries written as regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple-choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, the residues are stacked on the user interface, indicating the multiple choice at that position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids at the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at http://psa.pitgroup.org. Copyright © 2014 Elsevier B.V. All rights reserved.
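    The drawing interface hides regular expressions, but a multiple-choice query of this kind can be expressed as one. The sketch below assumes consecutive positions, each given either as a string of allowed one-letter residues or as a class name. The `CLASSES` groupings and both function names are illustrative assumptions, not the server's own color classes or code.

```python
import re

# Hypothetical residue classes; the webserver's own groupings may differ.
CLASSES = {
    "hydrophobic": "AVILMFW",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
}

def build_query(positions):
    """positions: list with one entry per consecutive position, either a
    class name from CLASSES or a string of allowed one-letter residues."""
    parts = []
    for allowed in positions:
        residues = CLASSES.get(allowed, allowed)
        parts.append("[" + "".join(sorted(set(residues))) + "]")
    return re.compile("".join(parts))

def find_matches(query, sequence):
    """Return (start, matched_substring) for every hit, including overlapping
    ones, by wrapping the pattern in a zero-width lookahead."""
    overlapping = re.compile("(?=(" + query.pattern + "))")
    return [(m.start(1), m.group(1)) for m in overlapping.finditer(sequence)]
```

    The abstract's example of two hydrophobic followed by three polar residues would be `build_query(["hydrophobic"] * 2 + ["polar"] * 3)`. The lookahead wrapper lets adjacent hits overlap, which a plain `finditer` over the bare pattern would not report.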

  4. Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins

    NASA Astrophysics Data System (ADS)

    Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil

    2014-03-01

    Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.

  5. CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

    PubMed

    Zhou, Carol L Ecale

    2015-01-01

    In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign were demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
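    CombAlign itself is a Python 2.6 program; what follows is an independent Python 3 sketch of the core idea of merging pairwise alignments against a shared reference into one gapped, one-to-many alignment. It assumes every pairwise alignment contains the full ungapped reference, and it places insertions left-justified after the preceding reference residue. The function names are hypothetical.

```python
def split_pairwise(ref_aln, other_aln):
    """Split one pairwise alignment into insertion blocks (residues of the
    other sequence placed where the reference has gaps) and the character
    of the other sequence aligned to each reference residue."""
    n = len(ref_aln.replace("-", ""))
    inserts = [[] for _ in range(n + 1)]  # slot k lies before reference residue k
    matched = []
    for r, o in zip(ref_aln, other_aln):
        if r == "-":
            inserts[len(matched)].append(o)
        else:
            matched.append(o)
    return inserts, matched

def comb_align(reference, pairwise):
    """reference: ungapped reference sequence.
    pairwise: list of (ref_aligned, other_aligned) string pairs.
    Returns rows of one gapped alignment: reference first, then one row per pair."""
    parsed = []
    for ref_aln, other_aln in pairwise:
        if ref_aln.replace("-", "") != reference:
            raise ValueError("pairwise alignment does not contain the reference")
        if len(ref_aln) != len(other_aln):
            raise ValueError("aligned strings differ in length")
        parsed.append(split_pairwise(ref_aln, other_aln))
    n = len(reference)
    # Each insertion slot is as wide as the longest insertion any pair puts there.
    widths = [max((len(ins[k]) for ins, _ in parsed), default=0) for k in range(n + 1)]
    ref_row = []
    for k in range(n + 1):
        ref_row.append("-" * widths[k])  # gaps inserted into the reference
        if k < n:
            ref_row.append(reference[k])
    rows = ["".join(ref_row)]
    for ins, matched in parsed:
        row = []
        for k in range(n + 1):
            row.append("".join(ins[k]).ljust(widths[k], "-"))
            if k < n:
                row.append(matched[k])
        rows.append("".join(row))
    return rows
```

    For the reference `"ACDE"` with pairs `("AC-DE", "ACGDE")` and `("ACDE-", "A-DEK")`, the merged rows are `"AC-DE-"`, `"ACGDE-"` and `"A--DEK"`: each insertion from one pair becomes a gap in the reference and in every other row.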

  6. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
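    The abstract does not say how ComplexContact pairs homologs. One common approach, assumed here purely for illustration, keeps the top-scoring homolog per species for each protein and concatenates the rows of species present in both alignments. The input layout and function names are hypothetical.

```python
def best_per_species(hits):
    """hits: list of (species, score, aligned_seq) tuples.
    Keep only the top-scoring hit for each species."""
    best = {}
    for species, score, seq in hits:
        if species not in best or score > best[species][0]:
            best[species] = (score, seq)
    return {sp: seq for sp, (score, seq) in best.items()}

def pair_msas(query_a, hits_a, query_b, hits_b):
    """Build a paired MSA whose first row is the concatenated query pair and
    whose remaining rows join the best A and B homologs from the same species."""
    a = best_per_species(hits_a)
    b = best_per_species(hits_b)
    rows = [query_a + query_b]
    for species in sorted(set(a) & set(b)):
        rows.append(a[species] + b[species])
    return rows
```

    Species found in only one of the two alignments are dropped, because an unpaired row carries no inter-protein co-evolution signal.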

  7. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
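    The matrix underlying a ProfileGrid, the residue frequency at every alignment position, can be computed in a few lines. In this sketch gaps are counted as their own symbol and frequencies are fractions of all sequences; both are assumptions, and JProfileGrid may normalize differently and adds color-coding on top. The function name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def profile_grid(alignment, alphabet=AMINO_ACIDS + "-"):
    """Return {symbol: [frequency in column 0, column 1, ...]} for an MSA given
    as equal-length strings; rows are symbols and columns are alignment positions."""
    ncols = len(alignment[0])
    if any(len(s) != ncols for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    grid = {sym: [0.0] * ncols for sym in alphabet}
    for col in range(ncols):
        counts = Counter(seq[col] for seq in alignment)
        for sym, n in counts.items():
            if sym in grid:  # symbols outside the alphabet (e.g. 'X') are ignored
                grid[sym][col] = n / len(alignment)
    return grid
```

    For `["AC", "AD", "G-", "AC"]`, the `"A"` row is `[0.75, 0.0]` and the `"C"` row is `[0.0, 0.5]`; mapping each value to a color intensity yields the grid view.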

  8. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    PubMed

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. 
Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.

  9. SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments

    PubMed Central

    Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

    2014-01-01

    This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D-structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831

  10. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. Finally, the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
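    The "top L/5 long-range" precision used in this record can be sketched as follows. The sketch assumes the usual CASP convention that long-range pairs are separated by at least 24 residues, and it takes the native contacts as a precomputed set of residue pairs (CASP typically defines a contact by a Cβ-Cβ distance below 8 Å). The function name is hypothetical, and the abstract's figures are averages over domains, whereas this computes a single target's value.

```python
def top_l5_long_range_precision(predictions, native_contacts, length, min_sep=24):
    """predictions: list of (i, j, probability) with residue indices.
    native_contacts: set of (i, j) pairs with i < j that are true contacts.
    length: sequence length L of the target.
    Returns the fraction of the top L/5 long-range predictions that are native contacts."""
    long_range = []
    for i, j, p in predictions:
        i, j = min(i, j), max(i, j)  # normalize pair order
        if j - i >= min_sep:
            long_range.append((p, i, j))
    long_range.sort(key=lambda t: t[0], reverse=True)  # most confident first
    k = max(1, length // 5)
    top = long_range[:k]
    if not top:
        return 0.0
    correct = sum((i, j) in native_contacts for _, i, j in top)
    return correct / len(top)
```

    This version divides by the number of predictions actually available when fewer than L/5 long-range pairs are predicted; dividing by L/5 instead is an equally defensible convention.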

  11. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    PubMed Central

    Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

    2007-01-01

    We have identified four repeats and ten domains that are novel in proteins of the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region of fewer than 55 amino-acid residues that occurs more than once in the protein sequence, sometimes in tandem. A “domain” corresponds to a conserved region of more than 55 amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688

  12. C-terminal amino acid residue loss for deprotonated peptide ions containing glutamic acid, aspartic acid, or serine residues at the C-terminus.

    PubMed

    Li, Zhong; Yalcin, Talat; Cassady, Carolyn J

    2006-07-01

    Deprotonated peptides containing C-terminal glutamic acid, aspartic acid, or serine residues were studied by sustained off-resonance irradiation collision-induced dissociation (SORI-CID) in a Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer with ion production by electrospray ionization (ESI). Additional studies were performed by post source decay (PSD) in a matrix-assisted laser desorption ionization/time-of-flight (MALDI/TOF) mass spectrometer. This work included both model peptides synthesized in our laboratory and bioactive peptides with more complex sequences. During SORI-CID and PSD, [M - H]- and [M - 2H]2- underwent an unusual cleavage corresponding to the elimination of the C-terminal residue. Two mechanisms are proposed to occur. They involve nucleophilic attack on the carbonyl carbon of the adjacent residue by either the carboxylate group of the C-terminus or the side chain carboxylate group of C-terminal glutamic acid and aspartic acid residues. To confirm the proposed mechanisms, AAAAAD was labelled with ¹⁸O specifically on the side chain of the aspartic acid residue. For peptides that contain multiple C-terminal glutamic acid residues, each of these residues can be sequentially eliminated from the deprotonated ions; a driving force may be the formation of a very stable pyroglutamic acid neutral. For peptides with multiple aspartic acid residues at the C-terminus, aspartic acid residue loss is not sequential. For peptides with multiple serine residues at the C-terminus, C-terminal residue loss is sequential; however, abundant loss of other neutral molecules also occurs. In addition, the presence of basic residues (arginine or lysine) in the sequence has no effect on C-terminal residue elimination in the negative ion mode.

  13. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

    PubMed

    Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

    2011-06-02

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; however, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.

  14. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

    PubMed Central

    2014-01-01

    Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393

  15. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemla, A; Lang, D; Kostova, T

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; however, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that share structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.

  16. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    PubMed Central

    Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

    2016-01-01

    ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181

  17. From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction

    PubMed Central

    Cocco, Simona; Monasson, Remi; Weigt, Martin

    2013-01-01

    Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant ‘patterns’ of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold. PMID:23990764

  18. Dissecting substrate specificities of the mitochondrial AFG3L2 protease.

    PubMed

    Ding, Bojian; Martin, Dwight W; Rampello, Anthony J; Glynn, Steven E

    2018-06-22

    Human AFG3L2 is a compartmental AAA+ protease that performs ATP-fueled degradation at the matrix face of the inner mitochondrial membrane. Identifying how AFG3L2 selects substrates from the diverse complement of matrix-localized proteins is essential for understanding mitochondrial protein biogenesis and quality control. Here, we create solubilized forms of AFG3L2 to examine the enzyme's substrate specificity mechanisms. We show that conserved residues within the pre-sequence of the mitochondrial ribosomal protein, MrpL32, target the subunit to the protease for processing into a mature form. Moreover, these residues can act as a degron, delivering diverse model proteins to AFG3L2 for degradation. By determining the sequence of degradation products from multiple substrates using mass spectrometry, we construct a peptidase specificity profile that displays constrained product lengths and is dominated by the identity of the residue at the P1' position, with a strong preference for hydrophobic and small polar residues. This specificity profile is validated by examining the cleavage of both fluorogenic reporter peptides and full polypeptide substrates bearing different P1' residues. Together, these results demonstrate that AFG3L2 contains multiple modes of specificity, discriminating between potential substrates by recognizing accessible degron sequences, and performing peptide bond cleavage at preferred patterns of residues within the compartmental chamber.
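    A P1' profile like the one described can be tallied from mapped degradation products. In Schechter-Berger nomenclature the scissile bond lies between P1 and P1', so a product that does not start at the substrate's N-terminus reveals a P1' residue at its first position, and a product that does not end at the C-terminus reveals one just after its last residue. This sketch assumes each product maps to the first place it occurs in the substrate; the function name is hypothetical.

```python
from collections import Counter

def p1_prime_counts(substrate, products):
    """Count residues at the P1' position of every cleavage implied by the
    observed product peptides. Each product contributes up to two cleavage
    sites: one at its N-terminus and one just after its C-terminus."""
    sites = set()  # substrate indices of P1' residues; a set avoids double-counting
    for pep in products:
        start = substrate.find(pep)
        if start < 0:
            continue  # peptide not found in this substrate; skip it
        end = start + len(pep)
        if start > 0:
            sites.add(start)  # bond between start-1 (P1) and start (P1')
        if end < len(substrate):
            sites.add(end)    # bond between end-1 (P1) and end (P1')
    return Counter(substrate[i] for i in sites)
```

    Two adjacent products share one cleavage site, which is why sites are collected in a set before counting; here the products `"KLA"` and `"VSG"` from `"MKLAVSGLE"` imply three distinct cleavages with P1' residues K, V and L.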

  19. On the Split Personality of Penultimate Proline

    PubMed Central

    Glover, Matthew S.; Shi, Liuqing; Fuller, Daniel R.; Arnold, Randy J.; Radivojac, Predrag; Clemmer, David E.

    2014-01-01

    The influence of the position of the amino acid proline in polypeptide sequences is examined by a combination of ion mobility spectrometry-mass spectrometry (IMS-MS), amino acid substitutions, and molecular modeling. The results suggest that when proline exists as the second residue from the N-terminus (i.e., penultimate proline), two families of conformers are formed. We demonstrate the existence of these families by a study of a series of truncated and mutated peptides derived from the 11-residue peptide Ser1-Pro2-Glu3-Leu4-Pro5-Ser6-Pro7-Gln8-Ala9-Glu10-Lys11. We find that every peptide from this sequence with a penultimate proline residue has multiple conformations. Substitution of Ala for Pro residues indicates that multiple conformers arise from the cis-trans isomerization of Xaa1–Pro2 peptide bonds, as Xaa–Ala peptide bonds are unlikely to adopt the cis isomer, and examination of spectra from a library of 58 peptides indicates that ~80% of sequences show this effect. A simple mechanism suggesting that the barrier between the cis- and trans-proline forms is lowered because of low steric impedance is proposed. This observation may have interesting biological implications as well, and we note that a number of biologically active peptides have penultimate proline residues. PMID:25503299

  20. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues.

    PubMed

    Garrido-Martín, Diego; Pazos, Florencio

    2018-02-27

    The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
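    As a minimal illustration of the simplest signal these methods rely on, the Python sketch below flags fully conserved columns in an alignment and scores each column by Shannon entropy. It is not one of the five methods evaluated in the study; the alignment is assumed to be a list of equal-length strings with `-` for gaps, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def column(msa: list[str], i: int) -> str:
    """Residues at alignment position i, one per sequence."""
    return "".join(seq[i] for seq in msa)


def fully_conserved_positions(msa: list[str], gap: str = "-") -> list[int]:
    """Positions where every sequence carries the same non-gap residue."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        residues = set(column(msa, i))
        if len(residues) == 1 and gap not in residues:
            conserved.append(i)
    return conserved


def column_entropy(col: str) -> float:
    """Shannon entropy in bits, written as sum p*log2(1/p); 0 means fully conserved."""
    n = len(col)
    return sum((c / n) * log2(n / c) for c in Counter(col).values())


if __name__ == "__main__":
    msa = ["MKVL", "MKIL", "MRVL"]  # toy alignment
    print(fully_conserved_positions(msa))
    print([round(column_entropy(column(msa, i)), 3) for i in range(4)])
```

    Family-dependent conservation (the second signal mentioned above) would require comparing such scores within predefined or inferred subfamilies, which this sketch does not attempt.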

  1. A sequence-specific transcription activator motif and powerful synthetic variants that bind Mediator using a fuzzy protein interface.

    PubMed

    Warfield, Linda; Tuttle, Lisa M; Pacheco, Derek; Klevit, Rachel E; Hahn, Steven

    2014-08-26

    Although many transcription activators contact the same set of coactivator complexes, the mechanism and specificity of these interactions have been unclear. For example, do intrinsically disordered transcription activation domains (ADs) use sequence-specific motifs, or do ADs of seemingly different sequence have common properties that encode activation function? We find that the central activation domain (cAD) of the yeast activator Gcn4 functions through a short, conserved sequence-specific motif. Optimizing the residues surrounding this short motif by inserting additional hydrophobic residues creates very powerful ADs that bind the Mediator subunit Gal11/Med15 with high affinity via a "fuzzy" protein interface. In contrast to Gcn4, the activity of these synthetic ADs is not strongly dependent on any one residue of the AD, and this redundancy is similar to that of some natural ADs in which few if any sequence-specific residues have been identified. The additional hydrophobic residues in the synthetic ADs likely allow multiple faces of the AD helix to interact with the Gal11 activator-binding domain, effectively forming a fuzzier interface than that of the wild-type cAD.

  2. ADOMA: A Command Line Tool to Modify ClustalW Multiple Alignment Output.

    PubMed

    Zaal, Dionne; Nota, Benjamin

    2016-01-01

    We present ADOMA, a command line tool that produces alternative outputs from ClustalW multiple alignments of nucleotide or protein sequences. ADOMA can simplify the output of alignments by showing only the different residues between sequences, which is often desirable when only small differences such as single nucleotide polymorphisms are present (e.g., between different alleles). Another feature of ADOMA is that it can enhance the ClustalW output by coloring the residues in the alignment. This tool is easily integrated into automated Linux pipelines for next-generation sequencing data analysis, and may be useful for researchers in a broad range of scientific disciplines including evolutionary biology and biomedical sciences. The source code is freely available at https://sourceforge.net/projects/adoma/. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
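    The "differences only" view described for ADOMA can be sketched in a few lines of Python. This is not ADOMA's source code: it assumes the alignment has already been parsed from ClustalW output into a dictionary of equal-length gapped sequences, and the function name `show_differences` is hypothetical. Residues identical to a chosen reference are replaced by a dot, a common convention for such displays.

```python
def show_differences(
    alignment: dict[str, str], reference: str, match: str = "."
) -> dict[str, str]:
    """Replace residues identical to the reference sequence with `match`."""
    ref_seq = alignment[reference]
    masked = {}
    for name, seq in alignment.items():
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not aligned to {reference}")
        if name == reference:
            masked[name] = seq  # keep the reference row intact
        else:
            masked[name] = "".join(match if a == b else a for a, b in zip(seq, ref_seq))
    return masked


if __name__ == "__main__":
    aln = {"allele_1": "ACGTACGT", "allele_2": "ACGAACGT", "allele_3": "ACGTAC-T"}
    for name, row in show_differences(aln, "allele_1").items():
        print(f"{name:<10}{row}")
```

    Only the reference row keeps its full sequence, so an allele that differs by a single SNP collapses to a row of dots with one visible base.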

  3. fLPS: Fast discovery of compositional biases for the protein universe.

    PubMed

    Harrison, Paul M

    2017-11-13

    Proteins often contain regions that are compositionally biased (CB), i.e., they are made from a small subset of amino-acid residue types. These CB regions can be functionally important, e.g., the prion-forming and prion-like regions that are rich in asparagine and glutamine residues. Here I report a new program fLPS that can rapidly annotate CB regions. It discovers both single-residue and multiple-residue biases. It works through a process of probability minimization. First, contigs are constructed for each amino-acid type out of sequence windows with a low degree of bias; second, these contigs are searched exhaustively for low-probability subsequences (LPSs); third, such LPSs are iteratively assessed for merger into possible multiple-residue biases. At each of these stages, efficiency measures are taken to avoid or delay probability calculations unless/until they are necessary. On a current desktop workstation, the fLPS algorithm can annotate the biased regions of the yeast proteome (>5700 sequences) in <1 s, and of the whole current TrEMBL database (>65 million sequences) in as little as ~1 h, which is >2 times faster than the commonly used program SEG, using default parameters. fLPS discovers both shorter CB regions (of the sort that are often termed 'low-complexity sequence'), and milder biases that may only be detectable over long tracts of sequence. fLPS can readily handle very large protein data sets, such as might come from metagenomics projects. It is useful in searching for proteins with similar CB regions, and for making functional inferences about CB regions for a protein of interest. The fLPS package is available from: http://biology.mcgill.ca/faculty/harrison/flps.html , or https://github.com/pmharrison/flps , or is a supplement to this article.
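    The core idea of scoring a subsequence by how improbable its composition is can be illustrated with a brute-force Python sketch. Assumptions: each window is scored by the binomial probability of observing at least k copies of one residue type in a window of length n, given that residue's background frequency p; the exhaustive scan deliberately omits the contig construction, merging into multiple-residue biases, and efficiency measures that make fLPS fast, and the function names are hypothetical.

```python
from math import comb
from typing import Optional, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(
    seq: str, residue: str, p: float, min_len: int = 5
) -> Tuple[float, Optional[int], Optional[int]]:
    """Exhaustively find the window most biased toward `residue`.

    Returns (probability, start, end) with `end` exclusive; start and end are
    None if no window of at least `min_len` residues scores below 1.0.
    """
    best: Tuple[float, Optional[int], Optional[int]] = (1.0, None, None)
    for start in range(len(seq)):
        count = 0
        for end in range(start, len(seq)):
            if seq[end] == residue:
                count += 1  # running count of the target residue in the window
            length = end - start + 1
            if length < min_len:
                continue
            prob = binomial_tail(count, length, p)
            if prob < best[0]:
                best = (prob, start, end + 1)
    return best


if __name__ == "__main__":
    # Toy sequence with a poly-Q tract; p is an assumed background frequency.
    seq = "MSTA" + "Q" * 8 + "GLK"
    prob, start, end = lowest_probability_subsequence(seq, "Q", p=0.05)
    print(seq[start:end], start, end, f"{prob:.2e}")
```

    Because every window is rescored from scratch, this scan runs in roughly cubic time and is only suitable for toy inputs; avoiding exactly this cost is the point of the staged approach described in the abstract.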

  4. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework.

    PubMed

    Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I

    2018-04-14

    Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure, and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced, and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods used for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and helping to bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted from multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with an area under the receiver operating characteristic curve and area under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging such differential but useful types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be utilized as a valuable tool for both understanding the complex sequence-structure-function relationships of proteins and facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. The production of Multiple Small Peptaibol Families by Single 14-Module Peptide Synthetases in Trichoderma/Hypocrea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Degenkolb, Thomas; Aghchehb, Razieh Karimi; Dieckmann, Ralf

    2012-03-01

    The most common peptaibiotic structures are 11-residue peptaibols found widely distributed in the genus Trichoderma/Hypocrea. Frequently associated with them are 14-residue peptaibols sharing partial sequence identity. Genome sequencing projects of 3 Trichoderma strains of the major clades reveal the presence of up to 3 types of nonribosomal peptide synthetases (NRPSs) with 7, 14, or 18-20 amino acid adding modules. We here provide evidence that the 14-module NRPS type found in T. virens, T. reesei (teleomorph Hypocrea jecorina) and T. atroviride produces both 11- and 14-residue peptaibols, based on the disruption of the respective NRPS gene of T. reesei and bioinformatic analysis of their amino acid activating domains and modules. The structures of these peptides may be predicted from the gene structures and have been confirmed by analysis of families of 11- and 14-residue peptaibols from the strain 618, termed hypojecorins A (23 sequences determined, 4 new) and B (3 new sequences), and the recently established trichovirins A from T. virens. The distribution of 11- and 14-residue products is strain-specific and also depends on growth conditions. Possible mechanisms of module skipping are discussed.

  6. Antimicrobial Peptides from Plants

    PubMed Central

    Tam, James P.; Wang, Shujing; Wong, Ka H.; Tan, Wei Liang

    2015-01-01

    Plant antimicrobial peptides (AMPs) have evolved differently from AMPs from other life forms. They are generally rich in cysteine residues, which form multiple disulfides. In turn, the disulfides cross-brace plant AMPs as cystine-rich peptides, conferring extraordinarily high chemical, thermal and proteolytic stability. These cystine-rich plant AMPs, commonly known as cysteine-rich peptides (CRPs), are classified into families based on their sequence similarity, the cysteine motifs that determine their distinctive disulfide bond patterns, and their tertiary structure fold. Cystine-rich plant AMP families include thionins, defensins, hevein-like peptides, knottin-type peptides (linear and cyclic), lipid transfer proteins, α-hairpinins and snakins. In addition, there are AMPs which are rich in other amino acids. Plant AMPs organize into specific families with conserved structural folds, and within a family the non-Cys residues encased in the same scaffold can vary in sequence, allowing family members to play multiple functions. Furthermore, the ability of plant AMPs to tolerate hypervariable sequences on a conserved scaffold provides the diversity to recognize different targets by varying the sequence of the non-cysteine residues. These properties bode well for developing plant AMPs as potential therapeutics and for protecting crops through transgenic methods. This review provides an overview of the major families of plant AMPs, including their structures, functions, and putative mechanisms. PMID:26580629

  7. Direct identification of non-polio enteroviruses in residual paralysis cases by analysis of VP1 sequences.

    PubMed

    Rahimi, Pooneh; Tabatabaie, H; Gouya, Mohammad M; Mahmudi, M; Musavi, T; Rad, K Samimi; Azad, T Mokhtari; Nategh, R

    2009-06-01

    The 66 serotypes of human enteroviruses (EVs) are classified into four species, A-D, based on phylogenetic relationships in multiple genome regions. Partial VP1 amplification and sequence analysis are reliable methods for identifying non-polio enterovirus serotypes, especially in cell culture-negative specimens from patients with residual paralysis. In Iran during the years 2000-2002, there were 29 residual paralysis cases with negative cell culture (RD, HEp-2 and L20B) results. The genomic RNA was extracted from stool specimens from cases of residual paralysis and detected by amplification of the 5'-nontranslated region using RT-PCR with Pan-EV primers. Partial VP1 amplification by semi-nested RT-PCR (snRT-PCR) and sequence analysis were then performed. Specimens from the 29 culture-negative cases contained echoviruses of six different serotypes. The global eradication of wild polioviruses is near, and study of non-polio enteroviruses, which can cause poliomyelitis, is increasingly important to understand their pathogenesis. The VP1 sequences derived from the snRT-PCR products allowed rapid molecular analysis of these non-polio strains.

  8. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  9. Multiple acquisitions via sequential transfer of orphan spin polarization (MAeSTOSO): How far can we push residual spin polarization in solid-state NMR?

    NASA Astrophysics Data System (ADS)

    Gopinath, T.; Veglia, Gianluigi

    2016-06-01

    Conventional multidimensional magic angle spinning (MAS) solid-state NMR (ssNMR) experiments detect the signal arising from the decay of a single coherence transfer pathway (FID), resulting in one spectrum per acquisition time. Recently, we introduced two new strategies, namely DUMAS (DUal acquisition Magic Angle Spinning) and MEIOSIS (Multiple ExperIments via Orphan SpIn operatorS), that enable the simultaneous acquisition of multidimensional ssNMR experiments using multiple coherence transfer pathways. Here, we combined the main elements of DUMAS and MEIOSIS to harness both orphan spin operators and residual polarization and increase the number of simultaneous acquisitions. We show that it is possible to acquire up to eight two-dimensional experiments using four acquisition periods per scan. This new suite of pulse sequences, called MAeSTOSO for Multiple Acquisitions via Sequential Transfer of Orphan Spin pOlarization, relies on residual polarization of both 13C and 15N pathways and combines low- and high-sensitivity experiments into a single pulse sequence using one receiver and commercial ssNMR probes. The acquisition of multiple experiments does not affect the sensitivity of the main experiment; rather, it recovers lost coherences that would otherwise be discarded, resulting in a significant gain in experimental time. Both merits and limitations of this approach are discussed.

  10. Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

    PubMed Central

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-01-01

    Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389

  11. Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

    PubMed

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-12-27

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.

  12. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

    PubMed Central

    2012-01-01

    Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recent work indicates that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function.
Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672

  13. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

    PubMed

    Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

    2012-07-13

    Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
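    The interdependency score underlying the site clustering is a normalized mutual information between aligned columns. The Python sketch below computes one common variant, mutual information divided by the geometric mean of the two column entropies; the normalization used in the paper is not stated in this record, so treat this choice and the function names as assumptions.

```python
from collections import Counter
from math import log2, sqrt


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy in bits from symbol counts over n observations."""
    return sum((c / n) * log2(n / c) for c in counts.values())


def normalized_mutual_information(col_x: str, col_y: str) -> float:
    """MI(X;Y) / sqrt(H(X) * H(Y)) for two aligned columns; 0.0 if either is constant."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_x)
    hx = entropy(Counter(col_x), n)
    hy = entropy(Counter(col_y), n)
    if hx == 0 or hy == 0:
        return 0.0  # a fully conserved column carries no shared information
    hxy = entropy(Counter(zip(col_x, col_y)), n)  # joint entropy of residue pairs
    mi = hx + hy - hxy
    return mi / sqrt(hx * hy)


if __name__ == "__main__":
    msa = ["ACDK", "ACEK", "GTDR", "GTER"]  # toy alignment
    cols = ["".join(s[i] for s in msa) for i in range(4)]
    for i in range(4):
        for j in range(i + 1, 4):
            print(i, j, round(normalized_mutual_information(cols[i], cols[j]), 3))
```

    Columns whose residues change in lockstep score near 1, and independent columns score near 0; pairs with high scores are the kind of interdependent sites the clustering groups together.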

  14. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence

    NASA Astrophysics Data System (ADS)

    Furrer, Julien; Kramer, Frank; Marino, John P.; Glaser, Steffen J.; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding ~10 kDa, TOCSY experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior as well as experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence are given.

  15. Homonuclear Hartmann-Hahn transfer with reduced relaxation losses by use of the MOCCA-XY16 multiple pulse sequence.

    PubMed

    Furrer, Julien; Kramer, Frank; Marino, John P; Glaser, Steffen J; Luy, Burkhard

    2004-01-01

    Homonuclear Hartmann-Hahn transfer is one of the most important building blocks in modern high-resolution NMR. It constitutes a very efficient transfer element for the assignment of proteins, nucleic acids, and oligosaccharides. Nevertheless, in macromolecules exceeding approximately 10 kDa TOCSY-experiments can show decreasing sensitivity due to fast transverse relaxation processes that are active during the mixing periods. In this article we propose the MOCCA-XY16 multiple pulse sequence, originally developed for efficient TOCSY transfer through residual dipolar couplings, as a homonuclear Hartmann-Hahn sequence with improved relaxation properties. A theoretical analysis of the coherence transfer via scalar couplings and its relaxation behavior as well as experimental transfer curves for MOCCA-XY16 relative to the well-characterized DIPSI-2 multiple pulse sequence are given.

  16. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments.

    PubMed

    Jessen, Leon Eyrich; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2013-07-01

    Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.

  17. New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein.

    PubMed

    Gao, Hongyun; Yu, Xiaoqing; Dou, Yongchao; Wang, Jun

    2015-12-01

    Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, co-evolution may involve more than two residues, and some co-evolved residues are clustered in several distinct regions of the primary structure. The co-evolution among adjacent residues, and the correlation between distinct regions, can therefore offer insights into the function and evolution of the protein and its residues. Here, a subsequence represents the adjacent residues within one distinct region, and the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (R) is then used to measure the similarity between two MIMs. MSAs from the Catalytic Site Atlas (CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to analyses of individual co-evolving residue pairs, this approach finds adjacent residues without high individual MI values, because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences show some flexibility in side-chain composition, for example in the catalytic environment.
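    The subsequence comparison can be sketched as follows: build a mutual-information matrix (MIM) for each window of adjacent alignment columns, then compute Pearson's R between the two matrices. This Python sketch is an illustration under assumptions, not the authors' code: MI is computed in bits from raw column frequencies with no pseudocounts or corrections, only the off-diagonal upper triangle of each MIM is compared, and the function names are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(col_x: str, col_y: str) -> float:
    """MI in bits between two aligned columns, from raw frequencies."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mim(msa: list[str], start: int, width: int) -> list[list[float]]:
    """MIM for the `width` adjacent columns beginning at `start`."""
    cols = ["".join(seq[start + k] for seq in msa) for k in range(width)]
    return [[mutual_information(cols[i], cols[j]) for j in range(width)] for i in range(width)]


def upper_triangle(m: list[list[float]]) -> list[float]:
    """Off-diagonal upper-triangle entries, row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]


def pearson(xs: list[float], ys: list[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson's R is undefined for a constant input")
    return cov / (sx * sy)


def mim_correlation(msa: list[str], start_a: int, start_b: int, width: int) -> float:
    """R between the MIMs of two equal-width subsequences (width >= 3 advised)."""
    return pearson(upper_triangle(mim(msa, start_a, width)),
                   upper_triangle(mim(msa, start_b, width)))


if __name__ == "__main__":
    # Toy alignment: columns 0-2 and 3-5 share the same co-variation pattern.
    msa = ["ACDKLM", "ACEKLN", "GTDRVM", "GTERVN"]
    print(mim_correlation(msa, 0, 3, 3))
```

    In the toy alignment, neither window is identical to the other in residue content, yet their internal MI patterns match, which is the kind of relationship a high R is meant to expose.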

  18. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
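    The abstract contrasts true co-evolution couplings with the "noisy set of observed correlations" between alignment columns. A common example of such a raw correlation is the mutual information between two columns; the sketch below computes it from plain frequency counts. It is deliberately not the authors' maximum entropy model: it has no sequence weighting, no pseudocounts, and cannot separate direct from indirect couplings, which is exactly the limitation the paper addresses. The function name and toy alignment are hypothetical, and gaps are treated as ordinary symbols.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences (strings).
    This is a raw observed correlation; it does not distinguish direct
    couplings from indirect ones as a maximum entropy model would.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


msa = ["ACDE", "ACDE", "GCKE", "GCKE"]
print(column_mutual_information(msa, 0, 2))  # ln 2, about 0.693: columns 0 and 2 co-vary perfectly
print(column_mutual_information(msa, 0, 1))  # 0.0: column 1 is invariant
```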

  19. Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

    NASA Astrophysics Data System (ADS)

    Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

    2012-10-01

    In spite of the medical advances of recent years, the world still needs different sources to address certain health issues. Ribosome Inactivating Proteins (RIPs) were found to be one of them. To provide easy access to information about RIPs, RIPs need to be analysed with a view to constructing a database on them. In addition, multiple sequence alignment was performed to screen homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and further analysed using pairwise and multiple sequence alignment. The analysis shows that 151 RIPs have been characterized to date. Of these, 87 are type I, 37 type II, 1 type III and 25 unknown RIPs. Information on the sequence length of the various RIPs, including whether the full or only a partial sequence is available, was also compiled. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise and multiple sequence alignments of selected RIPs in two groups, Group I and Group II, were carried out, and the consensus levels were found to be 98%, 98% and 90%, respectively.
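    A count of conserved residues in a multiple alignment, such as the 20 reported for the Multalin alignment above, amounts to the number of columns in which every aligned sequence carries the same residue. The sketch below assumes an already-aligned set of equal-length strings with "-" as the gap character; it does not model Multalin's own output format or its consensus-level thresholds, and the function name and example alignment are hypothetical.

```python
def conserved_columns(msa, skip_gap_columns=True):
    """Return 0-based indices of columns where all sequences share one residue.

    msa: list of equal-length aligned sequences, '-' marking gaps.
    skip_gap_columns: if True, columns made up entirely of gaps are not counted.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in msa}
        if len(residues) == 1 and not (skip_gap_columns and "-" in residues):
            conserved.append(col)
    return conserved


alignment = ["MKV-L", "MRV-L", "MKVAL"]
print(conserved_columns(alignment))  # [0, 2, 4]
```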

  20. SequenceCEROSENE: a computational method and web server to visualize spatial residue neighborhoods at the sequence level.

    PubMed

    Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk

    2016-01-01

    To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often used to visualize these properties, but with the increasing number of available structures in databases, and of structure models produced by automated modeling frameworks, this process requires assistance from tools that allow automated structure visualization. In this paper, a web server and its underlying method for generating graphical sequence representations of molecular structures are presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, in which residue coloring represents three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, such as ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, users can download the underlying raw data as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question.
Color-encoded sequences generated by SequenceCEROSENE can help researchers quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting them in the initial phase of structure-based studies. In this respect, the web server can be a valuable tool, as users can process multiple structures, quickly switch between results, and interact with generated visualizations in an intuitive manner. The SequenceCEROSENE web server is available at https://biosciences.hs-mittweida.de/seqcerosene.
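    The abstract does not spell out the spatial neighborhood embedding that SequenceCEROSENE uses, so the sketch below swaps in a much simpler, hypothetical stand-in to illustrate the idea of "similar color means nearby in space": each residue's 3D coordinate (for example a Cα position) is min-max scaled per axis into an RGB channel. Residues that are close in 3D therefore receive similar colors. This is an assumption-laden illustration, not the published method; the function names are hypothetical.

```python
def coordinates_to_rgb(coords):
    """Map one (x, y, z) coordinate per residue to an (R, G, B) triple in 0..255.

    Each axis is min-max scaled independently over the whole structure, so
    spatially close residues end up with similar colors. This is a simple
    stand-in, not SequenceCEROSENE's neighborhood embedding.
    """
    mins = [min(c[k] for c in coords) for k in range(3)]
    maxs = [max(c[k] for c in coords) for k in range(3)]
    colors = []
    for c in coords:
        rgb = []
        for k in range(3):
            span = maxs[k] - mins[k]
            # A flat axis (all residues share one value) gets a mid-range channel.
            scaled = (c[k] - mins[k]) / span if span > 0 else 0.5
            rgb.append(round(255 * scaled))
        colors.append(tuple(rgb))
    return colors


def to_hex(rgb):
    """Format an (R, G, B) triple as an HTML-style hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


ca_coords = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (10.0, 0.0, 10.0)]
print([to_hex(c) for c in coordinates_to_rgb(ca_coords)])  # ['#000000', '#ffffff', '#ff00ff']
```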

  1. Sequence harmony: detecting functional specificity from alignments

    PubMed Central

    Feenstra, K. Anton; Pirovano, Walter; Krab, Klaas; Heringa, Jaap

    2007-01-01

    Multiple sequence alignments are often used for the identification of key specificity-determining residues within protein families. We present a web server implementation of the Sequence Harmony (SH) method previously introduced. SH accurately detects subfamily specific positions from a multiple alignment by scoring compositional differences between subfamilies, without imposing conservation. The SH web server allows a quick selection of subtype specific sites from a multiple alignment given a subfamily grouping. In addition, it allows the predicted sites to be directly mapped onto a protein structure and displayed. We demonstrate the use of the SH server using the family of plant mitochondrial alternative oxidases (AOX). In addition, we illustrate the usefulness of combining sequence and structural information by showing that the predicted sites are clustered into a few distinct regions in an AOX homology model. The SH web server can be accessed at www.ibi.vu.nl/programs/seqharmwww. PMID:17584793
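    The key idea above is scoring compositional differences between subfamilies at each alignment column without requiring the column to be conserved. The exact Sequence Harmony formula is not given in this abstract, so the sketch below uses a plainly labeled stand-in: the Jensen-Shannon divergence (in bits) between the residue compositions of two subfamilies at one column. A column identical in both groups scores 0; a column whose residue sets differ completely scores 1, even when neither subfamily is internally conserved. Function names and example columns are hypothetical.

```python
import math
from collections import Counter


def column_profile(column):
    """Relative residue frequencies in one alignment column (a string)."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def jensen_shannon(column_a, column_b):
    """Jensen-Shannon divergence (bits) between two subfamilies' residues at one column."""
    p = column_profile(column_a)
    q = column_profile(column_b)
    divergence = 0.0
    for aa in set(p) | set(q):
        pa, qa = p.get(aa, 0.0), q.get(aa, 0.0)
        m = (pa + qa) / 2  # mixture distribution
        if pa > 0:
            divergence += 0.5 * pa * math.log2(pa / m)
        if qa > 0:
            divergence += 0.5 * qa * math.log2(qa / m)
    return divergence


print(jensen_shannon("LLLL", "VVVV"))  # 1.0: each subfamily has its own residue
print(jensen_shannon("LVLV", "IAIA"))  # 1.0: distinct compositions, neither conserved
print(jensen_shannon("LLVV", "VVLL"))  # 0.0: same composition in both subfamilies
```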

  2. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis of all flavivirus non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

    PubMed Central

    Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

    2013-01-01

    The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608

  4. Prognostic value of deep sequencing method for minimal residual disease detection in multiple myeloma

    PubMed Central

    Lahuerta, Juan J.; Pepin, François; González, Marcos; Barrio, Santiago; Ayala, Rosa; Puig, Noemí; Montalban, María A.; Paiva, Bruno; Weng, Li; Jiménez, Cristina; Sopena, María; Moorhead, Martin; Cedena, Teresa; Rapado, Immaculada; Mateos, María Victoria; Rosiñol, Laura; Oriol, Albert; Blanchard, María J.; Martínez, Rafael; Bladé, Joan; San Miguel, Jesús; Faham, Malek; García-Sanz, Ramón

    2014-01-01

    We assessed the prognostic value of minimal residual disease (MRD) detection in multiple myeloma (MM) patients using a sequencing-based platform in bone marrow samples from 133 MM patients in at least very good partial response (VGPR) after front-line therapy. Deep sequencing was carried out in patients in whom a high-frequency myeloma clone was identified and MRD was assessed using the IGH-VDJH, IGH-DJH, and IGK assays. The results were contrasted with those of multiparametric flow cytometry (MFC) and allele-specific oligonucleotide polymerase chain reaction (ASO-PCR). The applicability of deep sequencing was 91%. Concordance between sequencing and MFC and ASO-PCR was 83% and 85%, respectively. Patients who were MRD– by sequencing had a significantly longer time to tumor progression (TTP) (median 80 vs 31 months; P < .0001) and overall survival (median not reached vs 81 months; P = .02), compared with patients who were MRD+. When stratifying patients by different levels of MRD, the respective TTP medians were: MRD ≥10⁻³ 27 months, MRD 10⁻³ to 10⁻⁵ 48 months, and MRD <10⁻⁵ 80 months (P = .003 to .0001). Ninety-two percent of VGPR patients were MRD+. In complete response patients, the TTP remained significantly longer for MRD– compared with MRD+ patients (131 vs 35 months; P = .0009). PMID:24646471

  5. Friedelin Synthase from Maytenus ilicifolia: Leucine 482 Plays an Essential Role in the Production of the Most Rearranged Pentacyclic Triterpene

    PubMed Central

    Souza-Moreira, Tatiana M.; Alves, Thaís B.; Pinheiro, Karina A.; Felippe, Lidiane G.; De Lima, Gustavo M. A.; Watanabe, Tatiana F.; Barbosa, Cristina C.; Santos, Vânia A. F. F. M.; Lopes, Norberto P.; Valentini, Sandro R.; Guido, Rafael V. C.; Furlan, Maysa; Zanelli, Cleslei F.

    2016-01-01

    Among the biologically active triterpenes, friedelin has the most-rearranged structure produced by the oxidosqualene cyclases and is the only one containing a ketone group. In this study, we cloned and functionally characterized friedelin synthase and one cycloartenol synthase from Maytenus ilicifolia (Celastraceae). The complete coding sequences of these two genes were cloned from leaf mRNA, and their functions were characterized by heterologous expression in yeast. The cycloartenol synthase sequence is very similar to other known OSCs of this type (approximately 80% identity), although the M. ilicifolia friedelin synthase amino acid sequence is more closely related to β-amyrin synthases (65–74% identity), which is similar to the friedelin synthase cloned from Kalanchoe daigremontiana. Multiple sequence alignments demonstrated the presence of a leucine residue two positions upstream of the friedelin synthase Asp-Cys-Thr-Ala-Glu (DCTAE) active site motif, while the vast majority of OSCs identified so far have a valine or isoleucine residue at the same position. The substitution of the leucine residue with valine, threonine or isoleucine in M. ilicifolia friedelin synthase interfered with substrate recognition and led to the production of different pentacyclic triterpenes. Hence, our data indicate a key role for the leucine residue in the structure and function of this oxidosqualene cyclase. PMID:27874020

  6. Friedelin Synthase from Maytenus ilicifolia: Leucine 482 Plays an Essential Role in the Production of the Most Rearranged Pentacyclic Triterpene

    NASA Astrophysics Data System (ADS)

    Souza-Moreira, Tatiana M.; Alves, Thaís B.; Pinheiro, Karina A.; Felippe, Lidiane G.; de Lima, Gustavo M. A.; Watanabe, Tatiana F.; Barbosa, Cristina C.; Santos, Vânia A. F. F. M.; Lopes, Norberto P.; Valentini, Sandro R.; Guido, Rafael V. C.; Furlan, Maysa; Zanelli, Cleslei F.

    2016-11-01

    Among the biologically active triterpenes, friedelin has the most-rearranged structure produced by the oxidosqualene cyclases and is the only one containing a ketone group. In this study, we cloned and functionally characterized friedelin synthase and one cycloartenol synthase from Maytenus ilicifolia (Celastraceae). The complete coding sequences of these two genes were cloned from leaf mRNA, and their functions were characterized by heterologous expression in yeast. The cycloartenol synthase sequence is very similar to other known OSCs of this type (approximately 80% identity), although the M. ilicifolia friedelin synthase amino acid sequence is more closely related to β-amyrin synthases (65-74% identity), which is similar to the friedelin synthase cloned from Kalanchoe daigremontiana. Multiple sequence alignments demonstrated the presence of a leucine residue two positions upstream of the friedelin synthase Asp-Cys-Thr-Ala-Glu (DCTAE) active site motif, while the vast majority of OSCs identified so far have a valine or isoleucine residue at the same position. The substitution of the leucine residue with valine, threonine or isoleucine in M. ilicifolia friedelin synthase interfered with substrate recognition and led to the production of different pentacyclic triterpenes. Hence, our data indicate a key role for the leucine residue in the structure and function of this oxidosqualene cyclase.

  7. The mechanical design of spider silks: from fibroin sequence to mechanical function.

    PubMed

    Gosline, J M; Guerette, P A; Ortlepp, C S; Savage, K N

    1999-12-01

    Spiders produce a variety of silks, and the cloning of genes for silk fibroins reveals a clear link between protein sequence and structure-property relationships. The fibroins produced in the spider's major ampullate (MA) gland, which forms the dragline and web frame, contain multiple repeats of motifs that include an 8-10 residue long poly-alanine block and a 24-35 residue long glycine-rich block. When fibroins are spun into fibres, the poly-alanine blocks form β-sheet crystals that crosslink the fibroins into a polymer network with great stiffness, strength and toughness. As illustrated by a comparison of MA silks from Araneus diadematus and Nephila clavipes, variation in fibroin sequence and properties between spider species provides the opportunity to investigate the design of these remarkable biomaterials.

  8. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    PubMed

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations, mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching for conserved domains revealed that most SA sequences are made up of three repeated domains (about 600 residues), as extensively characterized for human SA. By contrast, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar to higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  9. In silico analysis of L-asparaginase from different source organisms.

    PubMed

    Dwivedi, Vivek Dhar; Mishra, Sarad Kumar

    2014-06-01

    L-asparaginases are enzymes widely distributed among plants, fungi and bacteria. This enzyme catalyzes the conversion of L-asparagine to L-aspartate and ammonia and, to a lesser extent, the formation of L-glutamate from L-glutamine. In the present study, forty-five full-length amino acid sequences of L-asparaginases from bacteria, fungi and plants were collected and subjected to multiple sequence alignment (MSA), domain identification, determination of individual amino acid composition, and phylogenetic tree construction. MSA revealed that two glycine residues were invariant in all analyzed species, two glycine residues were also invariant in all the fungal and bacterial sources, and three glycine residues were invariant in all plant and bacterial sources, while no residue was invariant across plant and fungal L-asparaginases. Two major sequence clusters were identified by phylogenetic analysis. One cluster contains eleven species of fungi, twelve species of bacteria, and one species of plant, whereas the other contains fourteen species of plant, four species of fungi and three species of bacteria. The amino acid composition analysis revealed that the average frequency of alanine is 10.77 percent, which is very high in comparison to other amino acids in all analyzed species.
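    The composition result above (an average alanine frequency of 10.77 percent) comes from counting residues per sequence. A minimal sketch, assuming the average is taken over per-sequence percentages; the abstract does not say whether counts were instead pooled across all sequences, and the function names and example sequences are hypothetical.

```python
from collections import Counter


def composition_percent(sequence):
    """Percent frequency of each residue in one protein sequence.

    Non-letter characters (gaps '-', whitespace, '*') are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100 * n / total for aa, n in sorted(counts.items())}


def average_composition(sequences, residue):
    """Mean percent frequency of one residue, averaged over per-sequence values."""
    values = [composition_percent(seq).get(residue, 0.0) for seq in sequences]
    return sum(values) / len(values)


print(composition_percent("AAGL"))                  # {'A': 50.0, 'G': 25.0, 'L': 25.0}
print(average_composition(["AAGL", "ALLL"], "A"))   # 37.5
```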

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fogh, R.H.; Mabbutt, B.C.; Kem, W.R.

    Sequence-specific assignments are reported for the 500-MHz ¹H nuclear magnetic resonance (NMR) spectrum of the 48-residue polypeptide neurotoxin I from the sea anemone Stichodactyla helianthus (Sh I). Spin systems were first identified by using two-dimensional relayed or multiple quantum filtered correlation spectroscopy, double quantum spectroscopy, and spin lock experiments. Specific resonance assignments were then obtained from nuclear Overhauser enhancement (NOE) connectivities between protons from residues adjacent in the amino acid sequence. Of a total of 265 potentially observable resonances, 248 (i.e., 94%) were assigned, arising from 39 completely and 9 partially assigned amino acid spin systems. The secondary structure of Sh I was defined on the basis of the pattern of sequential NOE connectivities, NOEs between protons on separate strands of the polypeptide backbone, and backbone amide exchange rates. Sh I contains a four-stranded antiparallel β-sheet encompassing residues 1-5, 16-24, 30-33, and 40-46, with a β-bulge at residues 17 and 18 and a reverse turn, probably a type II β-turn, involving residues 27-30. No evidence of α-helical structure was found.

  11. Evolutionarily conserved regions and hydrophobic contacts at the superfamily level: The case of the fold-type I, pyridoxal-5′-phosphate-dependent enzymes

    PubMed Central

    Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano

    2004-01-01

    The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant over long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high structural conservation despite low sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation, which correlates significantly (r = 0.70) with the mean hydrophobic contact value of their apolar fraction. PMID:15498941

  12. Protein contact prediction using patterns of correlation.

    PubMed

    Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas

    2004-09-01

    We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.
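    The accuracies above are measured on the best L/2 and L/10 predictions per protein, where L is the sequence length. A minimal sketch of that evaluation, assuming predictions arrive as a dictionary mapping residue-index pairs (i, j) with i < j to a score (a hypothetical input format). Many contact-prediction evaluations also exclude pairs that are close in sequence; no such filter is applied here because the abstract does not state one.

```python
def top_l_accuracy(scores, true_contacts, length, divisor=2):
    """Fraction of the top-ranked L/divisor predicted pairs that are true contacts.

    scores: dict mapping (i, j) residue-index pairs (i < j) to a predicted score.
    true_contacts: set of (i, j) pairs in contact in the known structure.
    length: sequence length L.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    k = max(1, length // divisor)
    top = [pair for pair, _ in ranked[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


predicted = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.7, (0, 7): 0.6, (1, 4): 0.1}
observed = {(0, 5), (2, 7)}
print(top_l_accuracy(predicted, observed, length=8, divisor=2))   # 0.5  (2 of the top 4)
print(top_l_accuracy(predicted, observed, length=8, divisor=10))  # 1.0  (top 1 only)
```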

  13. Iterative Overlap FDE for Multicode DS-CDMA

    NASA Astrophysics Data System (ADS)

    Takeda, Kazuaki; Tomeba, Hiromichi; Adachi, Fumiyuki

    Recently, a new frequency-domain equalization (FDE) technique, called overlap FDE, that requires no guard interval (GI) insertion was proposed. However, the residual inter/intra-block interference (IBI) cannot be completely removed. In addition, for multicode direct sequence code division multiple access (DS-CDMA), the residual interchip interference (ICI) remaining after FDE distorts the orthogonality among the spreading codes. In this paper, we propose an iterative overlap FDE for multicode DS-CDMA to suppress both the residual IBI and the residual ICI. In the iterative overlap FDE, joint minimum mean square error (MMSE)-FDE and ICI cancellation is repeated a sufficient number of times. The bit error rate (BER) performance with the iterative overlap FDE is evaluated by computer simulation.
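    As background for the MMSE-FDE step, a textbook one-tap MMSE frequency-domain equalizer multiplies each received frequency bin by W(k) = H*(k) / (|H(k)|² + (Es/N0)⁻¹), where H(k) is the channel transfer function. The sketch below computes those weights with plain Python complex numbers. It is not the authors' iterative overlap FDE: overlap processing, ICI cancellation, and the adjustments to the noise term typically used for multicode DS-CDMA (number of multiplexed codes, spreading factor) are all omitted, and the function names are hypothetical.

```python
def mmse_fde_weights(channel_gains, es_over_n0):
    """One-tap MMSE equalizer weight per frequency bin.

    channel_gains: complex channel transfer function H(k) for each bin k.
    es_over_n0: average symbol-energy-to-noise ratio (linear scale, not dB).
    """
    return [h.conjugate() / (abs(h) ** 2 + 1.0 / es_over_n0) for h in channel_gains]


def equalize(received_spectrum, weights):
    """Apply the weights bin by bin: R_hat(k) = W(k) * R(k)."""
    return [w * r for w, r in zip(weights, received_spectrum)]


# At high SNR the MMSE weight approaches zero forcing (1 / H); at low SNR it
# backs off on weak bins instead of amplifying noise.
gains = [1 + 0j, 2j, 0.1 + 0j]
weights = mmse_fde_weights(gains, es_over_n0=10.0)
print(equalize(gains, weights))  # per-bin gain after equalization of a unit symbol
```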

  14. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

    PubMed Central

    2012-01-01

    Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related, a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time-consuming aspects of conserved domain database curation.
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767

  15. The Interaction of Integrin αIIbβ3 with Fibrin Occurs through Multiple Binding Sites in the αIIb β-Propeller Domain

    PubMed Central

    Podolnikova, Nataly P.; Yakovlev, Sergiy; Yakubenko, Valentin P.; Wang, Xu; Gorkun, Oleg V.; Ugarova, Tatiana P.

    2014-01-01

    The currently available antithrombotic agents target the interaction of platelet integrin αIIbβ3 (GPIIb-IIIa) with fibrinogen during platelet aggregation. Platelets also bind fibrin formed early during thrombus growth. It was proposed that inhibition of platelet-fibrin interactions may be a necessary and important property of αIIbβ3 antagonists; however, the mechanisms by which αIIbβ3 binds fibrin are uncertain. We have previously identified the γ370–381 sequence (P3) in the γC domain of fibrinogen as the fibrin-specific binding site for αIIbβ3 involved in platelet adhesion and platelet-mediated fibrin clot retraction. In the present study, we have demonstrated that P3 can bind to several discontinuous segments within the αIIb β-propeller domain of αIIbβ3 enriched with negatively charged and aromatic residues. By screening peptide libraries spanning the sequence of the αIIb β-propeller, several sequences were identified as candidate contact sites for P3. Synthetic peptides duplicating these segments inhibited platelet adhesion and clot retraction but not platelet aggregation, supporting the role of these regions in fibrin recognition. Mutant αIIbβ3 receptors in which residues identified as critical for P3 binding were substituted for homologous residues in the I-less integrin αMβ2 exhibited reduced cell adhesion and clot retraction. These residues are different from those that are involved in the coordination of the fibrinogen γ404–411 sequence and from auxiliary sites implicated in binding of soluble fibrinogen. These results map the binding of fibrin to multiple sites in the αIIb β-propeller and further indicate that recognition specificity of αIIbβ3 for fibrin differs from that for soluble fibrinogen. PMID:24338009

  16. Improve the prediction of RNA-binding residues using structural neighbours.

    PubMed

    Li, Quan; Cao, Zanxia; Liu, Haiyan

    2010-03-01

    The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA-binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements to current prediction methods are necessary. We found that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8%, an MCC of 0.47 and a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA-binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
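    The figures reported above are standard confusion-matrix statistics: the "overall correct rate" is presumably accuracy, and the share of RNA-binding residues correctly predicted corresponds to sensitivity. The sketch below computes these together with specificity and the Matthews correlation coefficient (MCC) from true/false positive and negative counts. The counts in the usage line are hypothetical and are not the authors' data; the sketch does not reproduce their regression model.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # fraction of binding residues found
    specificity = tn / (tn + fp) if tn + fp else 0.0   # fraction of non-binding residues rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
# accuracy 0.8, sensitivity 50/60, specificity 0.75, MCC 1400/2400 (about 0.583)
```

    MCC is often preferred over accuracy for residue-level binding prediction because binding residues are usually a minority class, so accuracy alone can look high even for a weak predictor.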

  17. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    PubMed Central

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped up in a user-friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, the nature and volume of atomic contacts of intra and inter chain type, the relative conservation of amino acids at specific sequence positions based on multiple sequence alignment, indications of folding essential residues (FER) based on the relationship of residue conservation to intra-chain contacts, and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface-forming residues (IFR), the amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  18. Application of advanced cytometric and molecular technologies to minimal residual disease monitoring

    NASA Astrophysics Data System (ADS)

    Leary, James F.; He, Feng; Reece, Lisa M.

    2000-04-01

    Minimal residual disease monitoring presents a number of theoretical and practical challenges. Recently it has become possible to meet some of these challenges by combining several new advanced biotechnologies. Monitoring the number of residual tumor cells requires complex cocktails of molecular probes that collectively provide detection sensitivities on the order of one residual tumor cell per million total cells. Ultra-high-speed, multiparameter flow cytometry is capable of analyzing cells at rates in excess of 100,000 cells/sec. Residual tumor selection marker cocktails can be optimized by use of receiver operating characteristic analysis. New data minimizing techniques, when combined with multivariate statistical or neural network classifications of tumor cells, can more accurately predict residual tumor cell frequencies. The combination of these techniques can, under at least some circumstances, detect frequencies of tumor cells as low as one cell in a million with an accuracy of over 98 percent correct classification. Detection of mutations in tumor suppressor genes requires isolation of these rare tumor cells and single-cell DNA sequencing. Rare residual tumor cells can be isolated at the single-cell level by high-resolution single-cell sorting. Molecular characterization of tumor suppressor gene mutations can be accomplished using single-cell polymerase chain reaction amplification of specific gene sequences followed by TA cloning and DNA sequencing. Mutations as small as a single base pair in a tumor suppressor gene of a single sorted tumor cell have been detected using these methods. Using new amplification procedures and DNA microarrays, it should be possible to extend the capabilities shown in this paper to screening for multiple DNA mutations in tumor suppressor and other genes in small numbers of sorted metastatic tumor cells.
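
    The abstract notes that marker cocktails can be optimized with receiver operating characteristic (ROC) analysis. The sketch below covers that evaluation step. It assumes each cell has already been reduced to a single classifier score and a known label (1 = tumor, 0 = normal, a hypothetical encoding). It computes the ROC points and the area under the curve (AUC) by the trapezoidal rule.

```python
from typing import List, Tuple


def roc_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) points obtained by
    lowering a threshold through the distinct scores, highest first.
    labels: 1 = tumor cell, 0 = normal cell (hypothetical encoding)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative examples")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Cells with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(auc(roc_curve([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0])))  # 1.0
```

    Comparing the AUC of candidate cocktails is one way to rank them. How the study itself combined markers into a score is not described here.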

  19. Improving Multiple Fault Diagnosability using Possible Conflicts

    NASA Technical Reports Server (NTRS)

    Daigle, Matthew J.; Bregon, Anibal; Biswas, Gautam; Koutsoukos, Xenofon; Pulido, Belarmino

    2012-01-01

    Multiple fault diagnosis is a difficult problem for dynamic systems. Due to fault masking, compensation, and relative time of fault occurrence, multiple faults can manifest in many different ways as observable fault signature sequences. This decreases diagnosability of multiple faults, and therefore leads to a loss in effectiveness of the fault isolation step. We develop a qualitative, event-based, multiple fault isolation framework, and derive several notions of multiple fault diagnosability. We show that using Possible Conflicts, a model decomposition technique that decouples faults from residuals, we can significantly improve the diagnosability of multiple faults compared to an approach using a single global model. We demonstrate these concepts and provide results using a multi-tank system as a case study.

  20. Protein sequence analysis, cloning, and expression of flammutoxin, a pore-forming cytolysin from Flammulina velutipes. Maturation of dimeric precursor to monomeric active form by carboxyl-terminal truncation.

    PubMed

    Tomita, Toshio; Mizumachi, Yoshihiro; Chong, Kang; Ogawa, Kanako; Konishi, Norihide; Sugawara-Tomita, Noriko; Dohmae, Naoshi; Hashimoto, Yohichi; Takio, Koji

    2004-12-24

    Flammutoxin (FTX), a 31-kDa pore-forming cytolysin from Flammulina velutipes, is specifically expressed during fruiting body formation. We cloned and expressed the cDNA encoding a 272-residue protein whose N-terminal sequence is identical to that of FTX, but failed to obtain hemolytically active protein. This, together with the presence of multiple FTX family proteins in the mushroom, prompted us to determine the complete primary structure of FTX by protein sequence analysis. The N-terminal 72 and C-terminal 107 residues were sequenced by Edman degradation of the fragments generated from the alkylated FTX by enzymatic digestion with Achromobacter protease I or Staphylococcus aureus V8 protease and by chemical cleavage with CNBr, hydroxylamine, or 1% formic acid. The central part of FTX was sequenced using a surface-adhesive 7-kDa fragment, which was generated by tryptic digestion of FTX and recovered by rinsing the wall of a test tube with 6 M guanidine HCl. The 7-kDa peptide was cleaved with 12 M HCl, thermolysin, or S. aureus V8 protease to produce smaller peptides for sequence analysis. As a result, FTX was found to consist of 251 residues, and the protein and nucleotide sequences were in accord except that the protein lacked the initial Met and the C-terminal 20 residues. Recombinant FTX (rFTX) with or without the C-terminal 20 residues (rFTX271 or rFTX251, respectively) was prepared to study the maturation process of FTX. Like natural FTX, rFTX251 existed as a monomer in solution and assembled into an SDS-stable, ring-shaped pore complex on human erythrocytes, causing hemolysis. In contrast, rFTX271, existing as a dimer in solution, bound to the cells but failed to form pore complexes. The dimeric rFTX271 was converted to hemolytically active monomers upon cleavage between Lys(251) and Met(252) by trypsin.
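
    The residue counts reported for flammutoxin are mutually consistent. The construct names suggest that residue numbering starts after the removed initial Met; on that assumption, the bookkeeping is:

```latex
\begin{aligned}
272 - 1\ (\text{initial Met}) &= 271 \quad (\text{rFTX271, dimeric precursor}) \\
271 - 20\ (\text{C-terminal residues}) &= 251 \quad (\text{rFTX251 and natural FTX}) \\
271 - 252 + 1 &= 20 \quad (\text{residues released by cleavage after Lys(251)})
\end{aligned}
```

    Tryptic cleavage between Lys(251) and Met(252) therefore removes exactly the 20-residue C-terminal extension.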

  1. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the three-dimensional structure of a protein from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and could give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) of 0.55 between the predicted and observed RWCO values and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve prediction performance, raising the CC to 0.57 and lowering the RMSE to 0.79. In addition, combining the secondary structure predicted by PSIPRED was found to significantly improve prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, performance at least comparable to that of other existing methods. The SVR method shows prediction performance competitive with, or at least comparable to, previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is well suited to estimating the raw value profiles of the samples. The successful application of SVR in this study reinforces the conclusion that support vector regression is a powerful tool for extracting protein sequence-structure relationships and for estimating protein structural profiles from amino acid sequences.
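
    To make the target quantity and the two reported metrics concrete, here is a minimal sketch. It computes RWCO for residue i as (1/L) times the sum of |i - j| over residues j that are in contact with i and satisfy |i - j| > 2. Two simplifications are assumptions: contact is defined by a hard distance cutoff (12 Å by default), and coordinates are taken as one point per residue. Published definitions may use a smoothed contact function and a specific atom choice. Pearson CC and RMSE are computed in their standard forms.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rwco(coords: List[Coord], cutoff: float = 12.0, min_sep: int = 3) -> List[float]:
    """RWCO_i = (1/L) * sum of |i - j| over j with |i - j| >= min_sep and
    distance(i, j) <= cutoff. Hard cutoff is an assumption of this sketch."""
    L = len(coords)
    out = []
    for i in range(L):
        total = 0
        for j in range(L):
            sep = abs(i - j)
            if sep >= min_sep and math.dist(coords[i], coords[j]) <= cutoff:
                total += sep
        out.append(total / L)
    return out


def pearson_cc(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Four residues spaced 3.8 angstroms apart on a straight line.
line = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(rwco(line))  # [0.75, 0.0, 0.0, 0.75]
```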

  2. A common deletion in two gamma ray induced rat pulmonary tumor cell lines.

    PubMed

    Van Klaveren, P; De Bruijne, J; Van der Winden, H; Kal, H B; Bentvelzen, P

    1994-01-01

    Subtraction hybridization was performed on normal WAG/Rij rat DNA with DNA from a syngeneic Ir-192-induced pulmonary tumor cell line, L37. The residual DNA was amplified by means of sequence-independent PCR. This procedure yielded a sequence of which multiple copies are present in normal rat DNA. In tumor line L37, two restriction fragments hybridizing with this repeat sequence are missing. In another Ir-192-induced pulmonary tumor line, L33, one of these fragments was also missing. This indicates a deletion common to the two tumor lines.

  3. Regulation of the Production of Infectious Genotype 1a Hepatitis C Virus by NS5A Domain III

    PubMed Central

    Kim, Seungtaek; Welsch, Christoph; Yi, MinKyung; Lemon, Stanley M.

    2011-01-01

    Although hepatitis C virus (HCV) assembly remains incompletely understood, recent studies with the genotype 2a JFH-1 strain suggest that it is dependent upon the phosphorylation of Ser residues near the C terminus of NS5A, a multifunctional nonstructural protein. Since genotype 1 viruses account for most HCV disease yet differ substantially in sequence from that of JFH-1, we studied the role of NS5A in the production of the H77S virus. While less efficient than JFH-1, genotype 1a H77S RNA produces infectious virus when transfected into permissive Huh-7 cells. The exchange of complete NS5A sequences between these viruses was highly detrimental to replication, while exchanges of the C-terminal domain III sequence (46% amino acid sequence identity) were well tolerated, with little effect on RNA synthesis. Surprisingly, the placement of the H77S domain III sequence into JFH-1 resulted in increased virus yields; conversely, H77S yields were reduced by the introduction of domain III from JFH-1. These changes in infectious virus yield correlated well with changes in the abundance of NS5A in RNA-transfected cells but not with RNA replication or core protein expression levels. Alanine replacement mutagenesis of selected Ser and Thr residues in the C-terminal domain III sequence revealed no single residue to be essential for infectious H77S virus production. However, virus production was eliminated by Ala substitutions at multiple residues and could be restored by phosphomimetic Asp substitutions at these sites. Thus, despite low overall sequence homology, the production of infectious virus is regulated similarly in JFH-1 and H77S viruses by a conserved function associated with a C-terminal Ser/Thr cluster in domain III of NS5A. PMID:21525356

  4. Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution

    PubMed Central

    Kim, Hoon; Zheng, Siyuan; Amini, Seyed S.; Virk, Selene M.; Mikkelsen, Tom; Brat, Daniel J.; Grimsby, Jonna; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew E.; Cohen, Mark L.; Van Meir, Erwin G.; Scarpace, Lisa; Laird, Peter W.; Weinstein, John N.; Lander, Eric S.; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill S.

    2015-01-01

    Glioblastoma (GBM) is a prototypical heterogeneous brain tumor refractory to conventional therapy. A small residual population of cells escapes surgery and chemoradiation, resulting in a typically fatal tumor recurrence ∼7 mo after diagnosis. Understanding the molecular architecture of this residual population is critical for the development of successful therapies. We used whole-genome sequencing and whole-exome sequencing of multiple sectors from primary and paired recurrent GBM tumors to reconstruct the genomic profile of residual, therapy resistant tumor initiating cells. We found that genetic alteration of the p53 pathway is a primary molecular event predictive of a high number of subclonal mutations in glioblastoma. The genomic road leading to recurrence is highly idiosyncratic but can be broadly classified into linear recurrences that share extensive genetic similarity with the primary tumor and can be directly traced to one of its specific sectors, and divergent recurrences that share few genetic alterations with the primary tumor and originate from cells that branched off early during tumorigenesis. Our study provides mechanistic insights into how genetic alterations in primary tumors impact the ensuing evolution of tumor cells and the emergence of subclonal heterogeneity. PMID:25650244

  5. Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)

    PubMed Central

    Singh, Ranjan K.; Tanner, John J.

    2013-01-01

    Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760

  6. Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia.

    PubMed

    Ertefaie, Ashkan; Shortreed, Susan; Chakraborty, Bibhas

    2016-06-15

    Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to optimally individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
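
    To make the procedure concrete, here is a minimal two-stage sketch of standard Q-learning. It is not the modified procedure the article proposes. It assumes one covariate per stage, binary treatments coded as -1/+1, and linear working models with a treatment-by-covariate interaction; all function names are hypothetical. Stage 2 is fit first. The pseudo-outcome is the stage-2 fitted value under the better treatment, and stage 1 is then fit to that pseudo-outcome. The stage-1 residuals it returns are the kind whose diagnostic value the article examines.

```python
import numpy as np


def design(x, a):
    """Working-model features: intercept, covariate, treatment, interaction."""
    return np.column_stack([np.ones_like(x), x, a, a * x])


def fit_linear(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def q_learning(x1, a1, x2, a2, y):
    # Stage 2: regress the final outcome on stage-2 history and treatment.
    beta2 = fit_linear(design(x2, a2), y)
    # Pseudo-outcome: predicted outcome under the better stage-2 treatment.
    q_plus = design(x2, np.ones_like(x2)) @ beta2
    q_minus = design(x2, -np.ones_like(x2)) @ beta2
    y_tilde = np.maximum(q_plus, q_minus)
    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    X1 = design(x1, a1)
    beta1 = fit_linear(X1, y_tilde)
    resid1 = y_tilde - X1 @ beta1  # standard stage-1 residuals
    return beta1, beta2, resid1


def optimal_rule(beta, x):
    """Q = b0 + b1*x + a*(b2 + b3*x), so the maximizing a in {-1, +1}
    is the sign of the treatment effect b2 + b3*x."""
    return np.where(beta[2] + beta[3] * x > 0, 1, -1)
```

    Linear working models keep the maximization over treatments in closed form. This closed form is also the kind of modelling assumption whose misspecification the article's residual analysis is meant to detect.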

  7. Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

    PubMed Central

    2014-01-01

    Background: We introduce Sequence Bundles, a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features that would otherwise remain hidden. Methods: For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a two-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicate the level of conservation, and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results: We have developed a software demonstration application for generating Sequence Bundles visualisations of the MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions: Sequence Bundles is a novel method for visualising MSAs and discovering sequence motifs. It can aid in generating new insights and hypotheses. Sequence Bundles is well suited for future implementation as interactive visual analytics software that can complement existing visualisation tools. PMID:25237395
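
    A Sequence Logo summarizes each alignment column independently, whereas bundle lines keep each sequence's path across columns. The code below is a rough sketch of the data behind such a plot, not the authors' demonstration software. It counts the residues in each column and, for each pair of adjacent columns, how many sequences pass from one residue to the next. Using these counts to set bundle thickness is an assumption.

```python
from collections import Counter
from typing import List


def column_counts(msa: List[str]) -> List[Counter]:
    """Residue counts per alignment column (the data a Sequence Logo summarizes)."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(s[k] for s in msa) for k in range(length)]


def transitions(msa: List[str]) -> List[Counter]:
    """For each adjacent column pair (k, k+1), count how many sequences go
    from residue a to residue b. Per-sequence paths like these are what a
    logo discards and a bundle-style line plot keeps."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter((s[k], s[k + 1]) for s in msa) for k in range(length - 1)]
```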

  8. Sequence alignment visualization in HTML5 without Java.

    PubMed

    Gille, Christoph; Birgit, Weyand; Gille, Andreas

    2014-01-01

    Java has been used extensively for the visualization of biological data on the web. However, the Java runtime environment is an additional layer of software with its own set of technical problems and security risks. HTML in its new version 5 provides features that, for some tasks, may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server-side script interpreter can perform all tasks, including (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of homologous structural models and (v) communication with BioDAS servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms, including touch-screen tablets. The functionality of the user interface is similar to that of legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS-Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extent HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.
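
    The following illustrates highlighting conserved alignment positions in plain HTML. It is not the Alignment-To-HTML implementation. The sketch marks a column as conserved when every sequence carries the same non-gap residue. It wraps each residue in a `<span>` whose CSS class (hypothetical names `conserved` and `variable`) a stylesheet could color, and escapes names and residues with the standard `html.escape`.

```python
import html
from typing import List


def conserved_columns(msa: List[str]) -> List[bool]:
    """True for columns where every sequence has the same non-gap residue."""
    if msa and any(len(s) != len(msa[0]) for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*msa)]


def alignment_to_html(names: List[str], msa: List[str]) -> str:
    """Render one <div> per sequence; each residue gets a CSS class
    (hypothetical class names) according to its column's conservation."""
    cons = conserved_columns(msa)
    rows = []
    for name, seq in zip(names, msa):
        cells = "".join(
            f'<span class="{"conserved" if c else "variable"}">{html.escape(r)}</span>'
            for r, c in zip(seq, cons)
        )
        rows.append(f"<div><b>{html.escape(name)}</b> {cells}</div>")
    return "\n".join(rows)
```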

  9. The Treacher Collins syndrome (TCOF1) gene product, treacle, is targeted to the nucleolus by signals in its C-terminus.

    PubMed

    Winokur, S T; Shiang, R

    1998-11-01

    The TCOF1 gene product, treacle, responsible for the craniofacial disorder Treacher Collins syndrome, has been predicted to be a member of a class of nucleolar phosphoproteins based on its primary amino acid sequence. Treacle is a low-complexity protein with ten repeating units of acidic and basic residues, each of which contains a large number of putative casein kinase 2 and protein kinase C phosphorylation sites. In addition, the C-terminus of treacle contains multiple putative nuclear localization signals. The overall structure of treacle, as well as its sequence similarity to several nucleolar phosphoproteins, predicts that treacle is a member of this class of proteins. Using green fluorescent protein fusion constructs with the full-length and deleted domains of the murine homolog of treacle, we demonstrate that the cellular localization of treacle is nucleolar. This localization is mediated by the last 41 residues of the C-terminus (residues 1262-1302). At least two functional nuclear localization signals have been identified in the protein, one between residues 1176 and 1270 and the second within the last 32 residues of the protein (1271-1302). The nucleolar localization signal is disrupted by two constructs that split the C-terminal region between residues 1270 and 1271. This study provides the first direct analysis of treacle and demonstrates that the TCOF1 gene product is a nucleolar protein.

  10. Conservation of hot regions in protein-protein interaction in evolution.

    PubMed

    Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong

    2016-11-01

    The hot regions of protein-protein interactions are the active areas formed by the residues most important to the binding process. With progress in research on protein interactions, many hot regions can be predicted efficiently by intelligent computing methods, whereas verifying every prediction by biological experiment is impractical because of the time cost and complexity of the experiments. Building on research into the conservation of hot spot residues, this study proposes a method to verify the authenticity of hot regions predicted by a machine learning algorithm that combines protein biological features and sequence conservation. A conservation scoring algorithm is built from multiple sequence alignment, a substitution matrix and sequence similarity, and a threshold model is then used to verify the conservation tendency of hot regions in evolution. This work provides an effective method to verify predicted hot regions in protein-protein interactions, and also a useful way to investigate the functional activities of protein hot regions in depth. Copyright © 2016. Published by Elsevier Inc.
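
    The study builds its conservation score from multiple sequence alignment, a substitution matrix and sequence similarity, then applies a threshold. The sketch below swaps in a simpler, widely used measure: normalized Shannon entropy per alignment column, ignoring gaps. It then applies a hypothetical threshold of 0.7 to the mean score of the predicted hot-region positions. Both choices are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_conservation(msa: List[str], alphabet_size: int = 20) -> List[float]:
    """Per-column conservation in [0, 1]: 1 - H / log(alphabet_size), with H the
    Shannon entropy of the non-gap residues. All-gap columns score 0.
    Entropy is a stand-in, not the study's substitution-matrix score."""
    scores = []
    for col in zip(*msa):
        residues = [r for r in col if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        n = len(residues)
        h = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
        scores.append(1.0 - h / math.log(alphabet_size))
    return scores


def region_is_conserved(scores: Sequence[float], positions: Sequence[int],
                        threshold: float = 0.7) -> bool:
    """Hypothetical test: the mean conservation over the predicted hot-region
    positions (0-based column indices, non-empty) must reach `threshold`."""
    mean = sum(scores[p] for p in positions) / len(positions)
    return mean >= threshold
```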

  11. Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.

    PubMed

    Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G

    2012-09-01

    Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used structural alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB-based alignments are used to obtain a three-dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs show that the alignment quality is better than that of MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparisons with other rigid-body and flexible MSTAs also indicate that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
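
    mulPBA encodes each backbone as a string over the 16 Protein Blocks (conventionally labeled a to p) and aligns those strings progressively, as CLUSTALW does for residues. The pairwise step at the core of such a scheme can be sketched with Needleman-Wunsch global alignment. The uniform match, mismatch and gap scores here are placeholders, not the PB substitution scores mulPBA uses. The progressive merging and the up-weighting of highly similar stretches are not shown.

```python
from typing import Tuple


def global_align(s: str, t: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch alignment of two PB strings with placeholder scores."""
    n, m = len(s), len(t)

    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


print(global_align("mmmnop", "mmmop"))  # (8, 'mmmnop', 'mmm-op')
```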

  12. Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution.

    PubMed

    Kim, Hoon; Zheng, Siyuan; Amini, Seyed S; Virk, Selene M; Mikkelsen, Tom; Brat, Daniel J; Grimsby, Jonna; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew E; Cohen, Mark L; Van Meir, Erwin G; Scarpace, Lisa; Laird, Peter W; Weinstein, John N; Lander, Eric S; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill S; Verhaak, Roel G W

    2015-03-01

    Glioblastoma (GBM) is a prototypical heterogeneous brain tumor refractory to conventional therapy. A small residual population of cells escapes surgery and chemoradiation, resulting in a typically fatal tumor recurrence ∼ 7 mo after diagnosis. Understanding the molecular architecture of this residual population is critical for the development of successful therapies. We used whole-genome sequencing and whole-exome sequencing of multiple sectors from primary and paired recurrent GBM tumors to reconstruct the genomic profile of residual, therapy resistant tumor initiating cells. We found that genetic alteration of the p53 pathway is a primary molecular event predictive of a high number of subclonal mutations in glioblastoma. The genomic road leading to recurrence is highly idiosyncratic but can be broadly classified into linear recurrences that share extensive genetic similarity with the primary tumor and can be directly traced to one of its specific sectors, and divergent recurrences that share few genetic alterations with the primary tumor and originate from cells that branched off early during tumorigenesis. Our study provides mechanistic insights into how genetic alterations in primary tumors impact the ensuing evolution of tumor cells and the emergence of subclonal heterogeneity. © 2015 Kim et al.; Published by Cold Spring Harbor Laboratory Press.

  13. The NS3 proteins of global strains of bluetongue virus evolve into regional topotypes through negative (purifying) selection.

    PubMed

    Balasuriya, U B R; Nadler, S A; Wilson, W C; Pritchard, L I; Smythe, A B; Savini, G; Monaco, F; De Santis, P; Zhang, N; Tabachnick, W J; Maclachlan, N J

    2008-01-01

    Comparison of the deduced amino acid sequences of the genes (S10) encoding the NS3 protein of 137 strains of bluetongue virus (BTV) from Africa, the Americas, Asia, Australia and the Mediterranean Basin showed limited variation. Common to all NS3 sequences were potential glycosylation sites at amino acid residues 63 and 150 and a cysteine at residue 137, whereas a cysteine at residue 181 was not conserved. The PPXY and PS/TAP late-domain motifs were conserved in all but three of the viruses. Phylogenetic analyses of these same sequences yielded two principal clades that grouped the viruses irrespective of their serotype or year of isolation (1900-2003). All viruses from Asia and Australia were grouped in one clade, whereas those from the other regions were present in both clades. Each clade segregated into distinct subclades that included viruses from single or multiple regions, and the S10 genes of some field viruses were identical to those of live-attenuated BTV vaccines. There was no evidence of positive selection on the S10 gene as assessed by reconstruction of ancestral codon states on the phylogeny; rather, the functional constraints of the NS3 protein are expressed through substantial negative (purifying) selection.

  14. Structural phylogeny by profile extraction and multiple superimposition using electrostatic congruence as a discriminator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chakraborty, Sandeep; Rao, Basuthkar J.; Baker, Nathan A.

    2013-04-01

    Phylogenetic analysis of proteins using multiple sequence alignment (MSA) assumes an underlying evolutionary relationship among these proteins, which occasionally remains undetected due to considerable sequence divergence. Structural alignment programs have been developed to unravel such fuzzy relationships. However, none of these structure-based methods has used electrostatic properties to discriminate between spatially equivalent residues. We present a methodology for MSA of a set of related proteins with known structures using electrostatic properties as an additional discriminator (STEEP). STEEP first extracts a profile, then generates a multiple structural superimposition providing a consolidated spatial framework for comparing residues, and finally emits the MSA. Residues that are aligned differently when electrostatic properties are included or excluded can be targeted by directed evolution experiments to transform the enzymatic properties of one protein into those of another. We have compared STEEP results to those obtained from an MSA program (ClustalW) and a structural alignment method (MUSTANG) for chymotrypsin serine proteases. Subsequently, we used PhyML to generate phylogenetic trees for the serine and metallo-β-lactamase superfamilies from the STEEP-generated MSA, and corroborated the accepted relationships in these superfamilies. We have observed that STEEP acts as a functional classifier when electrostatic congruence is used as a discriminator, and thus identifies potential targets for directed evolution experiments. In summary, STEEP is unique among phylogenetic methods in its ability to use electrostatic congruence to specify mutations that might be the source of functional divergence in a protein family. Based on our results, we also hypothesize that the active site and its close vicinity contain enough information to infer the correct phylogeny for related proteins.

  15. Deciphering Dorin M glycosylation by mass spectrometry.

    PubMed

    Man, Petr; Kovár, Vojtech; Sterba, Ján; Strohalm, Martin; Kavan, Daniel; Kopácek, Petr; Grubhoffer, Libor; Havlícek, Vladimír

    2008-01-01

    The soft tick Ornithodoros moubata is a vector of several bacterial and viral pathogens, including Borrelia duttoni, a causative agent of relapsing fever, and African swine fever virus. Previously, a sialic acid-specific lectin, Dorin M, was isolated from its hemolymph. Here, we report the complete characterization of the primary sequence of Dorin M. Using liquid chromatography coupled to mass spectrometry, we identified three different glycopeptides in the tryptic digest of Dorin M. The peptide and glycan parts of all glycopeptides were further fully sequenced by means of tandem mass spectrometry (MS2) and multiple-stage mass spectrometry (MS3). Two classical N-glycosylation sites were modified by high-mannose-type glycans containing up to nine mannose residues. The third site bore a glycan with four to five mannose residues and a deoxyhexose (fucose) attached to the proximal N-acetylglucosamine. The microheterogeneity at each site was estimated based on the chromatographic behavior of different glycoforms. The fourth, a non-classical N-glycosylation site (Asn-Asn-Cys), was not glycosylated, probably because its cysteine residue is involved in a disulfide bridge.
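
    Candidate sites of the kind discussed here can be located by scanning a sequence for sequons. The sketch below assumes the usual classical motif, Asn-X-Ser/Thr with X not Pro. For the non-classical case it assumes Asn-X-Cys with the same restriction on X; Asn-Asn-Cys is one instance. A sequon marks only a candidate. As the unglycosylated fourth site of Dorin M shows, occupancy has to be established experimentally.

```python
import re
from typing import List, Tuple

# Classical sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sites (e.g. "NNST") both be reported.
CLASSICAL = re.compile(r"(?=N[^P][ST])")
# Non-classical Asn-X-Cys sequon (X != Pro is an assumption of this sketch).
NON_CLASSICAL = re.compile(r"(?=N[^P]C)")


def sequons(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based Asn position, kind) for each candidate site."""
    hits = [(m.start() + 1, "classical") for m in CLASSICAL.finditer(seq)]
    hits += [(m.start() + 1, "non-classical") for m in NON_CLASSICAL.finditer(seq)]
    return sorted(hits)


print(sequons("ANNCKNPSNGT"))  # [(2, 'non-classical'), (9, 'classical')]
```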

  16. Identification of the critical residues responsible for differential reactivation of the triosephosphate isomerases of two trypanosomes

    PubMed Central

    Rodríguez-Bolaños, Monica; Cabrera, Nallely

    2016-01-01

    The reactivation of triosephosphate isomerase (TIM) from unfolded monomers induced by guanidine hydrochloride involves different amino acids of its sequence in different stages of protein refolding. We describe a systematic mutagenesis method to find critical residues for certain physico-chemical properties of a protein. The two similar TIMs of Trypanosoma brucei and Trypanosoma cruzi have different reactivation velocities and efficiencies. We used a small number of chimeric enzymes, additive mutants and planned site-directed mutants to produce an enzyme from T. brucei with 13 mutations in its sequence, which reactivates quickly and efficiently like wild-type (WT) TIM from T. cruzi, and another enzyme from T. cruzi with 13 slightly altered mutations, which reactivates slowly and inefficiently like the WT TIM of T. brucei. Our method is a shorter alternative to random mutagenesis, saturation mutagenesis or directed evolution for finding multiple amino acids critical for certain properties of proteins. PMID:27733588

  17. Statistical Linkage Analysis of Substitutions in Patient-Derived Sequences of Genotype 1a Hepatitis C Virus Nonstructural Protein 3 Exposes Targets for Immunogen Design

    PubMed Central

    Quadeer, Ahmed A.; Louie, Raymond H. Y.; Shekhar, Karthik; Chakraborty, Arup K.; Hsing, I-Ming

    2014-01-01

    ABSTRACT Chronic hepatitis C virus (HCV) infection is one of the leading causes of liver failure and liver cancer, affecting around 3% of the world's population. The extreme sequence variability of the virus resulting from error-prone replication has thwarted the discovery of a universal prophylactic vaccine. It is known that vigorous and multispecific cellular immune responses, involving both helper CD4+ and cytotoxic CD8+ T cells, are associated with the spontaneous clearance of acute HCV infection. Escape mutations in viral epitopes can, however, abrogate protective T-cell responses, leading to viral persistence and associated pathologies. Despite the propensity of the virus to mutate, there might still exist substitutions that incur a fitness cost. In this paper, we identify groups of coevolving residues within HCV nonstructural protein 3 (NS3) by analyzing diverse sequences of this protein using ideas from random matrix theory and associated methods. Our analyses indicate that one of these groups comprises a large percentage of residues for which HCV appears to resist multiple simultaneous substitutions. Targeting multiple residues in this group through vaccine-induced immune responses should either lead to viral recognition or elicit escape substitutions that compromise viral fitness. Our predictions are supported by published clinical data, which suggested that immune genotypes associated with spontaneous clearance of HCV preferentially recognized and targeted this vulnerable group of residues. Moreover, mapping the sites of this group onto the available protein structure provided insight into its functional significance. An epitope-based immunogen is proposed as an alternative to the NS3 epitopes in the peptide-based vaccine IC41. IMPORTANCE Despite much experimental work on HCV, a thorough statistical study of the HCV sequences for the purpose of immunogen design was missing in the literature. 
Such a study is vital to identify epistatic couplings among residues that can provide useful insights for designing a potent vaccine. In this work, ideas from random matrix theory were applied to characterize the statistics of substitutions within the diverse publicly available sequences of the genotype 1a HCV NS3 protein, leading to a group of sites for which HCV appears to resist simultaneous substitutions possibly due to deleterious effect on viral fitness. Our analysis leads to completely novel immunogen designs for HCV. In addition, the NS3 epitopes used in the recently proposed peptide-based vaccine IC41 were analyzed in the context of our framework. Our analysis predicts that alternative NS3 epitopes may be worth exploring as they might be more efficacious. PMID:24760894
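    The random-matrix idea can be sketched in three steps. Encode each sequence as a binary vector (1 where it differs from the column consensus), compute the site-site correlation matrix, and keep only eigenvalues above the Marchenko-Pastur upper edge expected for uncorrelated data, λ+ = (1 + √(N/B))², for N sites and B sequences. This is a simplified sketch under those assumptions and is not necessarily the authors' exact procedure. The function names are hypothetical.

```python
import numpy as np


def binary_substitution_matrix(msa):
    """B x N matrix: 1 where a sequence differs from its column's consensus residue."""
    arr = np.array([list(s) for s in msa])
    X = np.zeros(arr.shape)
    for j in range(arr.shape[1]):
        residues, counts = np.unique(arr[:, j], return_counts=True)
        consensus = residues[np.argmax(counts)]
        X[:, j] = arr[:, j] != consensus
    return X


def significant_modes(X):
    """Eigenmodes of the site correlation matrix above the Marchenko-Pastur edge."""
    B, N = X.shape
    if np.any(X.std(axis=0) == 0):
        raise ValueError("drop invariant sites first; their correlation is undefined")
    C = np.corrcoef(X, rowvar=False)           # N x N site-site correlations
    evals, evecs = np.linalg.eigh(C)           # ascending eigenvalues
    lam_plus = (1 + np.sqrt(N / B)) ** 2       # upper edge for random data
    keep = evals > lam_plus
    return evals[keep], evecs[:, keep], lam_plus
```

    Sites with large loadings on a retained eigenvector are candidate members of a coevolving group. Eigenvalues inside the random bulk are treated as noise.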

  18. Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

    PubMed

    Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

    2016-07-01

    High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially affect the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
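    Dual indexing helps because a read is only assigned when both of its index reads point to the same sample. The sketch below uses a simple Hamming-distance rule as a stand-in for the probability-based Error-Aware Demultiplexer, whose model is not described here. The sample names, index sequences, and one-mismatch tolerance are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("index reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign(i7: str, i5: str, samples: Dict[str, Tuple[str, str]],
           max_mismatches: int = 1) -> Optional[str]:
    """Return the unique sample whose two indices both match, else None."""
    hits = [
        name for name, (s7, s5) in samples.items()
        if hamming(i7, s7) <= max_mismatches and hamming(i5, s5) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None  # ambiguous or no match -> unassigned


samples = {"S1": ("ACGTAC", "TTGACC"), "S2": ("TGCAGT", "GGATCA")}
print(assign("ACGTAA", "TTGACC", samples))  # one mismatch in i7 -> S1
print(assign("ACGTAC", "GGATCA", samples))  # i7 from S1, i5 from S2 -> None
```

    Rejecting mixed or ambiguous index pairs, instead of forcing an assignment, trades a little data loss for fewer misassigned reads. This matters when a single misassigned read could be mistaken for residual disease.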

  19. Techniques for computing the discrete Fourier transform using the quadratic residue Fermat number systems

    NASA Technical Reports Server (NTRS)

    Truong, T. K.; Chang, J. J.; Hsu, I. S.; Pei, D. Y.; Reed, I. S.

    1986-01-01

    The complex integer multiplier and adder over the direct sum of two copies of a finite field, developed by Cozzens and Finkelstein (1985), is specialized to the direct sum of the rings of integers modulo Fermat numbers. Multiplication over the rings of integers modulo Fermat numbers can then be performed with two integer multiplications, whereas complex integer multiplication requires three. Such multiplications and additions can be used to implement a discrete Fourier transform (DFT) of a sequence of complex numbers. The advantage of this approach is that the number of multiplications needed in a systolic-array implementation of the DFT can be reduced substantially. The resulting architectural designs are regular, simple, and expandable, and are therefore naturally suited to VLSI implementation.
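    The two-multiplication trick rests on the fact that −1 is a quadratic residue modulo a Fermat prime. For F_t = 2^(2^t) + 1, the element j = 2^(2^(t−1)) satisfies j² ≡ −1 (mod F_t), so a + bi maps to the pair (a + jb, a − jb), and a complex product becomes two independent modular products. This is a minimal sketch with F_4 = 65537 and j = 256. The modulus choice and function names are assumptions for illustration, and the systolic-array and VLSI aspects are not modeled.

```python
F = 2**16 + 1            # Fermat prime F_4 = 65537
J = 2**8                 # 256, since 256**2 = 65536 = -1 (mod F)
assert J * J % F == F - 1

INV2 = pow(2, -1, F)     # modular inverses (Python 3.8+)
INV2J = pow(2 * J, -1, F)


def to_qrns(a, b):
    """Map a + b*i (mod F) to the pair (a + J*b, a - J*b) mod F."""
    return (a + J * b) % F, (a - J * b) % F


def from_qrns(z, z_conj):
    """Invert to_qrns: a = (z + z*)/2 and b = (z - z*)/(2J), both mod F."""
    return (z + z_conj) * INV2 % F, (z - z_conj) * INV2J % F


def qrns_mul(x, y):
    """Complex product in QRNS form: two independent modular multiplications."""
    return x[0] * y[0] % F, x[1] * y[1] % F


def complex_mul_mod(a, b, c, d):
    """Reference: (a + bi)(c + di) mod F computed directly."""
    return (a * c - b * d) % F, (a * d + b * c) % F


p = from_qrns(*qrns_mul(to_qrns(3, 4), to_qrns(5, 6)))
print(p)  # -> (65528, 38), i.e. -9 + 38i mod 65537
```

    The forward and inverse mappings cost only shifts and additions in hardware because J is a power of two. For this reason the saving in multiplications carries over to a full DFT.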

  20. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues that play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, locating disulfide bonds can greatly reduce the search in conformational space. There is therefore a great need for computational methods that can accurately predict disulfide connectivity patterns in proteins, which could have important applications. We have developed a novel method to predict disulfide connectivity patterns from the protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. The results indicate that our method achieves prediction accuracies of 74.4% and 77.9%, measured at the protein level and the cysteine-pair level respectively, when averaged over proteins with two to five disulfide bridges using 4-fold cross-validation on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. The results show that the encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful for computationally assigning disulfide connectivity patterns and for annotating protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
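    Once every cysteine pair has a score, a connectivity pattern can be read off by choosing the set of disjoint pairs with the highest total score. The abstract does not say how the SVR outputs are combined, so the exhaustive maximum-weight perfect matching below is an assumption, though it is a common choice for this step. It stays cheap for two to five bridges, since 2n cysteines admit (2n − 1)!! pairings, which is 945 for n = 5. The scores and function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(cys: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to pair up an even-length sorted list of cysteine positions."""
    if not cys:
        yield []
        return
    first, rest = cys[0], cys[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in perfect_matchings(remaining):
            yield [(first, partner)] + m


def best_connectivity(cys: List[int], score: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Pick the pairing with the highest summed pair score (keys are (i, j), i < j)."""
    cys = sorted(cys)
    if len(cys) % 2:
        raise ValueError("need an even number of bonded cysteines")
    best, best_total = [], float("-inf")
    for m in perfect_matchings(cys):
        total = sum(score[p] for p in m)
        if total > best_total:
            best, best_total = m, total
    return best, best_total


scores = {(3, 10): 0.1, (3, 25): 0.2, (3, 40): 0.9,
          (10, 25): 0.8, (10, 40): 0.3, (25, 40): 0.05}
print(best_connectivity([3, 10, 25, 40], scores))
```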

  1. RECOVIR Software for Identifying Viruses

    NASA Technical Reports Server (NTRS)

    Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui

    2013-01-01

    Most single-stranded RNA (ssRNA) viruses mutate rapidly, generating a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical to understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. The strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to automatically identify the strains of partial or complete capsid sequences of picornaviruses and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
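    Residue-wise comparison against strain-characterizing residues can be sketched as a lookup. For each strain, store the alignment positions and residues that characterize it, then score a possibly partial query by the fraction of covered positions it matches. This is a hypothetical simplification that assumes the query is already placed in the reference alignment's coordinates. It is not RECOVIR's actual database format or scoring, and the strain labels and residues are made up.

```python
from typing import Dict, Optional, Tuple

Signature = Dict[int, str]  # 0-based alignment position -> characterizing residue


def predict_strain(query: str, start: int,
                   signatures: Dict[str, Signature]) -> Optional[Tuple[str, float]]:
    """Return (strain, fraction of covered signature residues matched), or None.

    `query` is assumed to occupy alignment columns start .. start+len(query)-1.
    """
    end = start + len(query)
    best: Optional[Tuple[str, float]] = None
    for strain, sig in signatures.items():
        covered = [pos for pos in sig if start <= pos < end]
        if not covered:
            continue  # the partial query does not reach this strain's signature
        matched = sum(query[pos - start] == sig[pos] for pos in covered)
        score = matched / len(covered)
        if best is None or score > best[1]:
            best = (strain, score)
    return best


signatures = {"GI": {2: "K", 5: "T", 9: "W"}, "GII": {2: "R", 5: "S", 9: "W"}}
print(predict_strain("AAKAATAA", 0, signatures))  # -> ('GI', 1.0)
```

    Restricting the comparison to covered positions is what lets a partial capsid sequence be classified at all. The cost is that fewer covered positions give a less reliable call.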

  2. Automated use of mutagenesis data in structure prediction.

    PubMed

    Nanda, Vikas; DeGrado, William F

    2005-05-15

    In the absence of experimental structure determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data are usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach, derived from statistical studies of lattice models, for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of each mutation (neutral or disruptive) and calculates the energy of a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation, in which neutral sequences either have no impact on stability or improve it, and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state from misfolded structures. As a result, mutational information enhances structure prediction with a poor force field enough to improve its accuracy. Furthermore, by separating misfolded conformations from the target in score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods. Copyright 2005 Wiley-Liss, Inc.
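    The ensemble idea can be illustrated on a toy lattice model. A candidate structure, represented here only by its list of non-bonded residue contacts, is scored by how well it reproduces the mutational phenotypes. It is penalized when neutral mutants come out less stable than wild type, and when disruptive mutants are not destabilized by at least a margin. This is a hypothetical sketch using the two-letter HP contact energy and an assumed hinge penalty and margin. It is not the authors' fitness score.

```python
from typing import List, Tuple

Contact = Tuple[int, int]


def hp_energy(seq: str, contacts: List[Contact]) -> float:
    """Toy HP lattice energy: -1 for every non-bonded H-H contact."""
    return -float(sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H"))


def ensemble_penalty(contacts: List[Contact], wild_type: str,
                     neutral: List[str], disruptive: List[str],
                     margin: float = 1.0) -> float:
    """Lower is better: how badly this structure contradicts the mutant phenotypes."""
    e_wt = hp_energy(wild_type, contacts)
    penalty = 0.0
    for s in neutral:      # neutral mutants should not be destabilized
        penalty += max(0.0, hp_energy(s, contacts) - e_wt)
    for s in disruptive:   # disruptive mutants should be destabilized by >= margin
        penalty += max(0.0, e_wt + margin - hp_energy(s, contacts))
    return penalty


folded = [(0, 3)]   # structure A: residues 1 and 4 in contact
extended = []       # structure B: no non-bonded contacts
args = ("HPPH", ["HPHH"], ["PPPH"])
print(ensemble_penalty(folded, *args), ensemble_penalty(extended, *args))  # -> 0.0 1.0
```

    In this toy case only the folded structure explains why replacing the first H is disruptive. The mutational data therefore separate the two candidates even though the energy function is crude.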

  3. Combined sequence and structure analysis of the fungal laccase family.

    PubMed

    Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

    2003-08-20

    Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in the identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four conserved regions, of which L2 and L4 conform to the previously reported copper signature sequences of multi-copper oxidases, while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 onto the three-dimensional structure of the Coprinus cinereus laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. 
The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal laccases. The identified motifs, L1-L4, can be useful in searching the newly sequenced genomes for putative laccase enzymes. Copyright 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 83: 386-394, 2003.
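    Locating ungapped, conserved regions such as L1-L4 in a multiple alignment can be sketched as a column scan. Mark the columns that contain no gaps and whose most common residue reaches an identity threshold, then report runs of such columns that reach a minimum length. The thresholds, function name, and toy alignment are assumptions. The actual L1-L4 boundaries came from the authors' analysis of more than 100 laccases.

```python
from collections import Counter
from typing import List, Tuple


def conserved_ungapped_blocks(msa: List[str], min_len: int = 4,
                              min_identity: float = 0.8) -> List[Tuple[int, int]]:
    """Half-open (start, end) column ranges that are gap-free and conserved."""
    ok = []
    for c in range(len(msa[0])):
        column = [s[c] for s in msa]
        if "-" in column:
            ok.append(False)
            continue
        top_count = Counter(column).most_common(1)[0][1]
        ok.append(top_count / len(column) >= min_identity)
    blocks, start = [], None
    for c, good in enumerate(ok + [False]):  # sentinel closes a trailing block
        if good and start is None:
            start = c
        elif not good and start is not None:
            if c - start >= min_len:
                blocks.append((start, c))
            start = None
    return blocks


toy = ["HWHGAAKL-HWHG", "HWHGFSQL-HWHG", "HWHGIRKLAHWHG"]
print(conserved_ungapped_blocks(toy))  # -> [(0, 4), (9, 13)]
```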

  4. Evolutionary profiles from the QR factorization of multiple sequence alignments

    PubMed Central

    Sethi, Anurag; O'Donoghue, Patrick; Luthey-Schulten, Zaida

    2005-01-01

    We present an algorithm to generate complete evolutionary profiles that represent the topology of the molecular phylogenetic tree of the homologous group. The method, based on the multidimensional QR factorization of numerically encoded multiple sequence alignments, removes redundancy from the alignments and orders the protein sequences by increasing linear dependence, resulting in the identification of a minimal basis set of sequences that spans the evolutionary space of the homologous group of proteins. We observe a general trend that these smaller, more evolutionarily balanced profiles have comparable and, in many cases, better performance in database searches than conventional profiles containing hundreds of sequences, constructed in an iterative and computationally intensive procedure. For more diverse families or superfamilies, with sequence identity <30%, structural alignments, based purely on the geometry of the protein structures, provide better alignments than pure sequence-based methods. Merging the structure and sequence information allows the construction of accurate profiles for distantly related groups. These structure-based profiles outperformed other sequence-based methods for finding distant homologs and were used to identify a putative class II cysteinyl-tRNA synthetase (CysRS) in several archaea that eluded previous annotation studies. Phylogenetic analysis showed the putative class II CysRSs to be a monophyletic group and homology modeling revealed a constellation of active site residues similar to that in the known class I CysRS. PMID:15741270
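    The redundancy-removal step can be sketched as QR factorization with column pivoting. Each aligned sequence becomes a numeric column vector, and at every step the column with the largest component orthogonal to the sequences already chosen is picked next. Columns whose remaining norm falls to zero are linearly dependent on the chosen set and can be dropped. This sketch assumes a simple one-hot encoding of residues and gaps in place of the authors' multidimensional encoding, and implements the pivoting with modified Gram-Schmidt in NumPy. The names are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def encode(msa):
    """One-hot encode an MSA: one column vector of length L*21 per sequence."""
    L, k = len(msa[0]), len(ALPHABET)
    A = np.zeros((L * k, len(msa)))
    for j, seq in enumerate(msa):
        for pos, ch in enumerate(seq):
            A[pos * k + ALPHABET.index(ch), j] = 1.0
    return A


def pivoted_order(A, tol=1e-10):
    """Order columns by pivoted QR (modified Gram-Schmidt); drop dependent ones."""
    R = A.astype(float).copy()
    remaining = list(range(A.shape[1]))
    order, norms = [], []
    while remaining:
        col_norms = [np.linalg.norm(R[:, j]) for j in remaining]
        k = int(np.argmax(col_norms))
        if col_norms[k] <= tol:
            break  # every remaining sequence is spanned by those already chosen
        j = remaining.pop(k)
        q = R[:, j] / col_norms[k]
        order.append(j)
        norms.append(col_norms[k])
        for i in remaining:  # remove the new direction from the other columns
            R[:, i] -= q * (q @ R[:, i])
    return order, norms


order, norms = pivoted_order(encode(["ACDE", "ACDE", "ACDF", "WWWW"]))
print(order)  # -> [0, 3, 2]; the duplicate sequence 1 is dropped
```

    The decreasing residual norms give the ordering by increasing linear dependence. Truncating the list where the norms become small yields a reduced, less redundant set of sequences.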

  5. Multi-Harmony: detecting functional specificity from sequence alignment

    PubMed Central

    Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap

    2010-01-01

    Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. Identifying these residues can aid the understanding of protein function and help in finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww. PMID:20525785
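    Sites that separate sub-families are those where the residue distributions of the two groups differ. Sequence Harmony quantifies this with its own entropy-based score. The sketch below substitutes the Jensen-Shannon divergence, in bits (0 for identical distributions, 1 for completely disjoint ones), so it illustrates the idea rather than reproducing SH or multi-Relief. The function names and toy sub-families are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def distribution(column: List[str]) -> Dict[str, float]:
    counts = Counter(column)
    return {aa: c / len(column) for aa, c in counts.items()}


def entropy(p: Dict[str, float]) -> float:
    return -sum(v * log2(v) for v in p.values() if v > 0)


def jensen_shannon(col_a: List[str], col_b: List[str]) -> float:
    """JSD in bits between the residue distributions of one column in groups A and B."""
    p, q = distribution(col_a), distribution(col_b)
    m = {aa: 0.5 * p.get(aa, 0.0) + 0.5 * q.get(aa, 0.0) for aa in set(p) | set(q)}
    return entropy(m) - 0.5 * entropy(p) - 0.5 * entropy(q)


def specificity_scores(group_a: List[str], group_b: List[str]) -> List[float]:
    """Per-column JSD for two aligned sub-families (equal-length sequences)."""
    return [jensen_shannon([s[i] for s in group_a], [s[i] for s in group_b])
            for i in range(len(group_a[0]))]


print(specificity_scores(["GKL", "GKL"], ["GEL", "GEL"]))  # middle column scores 1.0
```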

  6. Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium

    PubMed Central

    Cai, Yongping; Lin, Yi

    2013-01-01

    In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) is 2,458 bp long and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. Multiple alignments show that the amino acid sequence of DcPAL1 shares more than 80% identity with the PAL sequences of other plants. The dominant sites and catalytic active sites found in the PAL proteins of Arabidopsis thaliana and Nicotiana tabacum are also present in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in the protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
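    The ORF figures are internally consistent. 2,142 bp is 714 codons, and because the last codon is a stop codon, the ORF encodes 713 residues. The sketch below finds the longest ATG-initiated ORF on the forward strand and reports its length and residue count in the same way. It is a simplified illustration (forward strand only, standard stop codons), and the function name is hypothetical.

```python
from typing import Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str) -> Tuple[int, int, int]:
    """Return (0-based start, ORF length in bp including the stop, residues encoded)
    for the longest ATG...stop ORF on the forward strand, or (-1, 0, 0) if none."""
    seq = cdna.upper()
    best = (-1, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                length = i + 3 - start
                if length > best[1]:
                    best = (start, length, length // 3 - 1)  # stop codon adds no residue
                start = None
    return best


print(longest_orf("CCATGGCTTGGTAAGG"))  # -> (2, 12, 3)
```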

  7. SAbPred: a structure-based antibody prediction server

    PubMed Central

    Dunbar, James; Krawczyk, Konrad; Leem, Jinwoo; Marks, Claire; Nowak, Jaroslaw; Regep, Cristian; Georges, Guy; Kelm, Sebastian; Popovic, Bojana; Deane, Charlotte M.

    2016-01-01

    SAbPred is a server that makes predictions of the properties of antibodies focusing on their structures. Antibody informatics tools can help improve our understanding of immune responses to disease and aid in the design and engineering of therapeutic molecules. SAbPred is a single platform containing multiple applications which can: number and align sequences; automatically generate antibody variable fragment homology models; annotate such models with estimated accuracy alongside sequence and structural properties including potential developability issues; predict paratope residues; and predict epitope patches on protein antigens. The server is available at http://opig.stats.ox.ac.uk/webapps/sabpred. PMID:27131379

  8. Phylogenetic analysis of eIF4E-family members

    PubMed Central

    Joshi, Bhavesh; Lee, Kibwe; Maeder, Dennis L; Jagus, Rosemary

    2005-01-01

    Background Translation initiation in eukaryotes involves the recruitment of mRNA to the ribosome, which is controlled by the translation factor eIF4E. eIF4E binds to the 5'-m7Gppp cap-structure of mRNA. Three-dimensional structures of eIF4Es bound to cap-analogues resemble 'cupped hands' in which the cap-structure is sandwiched between two conserved Trp residues (Trp-56 and Trp-102 of H. sapiens eIF4E). A third conserved Trp residue (Trp-166 of H. sapiens eIF4E) recognizes the 7-methyl moiety of the cap-structure. Assessment of the GenBank NR and dbEST databases reveals that many organisms encode a number of proteins with homology to eIF4E. Little is understood about the relationships of these structurally related proteins to each other. Results By combining sequence data deposited in the GenBank databases, we have identified sequences encoding 411 eIF4E-family members from 230 species. These sequences have been deposited into an internet-accessible database designed for sequence comparisons of eIF4E-family members. Most members can be grouped into one of three classes. Class I members carry Trp residues equivalent to Trp-43 and Trp-56 of H. sapiens eIF4E and appear to be present in all eukaryotes. Class II members possess Trp→Tyr/Phe/Leu and Trp→Tyr/Phe substitutions relative to Trp-43 and Trp-56 of H. sapiens eIF4E, and can be identified in Metazoa, Viridiplantae, and Fungi. Class III members possess a Trp residue equivalent to Trp-43 of H. sapiens eIF4E but carry a Trp→Cys/Tyr substitution relative to Trp-56 of H. sapiens eIF4E, and can be identified in Coelomata and Cnidaria. Some eIF4E-family members from Protista show extension or compaction relative to prototypical eIF4E-family members. Conclusion The expansion of sequenced cDNAs and genomic DNAs from all eukaryotic kingdoms has revealed a variety of proteins related in structure to eIF4E. 
Evolutionarily it seems that a single early eIF4E gene has undergone multiple gene duplications generating multiple structural classes, such that it is no longer possible to predict function from the primary amino acid sequence of an eIF4E-family member. The variety of eIF4E-family members provides a source of alternatives on the eIF4E structural theme that will benefit structure/function analyses and therapeutic drug design. PMID:16191198
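    The class definitions translate directly into a rule on the residues found at the positions equivalent to Trp-43 and Trp-56 of H. sapiens eIF4E. This sketch assumes those two residues have already been read off a multiple alignment. The function name is hypothetical, and combinations not covered by the three class definitions are returned as unclassified.

```python
def eif4e_class(res43: str, res56: str) -> str:
    """Classify an eIF4E-family member from the residues aligned to human
    Trp-43 and Trp-56 (one-letter codes), following the class rules above."""
    if res43 == "W" and res56 == "W":
        return "Class I"
    if res43 in {"Y", "F", "L"} and res56 in {"Y", "F"}:
        return "Class II"
    if res43 == "W" and res56 in {"C", "Y"}:
        return "Class III"
    return "unclassified"


print(eif4e_class("W", "W"), eif4e_class("F", "Y"), eif4e_class("W", "C"))
```

    A residue-pair rule like this cannot predict function. It only reproduces the structural grouping, in line with the conclusion that function is no longer predictable from primary sequence alone.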

  9. Resolution of Site-Specific Conformational Heterogeneity in Proline-Rich Molecular Recognition by Src Homology 3 Domains.

    PubMed

    Horness, Rachel E; Basom, Edward J; Mayer, John P; Thielges, Megan C

    2016-02-03

    Conformational heterogeneity and dynamics are increasingly invoked in models of protein molecular recognition but are challenging to characterize experimentally. Here we combine the inherent temporal resolution of infrared (IR) spectroscopy with the spatial resolution afforded by selective incorporation of carbon-deuterium (C-D) bonds, which provide frequency-resolved absorptions within a protein IR spectrum, to characterize the molecular recognition of the Src homology 3 (SH3) domain of the yeast protein Sho1 with its cognate proline-rich (PR) sequence of Pbs2. The IR absorptions of C-D bonds introduced at residues along a peptide of the Pbs2 PR sequence report on the changes in the local environments upon binding to the SH3 domain. Interestingly, upon forming the complex, the IR spectra of the peptides labeled with C-D bonds at either of the two conserved prolines of the PXXP consensus recognition sequence show more absorptions than there are C-D bonds, providing evidence for the population of multiple states. In contrast, the NMR spectra of the peptides labeled with (13)C at the same residues show only single resonances, indicating rapid interconversion on the NMR time scale. Thus, the data suggest that the SH3 domain recognizes its cognate peptide with a component of induced-fit molecular recognition involving the adoption of multiple states, which have previously gone undetected because interconversion between the populated states is too fast to resolve using conventional methods.

  10. Phenotype–genotype correlation in Hirschsprung disease is illuminated by comparative analysis of the RET protein sequence

    PubMed Central

    Kashuk, Carl S.; Stone, Eric A.; Grice, Elizabeth A.; Portnoy, Matthew E.; Green, Eric D.; Sidow, Arend; Chakravarti, Aravinda; McCallion, Andrew S.

    2005-01-01

    The ability to discriminate between deleterious and neutral amino acid substitutions in the genes of patients remains a significant challenge in human genetics. The increasing availability of genomic sequence data from multiple vertebrate species allows sequence conservation and the physicochemical properties of residues to be used for functional prediction. In this study, the RET receptor tyrosine kinase serves as a model disease gene in which a broad spectrum (≥116) of disease-associated mutations has been identified among patients with Hirschsprung disease and multiple endocrine neoplasia type 2. We report the alignment of the human RET protein sequence with the orthologous sequences of 12 non-human vertebrates (eight mammalian, one avian, and three teleost species), their comparative analysis, the evolutionary topology of the RET protein, and the predicted tolerance for all published missense mutations. We show that, although evolutionary conservation alone provides significant information to predict the effect of a RET mutation, a model that combines comparative sequence data with analysis of physicochemical properties in a quantitative framework provides far greater accuracy. Although the ability to discern the impact of a mutation is imperfect, our analyses permit substantial discrimination between predicted functional classes of RET mutations and disease severity, even for a multigenic disease such as Hirschsprung disease. PMID:15956201
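    A crude illustration of combining comparative data with a physicochemical property is to compare the Kyte-Doolittle hydropathy of a substituted residue with the spread of hydropathy values seen at that position across orthologs. This is a single-property toy under stated assumptions, not the multi-property quantitative model used in the study. A position where all orthologs share one value gives an infinite score for any change, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List

# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_deviation(ortholog_column: List[str], mutant: str) -> float:
    """|z| of the mutant's hydropathy relative to the residues seen in orthologs."""
    values = [KD[aa] for aa in ortholog_column]
    mu, sd = mean(values), pstdev(values)
    diff = abs(KD[mutant] - mu)
    if sd == 0:  # invariant position: any change in hydropathy is maximally unusual
        return 0.0 if diff == 0 else float("inf")
    return diff / sd


column = ["L", "L", "I", "V", "L"]  # hypothetical ortholog residues at one site
print(hydropathy_deviation(column, "I"), hydropathy_deviation(column, "D"))
```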

  11. A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.

    PubMed

    Michnick, S W; Shakhnovich, E

    1998-01-01

    Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine whether this also holds for proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparing residue conservation in natural superfamily sequences with that in simulated sequences (generated with a Monte Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences conserve residues needed for folding and stability plus function, that the simulated sequences contain no functional conservation, and that nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those whose structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural, highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes with a similar folding mechanism.
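    The logic of the comparison can be sketched as set operations on conserved positions. Positions conserved in both natural and designed sequences are candidates for folding or stability roles, and those conserved only in natural sequences point to function. Nucleus candidates are the folding-related positions that contact one another in the native structure. Entropy-based conservation, the threshold, and the toy inputs are assumptions, and the Monte Carlo design step that produces the simulated sequences is not shown.

```python
from collections import Counter
from math import log2
from typing import List, Set, Tuple


def column_entropy(msa: List[str], i: int) -> float:
    counts = Counter(s[i] for s in msa)
    n = len(msa)
    return -sum(c / n * log2(c / n) for c in counts.values())


def conserved(msa: List[str], max_entropy: float) -> Set[int]:
    return {i for i in range(len(msa[0])) if column_entropy(msa, i) <= max_entropy}


def nucleus_candidates(natural: List[str], simulated: List[str],
                       contacts: List[Tuple[int, int]], max_entropy: float = 0.5):
    """Return (contacting positions conserved in both sets, natural-only positions)."""
    nat = conserved(natural, max_entropy)
    both = nat & conserved(simulated, max_entropy)
    linked = [(i, j) for i, j in contacts if i in both and j in both]
    nucleus = sorted({k for pair in linked for k in pair})
    functional = sorted(nat - both)
    return nucleus, functional


natural = ["AKLV", "AKLI", "AKLM"]
designed = ["AGLV", "ASLW", "ATLF"]
print(nucleus_candidates(natural, designed, [(0, 2), (1, 3)]))  # -> ([0, 2], [1])
```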

  12. Location of alkali metal binding sites in endothelin A selective receptor antagonists, cyclo(D-Trp-D-Asp-Pro-D-Val-Leu) and cyclo(D-Trp-D-Asp-Pro-D-Ile-Leu), from multistep collisionally activated decompositions.

    PubMed

    Ngoka, L C; Gross, M L

    2000-02-01

    We previously showed by using mass spectrometry that endothelin A selective receptor antagonists BQ123 and JKC301 form novel coordination compounds with sodium ions. This property may underlie the ability of an ET(A) antagonist to induce net tubular sodium reabsorption in the proximal tubule cells and reverse acute renal failure induced by severe ischemia. We have now defined the metal binding sites on BQ123 and JKC301 by subjecting the metal-containing peptides to multiple stages of collisionally activated decomposition (CAD) in an ion trap mass spectrometer. When submitted to low-energy CAD, the ring opens at the Asp-Pro amide bond. The metal ion, which bonds, inter alia, to the carbonyl oxygen of the proline residue, acts as a fixed charge site, and directs a charge-remote, sequence-specific fragmentation of the ring-opened peptide. Amino acid residues are sequentially cleaved from the C-terminal end, and the terminal aziridinone structure moves one step toward the N-terminus with each C-terminal amino acid residue removed. These observations are the basis of a new method to sequence cyclic peptides. Amino acid residues are observed as sets of three ions, a*(n)PD, b*(n)PD and c*(n)PD where n is the number of amino acid residues in the peptide. Copyright 2000 John Wiley & Sons, Ltd.
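    The fragmentation pattern implies a simple way to read a cyclic sequence. Open the ring at the Asp-Pro bond so that Pro becomes the N-terminus, then remove residues one at a time from the C-terminus. The sketch below models only the residue order, not masses or ion types, and the function names are hypothetical.

```python
from typing import List


def open_ring(ring: List[str], new_n_terminus: str) -> List[str]:
    """Linearize a cyclic peptide so that `new_n_terminus` becomes residue 1."""
    k = ring.index(new_n_terminus)
    return ring[k:] + ring[:k]


def c_terminal_ladder(linear: List[str]) -> List[List[str]]:
    """Fragments left after removing one C-terminal residue at a time."""
    return [linear[:n] for n in range(len(linear) - 1, 0, -1)]


ring = ["D-Trp", "D-Asp", "Pro", "D-Val", "Leu"]  # cyclo(D-Trp-D-Asp-Pro-D-Val-Leu)
linear = open_ring(ring, "Pro")
print(linear)  # -> ['Pro', 'D-Val', 'Leu', 'D-Trp', 'D-Asp']
print(c_terminal_ladder(linear))
```

    Reading the residues lost at each step, from the last fragment back to the intact ring-opened ion, recovers the sequence in the N-to-C direction starting at Pro.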

  13. The primary structure of rat liver ribosomal protein L37. Homology with yeast and bacterial ribosomal proteins.

    PubMed

    Lin, A; McNally, J; Wool, I G

    1983-09-10

    The covalent structure of the rat liver 60 S ribosomal subunit protein L37 was determined. Twenty-four tryptic peptides were purified and the sequence of each was established; they accounted for all 111 residues of L37. The sequence of the first 30 residues of L37, obtained previously by automated Edman degradation of the intact protein, provided the alignment of the first 9 tryptic peptides. Three peptides (CN1, CN2, and CN3) were produced by cleavage of protein L37 with cyanogen bromide. The sequence of CN1 (65 residues) was established from the sequences of secondary peptides resulting from cleavage with trypsin and chymotrypsin. The sequence of CN1 in turn served to order tryptic peptides 1 through 14. The sequence of CN2 (15 residues) was determined entirely by a micromanual procedure and allowed the alignment of tryptic peptides 14 through 18. The sequence of the NH2-terminal 28 amino acids of CN3 (31 residues) was determined; in addition, the complete sequences of the secondary tryptic and chymotryptic peptides were determined. The sequence of CN3 provided the order of tryptic peptides 18 through 24. Thus the sequences of the three cyanogen bromide peptides also accounted for the 111 residues of protein L37. The carboxyl-terminal amino acids were identified after carboxypeptidase A treatment. There is a disulfide bridge between half-cystinyl residues at positions 40 and 69. Rat liver ribosomal protein L37 is homologous with yeast YP55 and with Escherichia coli L34. Moreover, there is a segment of 17 residues in rat L37 that occurs, albeit with modifications, in yeast YP55 and in E. coli S4, L20, and L34.
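    Ordering peptides by their overlaps, as was done here with the cyanogen bromide and tryptic fragments, is at heart an overlap-merging problem. This is a minimal greedy sketch, assuming exact overlaps of at least `min_overlap` residues. Real protein sequencing also relies on cleavage specificity and independent evidence such as Edman data. The fragments are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_overlap)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: List[str], min_overlap: int = 2) -> List[str]:
    """Greedily merge the pair with the largest overlap until none remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left; the remaining fragments cannot be ordered
        merged = frags[i] + frags[j][k:]
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]
    return frags


print(assemble(["MKTAY", "TAYIAK", "IAKQR"]))  # -> ['MKTAYIAKQR']
```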

  14. Characterization of a nuclear localization signal in the foot-and-mouth disease virus polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sanchez-Aparicio, Maria Teresa; Rosas, Maria Flora; Sobrino, Francisco, E-mail: fsobrino@cbm.uam.es

    2013-09-15

    We have experimentally tested whether the MRKTKLAPT sequence in the FMDV 3D protein (residues 16 to 24) can act as a nuclear localization signal (NLS). Mutants with substitutions in two basic residues within this sequence, K18E and K20E, were generated. These substitutions decreased the nuclear localization of transiently expressed 3D and of its precursor 3CD, suggesting a role for K18 and K20 in nuclear targeting. Fusion of MRKTKLAPT to the green fluorescent protein (GFP) increased the nuclear localization of GFP, which was not observed when GFP was fused to the mutated 3D sequences. These results indicate that the sequence MRKTKLAPT can be functionally considered an NLS. When introduced into a full-length FMDV RNA, the replacements K18E and K20E led to the production of revertant viruses in which the introduced acidic residues (E) were replaced by K, suggesting that the presence of lysines at positions 18 and 20 of 3D is essential for virus multiplication. - Highlights: • The FMDV 3D polymerase contains a nuclear localization signal. • Replacements K18E and K20E decrease nuclear localization of 3D and its precursor 3CD. • Fusion of the MRKTKLAPT 3D motif to GFP increases the nuclear localization of GFP. • Replacements K18E and K20E abolish the ability of MRKTKLAPT to relocate GFP. • RNAs harboring replacements K18E and K20E lead to recovery of revertant FMDVs.
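    The residue numbering in the mutant names can be checked against the motif. MRKTKLAPT spans residues 16 to 24, so K18 and K20 are its third and fifth residues. This small sketch applies such substitutions to a numbered fragment and verifies the wild-type residue first. The function name is hypothetical.

```python
import re
from typing import Iterable

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. K18E


def apply_mutations(fragment: str, first_residue: int, mutations: Iterable[str]) -> str:
    """Apply substitutions like 'K18E' to a fragment that starts at `first_residue`."""
    seq = list(fragment)
    for m in mutations:
        match = MUTATION.match(m)
        if match is None:
            raise ValueError(f"unrecognized mutation: {m}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        idx = pos - first_residue
        if not 0 <= idx < len(seq) or seq[idx] != wt:
            raise ValueError(f"{m} does not match the fragment")
        seq[idx] = new
    return "".join(seq)


print(apply_mutations("MRKTKLAPT", 16, ["K18E", "K20E"]))  # -> MRETELAPT
```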

  15. Ebola virus RNA editing depends on the primary editing site sequence and an upstream secondary structure.

    PubMed

    Mehedi, Masfique; Hoenen, Thomas; Robertson, Shelly; Ricklefs, Stacy; Dolan, Michael A; Taylor, Travis; Falzarano, Darryl; Ebihara, Hideki; Porcella, Stephen F; Feldmann, Heinz

    2013-01-01

    Ebolavirus (EBOV), the causative agent of a severe hemorrhagic fever and a biosafety level 4 pathogen, increases its genome coding capacity by producing multiple transcripts encoding structural and nonstructural glycoproteins from a single gene. This is achieved through RNA editing, during which non-templated adenosine residues are incorporated into EBOV mRNAs at an editing site encoding 7 adenosine residues. However, the mechanism of EBOV RNA editing is currently not understood. In this study, we report for the first time that minigenomes containing the glycoprotein gene editing site can undergo RNA editing, thereby eliminating the requirement for a biosafety level 4 laboratory to study EBOV RNA editing. Using a newly developed dual-reporter minigenome, we characterized the mechanism of EBOV RNA editing and identified cis-acting sequences that are required for editing, located between 9 nt upstream and 9 nt downstream of the editing site. Moreover, we show that a secondary structure in the upstream cis-acting sequence plays an important role in RNA editing. EBOV RNA editing is glycoprotein gene-specific, as a stretch encoding 7 adenosine residues located in the viral polymerase gene did not serve as an editing site, most likely due to the absence of the necessary cis-acting sequences. Finally, the EBOV protein VP30 was identified as a trans-acting factor for RNA editing, constituting a novel function for this protein. Overall, our results provide novel insights into the RNA editing mechanism of EBOV, further understanding of which might lead to novel intervention strategies against this viral pathogen.
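    A minimal sketch of locating candidate editing sites, assuming the mRNA-sense sequence is a plain string: it finds runs of at least 7 A residues and models editing as insertion of one extra A at the end of a run. As the abstract notes, a 7-A run alone is not sufficient (the polymerase-gene run is not edited), so this only finds candidates; the toy sequence and function names are hypothetical.

```python
import re

def find_a_runs(seq: str, min_len: int = 7) -> list[tuple[int, int]]:
    """Return half-open (start, end) coordinates of every maximal run of >= min_len A."""
    return [(m.start(), m.end()) for m in re.finditer(f"A{{{min_len},}}", seq)]

def edit_insert(seq: str, run_end: int, n_extra: int = 1) -> str:
    """Model editing as insertion of n_extra non-templated A residues at a run's 3' end."""
    return seq[:run_end] + "A" * n_extra + seq[run_end:]

toy = "GGU" + "A" * 7 + "GCU"   # hypothetical mRNA fragment with a 7-A run
runs = find_a_runs(toy)
print(runs)                      # [(3, 10)]
print(edit_insert(toy, runs[0][1]))
```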

  16. Characterization of intronic uridine-rich sequence elements acting as possible targets for nuclear proteins during pre-mRNA splicing in Nicotiana plumbaginifolia.

    PubMed

    Gniadkowski, M; Hemmings-Mieszczak, M; Klahre, U; Liu, H X; Filipowicz, W

    1996-02-15

    Introns of nuclear pre-mRNAs in dicotyledonous plants, unlike introns in vertebrates or yeast, are distinctly rich in A+U nucleotides, and this feature is essential for their processing. To define more precisely the sequence elements important for intron recognition in plants, we investigated the effects of short insertions, either U-rich or A-rich, on splicing of synthetic introns in transfected protoplasts of Nicotiana plumbaginifolia. Insertions of U-rich (sequence UUUUUAU) but not A-rich (AUAAAAA) segments were found to activate splicing of a GC-rich synthetic intron, and U-rich segments, or multimers thereof, can function irrespective of the site of insertion within the intron. Insertions of multiple U-rich segments, either at the same or different locations, generally had an additive, stimulatory effect on splicing. Mutational analysis showed that replacement of one or two U residues in the UUUUUAU sequence with A or C residues had only a small effect on splicing, but replacement with G residues was strongly inhibitory. Proteins that interact with fragments of natural and synthetic pre-mRNAs in vitro were identified in nuclear extracts of N. plumbaginifolia by UV cross-linking. The profile of cross-linked plant proteins was considerably less complex than that obtained with a HeLa cell nuclear extract. Two major cross-linkable plant proteins had apparent molecular masses of 50 and 54 kDa and showed affinity for oligouridylates present in synGC introns or for poly(U).

  17. Characterization of intronic uridine-rich sequence elements acting as possible targets for nuclear proteins during pre-mRNA splicing in Nicotiana plumbaginifolia.

    PubMed Central

    Gniadkowski, M; Hemmings-Mieszczak, M; Klahre, U; Liu, H X; Filipowicz, W

    1996-01-01

    Introns of nuclear pre-mRNAs in dicotyledonous plants, unlike introns in vertebrates or yeast, are distinctly rich in A+U nucleotides, and this feature is essential for their processing. To define more precisely the sequence elements important for intron recognition in plants, we investigated the effects of short insertions, either U-rich or A-rich, on splicing of synthetic introns in transfected protoplasts of Nicotiana plumbaginifolia. Insertions of U-rich (sequence UUUUUAU) but not A-rich (AUAAAAA) segments were found to activate splicing of a GC-rich synthetic intron, and U-rich segments, or multimers thereof, can function irrespective of the site of insertion within the intron. Insertions of multiple U-rich segments, either at the same or different locations, generally had an additive, stimulatory effect on splicing. Mutational analysis showed that replacement of one or two U residues in the UUUUUAU sequence with A or C residues had only a small effect on splicing, but replacement with G residues was strongly inhibitory. Proteins that interact with fragments of natural and synthetic pre-mRNAs in vitro were identified in nuclear extracts of N. plumbaginifolia by UV cross-linking. The profile of cross-linked plant proteins was considerably less complex than that obtained with a HeLa cell nuclear extract. Two major cross-linkable plant proteins had apparent molecular masses of 50 and 54 kDa and showed affinity for oligouridylates present in synGC introns or for poly(U). PMID:8604302

  18. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feild, M.J.; Armstrong, F.B.

    1987-05-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilvGEDAY gene cluster of S. typhimurium. TmB, the gene product of ilvE, was purified, crystallized, and subjected to Edman degradation using a gas-phase sequencer. The intact protein yielded an amino-terminal sequence of 31 residues. Both carboxymethylated apoenzyme and [3H]NaBH4-reduced holoenzyme were then digested with trypsin. The digests were fractionated by reversed-phase HPLC, and the isolated peptides were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. This peptide is 27 residues long, and comparison with known sequences of other aminotransferases revealed limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of the peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino-terminal 59 residues; only two differences between the sequences were noted in this region.

  19. Nonenzymatic template-directed synthesis on hairpin oligonucleotides. 3. Incorporation of adenosine and uridine residues

    NASA Technical Reports Server (NTRS)

    Wu, T.; Orgel, L. E.

    1992-01-01

    We have used [32P]-labeled hairpin oligonucleotides to study template-directed synthesis on templates containing one or more A or T residues within a run of C residues. When nucleoside-5'-phosphoro(2-methyl)imidazolides are used as substrates, isolated A and T residues function efficiently in facilitating the incorporation of U and A, respectively. The reactions are regiospecific, producing mainly 3'-5'-phosphodiester bonds. Pairs of consecutive non-C residues are copied much less efficiently. Limited synthesis of CA and AC sequences on templates containing TG and GT sequences was observed along with some synthesis of the AA sequences on templates containing TT sequences. The other dimer sequences investigated, AA, AG, GA, TA, and AT, could not be copied. If A is absent from the reaction mixture, misincorporation of G residues is a significant reaction on templates containing an isolated T residue or two consecutive T residues. However, if both A and G are present, A is incorporated to a much greater extent than G. We believe that wobble-pairing between T and G is responsible for misincorporation when only G is present.

  20. Techniques for Computing the DFT Using the Residue Fermat Number Systems and VLSI

    NASA Technical Reports Server (NTRS)

    Truong, T. K.; Chang, J. J.; Hsu, I. S.; Pei, D. Y.; Reed, I. S.

    1985-01-01

    The integer complex multiplier and adder over the direct sum of two copies of a finite field is specialized to the direct sum of the rings of integers modulo Fermat numbers. Such multiplications and additions can be used in the implementation of a discrete Fourier transform (DFT) of a sequence of complex numbers. The advantage of the present approach is that the number of multiplications needed for the DFT can be reduced substantially over the previous approach. The architectural designs using this approach are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
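    As a minimal sketch of residue arithmetic over Fermat-number moduli (assuming the moduli F3 = 257 and F4 = 65537 and Chinese-remainder reconstruction; this is not the paper's VLSI architecture or its DFT algorithm), complex integer multiplication and addition can be carried out independently in each ring:

```python
from math import prod

MODULI = (2**8 + 1, 2**16 + 1)   # Fermat numbers F3 = 257 and F4 = 65537 (coprime)
M = prod(MODULI)                 # dynamic range of the residue system

def to_rns(z):
    """Map a Gaussian integer (real, imag) to its residues modulo each Fermat number."""
    real, imag = z
    return [(real % m, imag % m) for m in MODULI]

def rns_add(x, y):
    """Complex addition, done independently (no carries) in each residue ring."""
    return [((a + c) % m, (b + d) % m) for (a, b), (c, d), m in zip(x, y, MODULI)]

def rns_mul(x, y):
    """Complex multiplication (a+bi)(c+di) = (ac-bd) + (ad+bc)i in each residue ring."""
    return [((a * c - b * d) % m, (a * d + b * c) % m)
            for (a, b), (c, d), m in zip(x, y, MODULI)]

def crt(residues):
    """Chinese-remainder reconstruction, mapped to the symmetric range around 0."""
    total = 0
    for r, m in zip(residues, MODULI):
        mi = M // m
        total += r * mi * pow(mi, -1, m)   # modular inverse; needs Python 3.8+
    total %= M
    return total - M if total > M // 2 else total

def from_rns(x):
    """Reconstruct the real and imaginary parts separately."""
    return (crt([a for a, _ in x]), crt([b for _, b in x]))

product = rns_mul(to_rns((3, 4)), to_rns((-2, 5)))
print(from_rns(product))  # (-26, 7), i.e. (3+4i)(-2+5i) = -26+7i
```

    Each ring only ever handles small residues with no carries between rings, which is the property that makes such arithmetic units regular and easy to replicate in hardware.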

  1. Improving homology modeling of G-protein coupled receptors through multiple-template derived conserved inter-residue interactions

    NASA Astrophysics Data System (ADS)

    Chaudhari, Rajan; Heim, Andrew J.; Li, Zhijun

    2015-05-01

    As evidenced by three rounds of the G-protein coupled receptor (GPCR) Dock competitions, improving homology modeling of helical transmembrane proteins, including GPCRs, based on templates of low sequence identity remains a major challenge. Current approaches addressing this challenge adopt the philosophy of "modeling first, refinement next". In the present work, we developed an alternative modeling approach through a novel application of available multiple templates. First, conserved inter-residue interactions are derived from each additional template through conservation analysis of each template-target pairwise alignment. Then, these interactions are converted into distance restraints and incorporated into the homology modeling process. This approach was applied to modeling the human β2 adrenergic receptor using bovine rhodopsin and the human protease-activated receptor 1 as templates, and improved model quality was demonstrated compared with homology models generated by standard single-template and multiple-template methods. This method of "refined restraints first, modeling next" provides a fast approach that is complementary to current modeling approaches. It allows rational identification and implementation of additional conserved distance restraints extracted from multiple templates and/or experimental data, and has the potential to be applicable to modeling all helical transmembrane proteins.

  2. Sequence identity and antigenic cross-reactivity of white face hornet venom allergen, also a hyaluronidase, with other proteins.

    PubMed

    Lu, G; Kochoumian, L; King, T P

    1995-03-03

    White face hornet (Dolichovespula maculata) venom has three known protein allergens which induce IgE response in susceptible people. They are antigen 5, phospholipase A1, and hyaluronidase, also known as Dol m 5, 1, and 2, respectively. We have cloned Dol m 2, a protein of 331 residues. When expressed in bacteria, a mixture of recombinant Dol m 2 and its fragments was obtained. The fragments were apparently generated by proteolysis of a Met-Met bond at residue 122, as they were not observed for a Dol m 2 mutant with a Leu-Met bond. Dol m 2 has 56% sequence identity with the honey bee venom allergen hyaluronidase and 27% identity with PH-20, a human sperm protein with hyaluronidase activity. A common feature of hornet venom allergens is their sequence identity with other proteins in our environment. We showed previously the sequence identity of Dol m 5 with a plant protein and a mammalian testis protein and of Dol m 1 with mammalian lipases. In BALB/c mice, Dol m 2 and bee hyaluronidase showed cross-reactivity at both antibody and T cell levels. These findings are relevant to some patients' multiple sensitivity to hornet and bee stings.

  3. Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers.

    PubMed

    Chen, Peng; Li, Jinyan

    2010-05-17

    Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein folding, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of residue pairs belonging to the same contact or non-contact class. The GaCs are trained on multiple different pairings of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, of roughly the same size as the positive ones, are constructed by random sampling from the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Classifiers that use sequence profile centers may advance long-range contact prediction. In line with this approach, key structural features in proteins could be determined with high efficiency and accuracy.
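    A minimal sketch of the SPC idea, assuming each residue pair is already encoded as a numeric profile vector: per-class averaging gives the centers, and a nearest-center rule (a hypothetical stand-in, not the paper's genetic algorithm classifiers) shows how such centers could be used. The data are toy values.

```python
from math import dist

def profile_centers(profiles, labels):
    """Average the profile vectors of all residue pairs that share a class label."""
    sums, counts = {}, {}
    for vec, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_center(vec, centers):
    """Hypothetical decision rule: pick the class whose center is closest (Euclidean)."""
    return min(centers, key=lambda label: dist(vec, centers[label]))

# Hypothetical 2-feature profiles for two contact and two non-contact pairs
profiles = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = ["contact", "contact", "non-contact", "non-contact"]
centers = profile_centers(profiles, labels)   # contact -> [2, 0], non-contact -> [0, 3]
print(nearest_center([2.0, 1.0], centers))     # contact
```

    To mirror the balanced training sets described above, the negatives could first be subsampled with random.sample to the size of the positive set before computing the centers.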

  4. On the relationship between residue structural environment and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

    2017-09-01

    Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that sequence conservation is closely correlated with the weighted contact number (WCN), a measure of the packing density of a residue's structural environment, calculated only from the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated from different protein substructures, such as all atoms, backbone atoms, side-chain atoms, or side-chain centroids. To determine whether the Cα atomic positions are adequate to capture the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding similar structure-conservation relationships. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all-atom substructures. These results indicate that the Cα atoms of a protein structure alone can reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
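    The abstract does not restate the WCN formula; a common definition from this line of work is WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between the Cα atoms of residues i and j. A minimal sketch under that assumption, with hypothetical toy coordinates (coincident atoms would cause a division by zero):

```python
from math import dist

def weighted_contact_number(coords):
    """WCN_i = sum over j != i of 1 / r_ij**2, with one 3D point (e.g. C-alpha) per residue."""
    wcn = []
    for i, ci in enumerate(coords):
        wcn.append(sum(1.0 / dist(ci, cj) ** 2 for j, cj in enumerate(coords) if j != i))
    return wcn

ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # hypothetical toy coordinates
print(weighted_contact_number(ca))  # the middle residue is the most densely packed
```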

  5. Determinants of the Differential Antizyme-Binding Affinity of Ornithine Decarboxylase

    PubMed Central

    Liu, Yen-Chin; Hsu, Den-Hua; Huang, Chi-Liang; Liu, Yi-Liang; Liu, Guang-Yaw; Hung, Hui-Chih

    2011-01-01

    Ornithine decarboxylase (ODC) is a ubiquitous enzyme that is conserved in all species from bacteria to humans. Mammalian ODC is degraded by the proteasome in a ubiquitin-independent manner through direct binding to antizyme (AZ). In contrast, Trypanosoma brucei ODC has a low binding affinity toward AZ. In this study, we identified key amino acid residues that govern the differential AZ-binding affinity of human and Trypanosoma brucei ODC. Multiple sequence alignments of the putative AZ-binding site of ODC highlight several key amino acid residues that differ between the human and Trypanosoma brucei ODC protein sequences, including residues 119, 124, 125, 129, 136, 137, and 140 (numbering for human ODC). We generated a septuple human ODC mutant protein in which these seven residues were mutated to match the Trypanosoma brucei ODC protein sequence. The septuple mutant protein was much less sensitive to AZ inhibition than the WT protein, suggesting that these amino acid residues play a role in human ODC-AZ binding. Additional experiments with sextuple mutants suggest that residue 137 plays a direct role in AZ binding, and that residues 119 and 140 play secondary roles. The dissociation constants were also calculated to quantify the affinity of the ODC-AZ binding interaction. The Kd value for the wild-type ODC protein-AZ heterodimer ([ODC_WT]-AZ) is approximately 0.22 μM, while the Kd value for the septuple mutant-AZ heterodimer ([ODC_7M]-AZ) is approximately 12.4 μM. This greater than 50-fold increase in Kd shows that the ODC_7M enzyme has a much lower binding affinity toward AZ. For the mutant proteins ODC_7M(-Q119H) and ODC_7M(-V137D), the Kd values were 1.4 and 1.2 μM, respectively. These Kd values are approximately 6-fold higher than that of WT ODC but much lower than that of ODC_7M, which suggests that residues 119 and 137 play a role in AZ binding. PMID:22073206
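    Because a larger Kd means weaker binding, the fold changes quoted above are simply ratios of Kd values. A small worked check using the reported values (the helper name fold_change is illustrative):

```python
# Dissociation constants reported in the abstract, in micromolar
KD_UM = {
    "ODC_WT": 0.22,
    "ODC_7M": 12.4,
    "ODC_7M(-Q119H)": 1.4,
    "ODC_7M(-V137D)": 1.2,
}

def fold_change(kd: float, reference: float) -> float:
    """Ratio of dissociation constants; > 1 means weaker binding than the reference."""
    return kd / reference

for name, kd in KD_UM.items():
    print(f"{name}: Kd = {kd} uM, {fold_change(kd, KD_UM['ODC_WT']):.1f}x WT")
```

    This prints about 56.4x for ODC_7M and 6.4x and 5.5x for the two partial revertants, consistent with the greater than 50-fold and approximately 6-fold figures in the text.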

  6. Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution.

    PubMed

    Goldstone, Jared V; Sundaramoorthy, Munirathinam; Zhao, Bin; Waterman, Michael R; Stegeman, John J; Lamb, David C

    2016-01-01

    Biosynthesis of steroid hormones in vertebrates involves three cytochrome P450 hydroxylases, CYP11A1, CYP17A1 and CYP19A1, which catalyze sequential steps in steroidogenesis. These enzymes are conserved in the vertebrates, but their origin and existence in other chordate subphyla (Tunicata and Cephalochordata) have not been clearly established. In this study, selected protein sequences of CYP11A1, CYP17A1 and CYP19A1 were compiled and analyzed using multiple sequence alignment and phylogenetic analysis. Our analyses show that cephalochordates have sequences orthologous to vertebrate CYP11A1, CYP17A1 or CYP19A1, and that echinoderms and hemichordates possess CYP11-like but not CYP19 genes. While the cephalochordate sequences have low identity with the vertebrate sequences, reflecting evolutionary distance, the data show apparent origin of CYP11 prior to the evolution of CYP19 and possibly CYP17, thus indicating a sequential origin of these functionally related steroidogenic CYPs. Co-occurrence of the three CYPs in early chordates suggests that the three genes may have coevolved thereafter, and that functional conservation should be reflected in functionally important residues in the proteins. CYP19A1 has the largest number of conserved residues while CYP11A1 sequences are less conserved. Structural analyses of human CYP11A1, CYP17A1 and CYP19A1 show that critical substrate binding site residues are highly conserved in each enzyme family. The results emphasize that the steroidogenic pathways producing glucocorticoids and reproductive steroids are several hundred million years old and that the catalytic structural elements of the enzymes have been conserved over the same period of time. Analysis of these elements may help to identify when precursor functions linked to these enzymes first arose. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. The nonamer UUAUUUAUU is the key AU-rich sequence motif that mediates mRNA degradation.

    PubMed Central

    Zubiaga, A M; Belasco, J G; Greenberg, M E

    1995-01-01

    Labile mRNAs that encode cytokine and immediate-early gene products often contain AU-rich sequences within their 3' untranslated region (UTR). These AU-rich sequences appear to be key determinants of the short half-lives of these mRNAs, although the sequence features of these elements and the mechanism by which they target mRNAs for rapid decay have not been fully defined. We have examined the features of AU-rich elements (AREs) that are crucial for their function as determinants of mRNA instability in mammalian cells by testing the ability of various mutant c-fos AREs and synthetic AREs to direct rapid mRNA deadenylation and decay when inserted within the 3' UTR of the normally stable beta-globin mRNA. Evidence is presented that the pentamer AUUUA, which previously was suggested to be the minimal determinant of instability present in mammalian AREs, cannot direct rapid mRNA deadenylation and decay. Instead, the nonamer UUAUUUAUU is the elemental AU-rich sequence motif that destabilizes mRNA. Removal of one uridine residue from either end of the nonamer (UUAUUUAU or UAUUUAUU) decreases the potency of the element, while removal of a uridine residue from both ends of the nonamer (UAUUUAU) eliminates detectable destabilizing activity. The inclusion of an additional uridine residue at both ends of the nonamer (UUUAUUUAUUU) does not further increase the efficacy of the element. Taken together, these findings suggest that the nonamer UUAUUUAUU is the minimal AU-rich motif that effectively destabilizes mRNA. Additional ARE potency is achieved by combining multiple copies of this nonamer in a single mRNA 3' UTR. Furthermore, analysis of poly(A) shortening rates for ARE-containing mRNAs reveals that the UUAUUUAUU sequence also accelerates mRNA deadenylation and suggests that the UUAUUUAUU motif targets mRNA for rapid deadenylation as an early step in the mRNA decay process. PMID:7891716
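    To screen a 3' UTR for this motif, a minimal sketch (assuming a plain RNA or DNA string as input, with T read as U) counts occurrences of the nonamer, including overlapping ones, since tandem AUUUA repeats produce overlapping nonamers. Whether overlapping copies act as independent elements is not addressed in the abstract; counting them separately is an assumption.

```python
import re

NONAMER = "UUAUUUAUU"

def count_nonamers(seq: str) -> int:
    """Count occurrences of UUAUUUAUU, including overlapping ones; T is read as U."""
    rna = seq.upper().replace("T", "U")
    # A zero-width lookahead lets finditer report overlapping matches
    return sum(1 for _ in re.finditer(f"(?={NONAMER})", rna))

print(count_nonamers("UAUUUAU"))        # 0: the trimmed heptamer is not counted
print(count_nonamers("UUAUUUAUUUAUU"))  # 2: two overlapping nonamers
```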

  8. Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

    PubMed

    Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

    2012-04-01

    The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for different physical and chemical properties and subjected to superfamily search, multiple sequence alignment, phylogenetic tree construction, and motif finding to identify functional motifs and the evolutionary relationships among them. The superfamily search for these tannases revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase C-terminal domain, and mycobacterial antigen families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two different clusters: one containing only bacteria and another containing both fungi and bacteria, showing some relationship between these different genera. Within the second cluster, nearly all fungal species grouped together in one corner, indicating sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, a 29-amino-acid signature sequence (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3% of the tannase sequences studied, suggesting its participation in the structure and enzymatic function.

  9. Structure of the membrane channel porin from Rhodopseudomonas blastica at 2.0 Å resolution.

    PubMed Central

    Kreusch, A.; Neubüser, A.; Schiltz, E.; Weckesser, J.; Schulz, G. E.

    1994-01-01

    The crystal structure of a membrane channel, the homotrimeric porin from Rhodopseudomonas blastica, has been determined at 2.0 Å resolution by multiple isomorphous replacement and structural refinement. The current model has an R-factor of 16.5% and consists of 289 amino acids, 238 water molecules, and 3 detergent molecules per subunit. The partial protein sequence and subsequently the complete DNA sequence were determined. The general architecture is similar to those of the structurally known porins. As a particular feature, there are 3 adjacent binding sites for n-alkyl chains at the molecular 3-fold axis. The side chain arrangement in the channel indicates a transverse electric field across each of the 3 pore eyelets, which may explain the discrimination against nonpolar solutes. Moreover, there are 2 significantly ordered girdles of aromatic residues at the nonpolar/polar borderlines of the interface between protein and membrane. Possibly, these residues shield the polypeptide conformation against adverse membrane fluctuations. PMID:8142898

  10. An Adaptive Kalman Filter using a Simple Residual Tuning Method

    NASA Technical Reports Server (NTRS)

    Harman, Richard R.

    1999-01-01

    One difficulty in using Kalman filters in real-world situations is the selection of the correct process noise, measurement noise, and initial state estimate and covariance. These parameters are commonly referred to as tuning parameters. Multiple methods have been developed to estimate these parameters. Most of these methods, such as maximum likelihood, subspace, and observer Kalman identification, require extensive offline processing and are not suitable for real-time processing. One technique that is suitable for real-time processing is the residual tuning method. Any mismodeling of the filter tuning parameters will result in a non-white sequence of filter measurement residuals. The residual tuning technique uses this information to estimate corrections to those tuning parameters. The actual implementation results in a set of sequential equations that run in parallel with the Kalman filter. Equations for the estimation of the measurement noise have also been developed. These algorithms are used to estimate the process noise and measurement noise for the Wide Field Infrared Explorer star tracker and gyro.
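    The abstract does not give the residual tuning equations. As a hedged illustration of the underlying idea only, the sketch below uses a generic covariance-matching estimate for a scalar random-walk filter (an assumption, not the paper's algorithm): for a well-tuned filter the innovation variance equals the predicted state variance plus R, so excess innovation power indicates how R should be corrected. Unlike the real-time sequential equations described above, this version simply re-runs the filter in batch; all names and parameter values are illustrative.

```python
import random

def simulate(n, q, r, seed=0):
    """Random-walk state x_k = x_{k-1} + w_k observed as z_k = x_k + v_k."""
    rng = random.Random(seed)
    x, zs = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, q ** 0.5)            # process noise, variance q
        zs.append(x + rng.gauss(0.0, r ** 0.5))  # measurement noise, variance r
    return zs

def run_filter(zs, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter; returns innovations and predicted (prior) variances."""
    x, p = x0, p0
    innovations, priors = [], []
    for z in zs:
        p_prior = p + q                  # predict step
        nu = z - x                       # innovation = measurement residual
        gain = p_prior / (p_prior + r)
        x += gain * nu                   # update step
        p = (1.0 - gain) * p_prior
        innovations.append(nu)
        priors.append(p_prior)
    return innovations, priors

def estimate_r(innovations, priors, burn_in=50):
    """Covariance matching: for a well-tuned filter E[nu^2] = P_prior + R."""
    nus, ps = innovations[burn_in:], priors[burn_in:]
    return sum(v * v for v in nus) / len(nus) - sum(ps) / len(ps)

def lag1_autocorrelation(seq):
    """Close to zero for white residuals; a large value signals mistuning."""
    mean = sum(seq) / len(seq)
    dev = [s - mean for s in seq]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)

zs = simulate(5000, q=0.01, r=1.0)
r_hat = 4.0                              # deliberately wrong starting guess
for _ in range(5):                       # re-run the filter with the corrected R
    nus, priors = run_filter(zs, q=0.01, r=r_hat)
    r_hat = estimate_r(nus, priors)
print(f"estimated R = {r_hat:.2f}")      # should approach the simulated value 1.0
```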

  11. Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype.

    PubMed

    Burghardt, Thomas P; Ajtai, Katalin

    2018-04-11

    The cardiac muscle sarcomere contains multiple proteins contributing to contraction energy transduction and its regulation during a heartbeat. Inheritable heart disease mutations affect most of them, but none more frequently than the ventricular myosin motor and cardiac myosin binding protein C (mybpc3). These co-localizing proteins have mybpc3 playing a regulatory role for the energy-transducing motor. The residue substitution and functional domain assignment of each mutation in the protein sequence determine, under the direction of a sensible disease model, phenotype and pathogenicity. The unknown model mechanism is determined here using a method combining neural and Bayes networks. Missense single nucleotide polymorphisms (SNPs) are clues to the disease mechanism, summarized in an extensive database collecting mutant sequence location and residue substitution as independent variables that imply the dependent disease phenotype and pathogenicity characteristics in 4-dimensional data points (4ddps). The SNP database contains entries, the majority of which have one or both dependent data entries unfulfilled. A neural network relating causes (mutant residue location and substitution) and effects (phenotype and pathogenicity) is trained, validated, and optimized using fulfilled 4ddps. It then predicts unfulfilled 4ddps, providing the implicit disease model. A discrete Bayes network interprets fulfilled and predicted 4ddps with conditional probabilities for phenotype and pathogenicity given mutation location and residue substitution, thus relating the neural network's implicit model to explicit features of the motor and mybpc3 sequence and structural domains. Neural/Bayes network forecasting automates disease mechanism modeling by leveraging the worldwide human missense SNP database that is in place and expanding. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  12. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    Background: Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results: Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host-parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions: We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  13. Residue selective 15N CEST and CPMG experiments for studies of millisecond timescale protein dynamics.

    PubMed

    Niu, Xiaogang; Ding, Jienv; Zhang, Wenbo; Li, Qianwen; Hu, Yunfei; Jin, Changwen

    2018-06-01

    Proteins are intrinsically dynamic molecules and undergo exchange among multiple conformations to perform biological functions. CPMG relaxation dispersion and CEST experiments are two important solution NMR techniques for characterizing conformational exchange processes on the millisecond timescale. Traditional pseudo-3D 15N CEST and CPMG experiments have certain limitations in their applications. For example, both experiments have low sensitivity for broadened resonances, and optimizing sample conditions and experimental parameters is often time-consuming. To overcome these limitations, we present a new set of residue-selective 15N CEST and CPMG pulse sequences that employ Hartmann-Hahn cross-polarization transfer of magnetization in both 1D and 2D schemes. Combined with frequency labeling in the indirect dimension using only a small number of increments, the pulse sequences in the 2D scheme can be applied to resonances in overlapped regions of the 1H-15N HSQC spectrum. The pulse sequences were further applied to several proteins, demonstrating their advantages over traditional CEST and CPMG experiments under specific circumstances. Copyright © 2018 Elsevier Inc. All rights reserved.

  14. Building toy models of proteins using coevolutionary information

    NASA Astrophysics Data System (ADS)

    Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose

    2015-03-01

    Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as with other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work provides a potential framework for generating protein models enriched with evolutionary data, which could be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
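    In DCA the inferred Potts model assigns every aligned sequence an energy built from single-site fields h_i and pairwise couplings J_ij. The sketch below evaluates that energy and the energy change of a single substitution, assuming one common sign convention (probability proportional to exp(-E)), a hypothetical 21-letter alphabet with a gap state, and couplings stored only for i < j. It does not perform the DCA inference itself.

```python
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"  # assumed q = 21 states: gap plus 20 amino acids

def encode(seq):
    """Map an aligned sequence to integer Potts states."""
    return [ALPHABET.index(a) for a in seq]

def potts_energy(states, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    h has shape (L, q); J has shape (L, L, q, q) and only entries with i < j are read.
    """
    L = len(states)
    energy = -sum(h[i, states[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, states[i], states[j]]
    return float(energy)

def mutation_delta_e(states, pos, new_state, h, J):
    """Energy change for one substitution; positive means less favourable under this convention."""
    mutant = list(states)
    mutant[pos] = new_state
    return potts_energy(mutant, h, J) - potts_energy(states, h, J)
```

    With h and J taken from a DCA fit, ΔE values of this kind are what would be compared against measured stability changes.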

  15. Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

    PubMed

    Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

    2015-02-14

    Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch of hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive construction of recombinant genes carrying substituted synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the length of the GLuc hydrophobic peptide was at the lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th residue, Glu, enhanced the level of secreted protein, suggesting that this hydrophilic residue defined the boundary of the hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and the C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion, but the number of residues affected secretory activity. A stretch of sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore used for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in an enhanced secreted protein yield. We present a new concept for conferring secretory signal-sequence activity in the yeast K. marxianus, in which activity is determined by the number of repeats of a single hydrophobic amino acid located between N-terminal basic and C-terminal acidic boundary residues.

  16. An intact SAM-dependent methyltransferase fold is encoded by the human endothelin-converting enzyme-2 gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tempel, W.; Wu, H.; Dombrovsky, L.

    2010-08-17

    A recent survey of protein expression patterns in patients with Alzheimer's disease (AD) has identified ece2 (chromosome 3; location 3q27.1) as the most significantly downregulated gene within the tested group. ece2 encodes endothelin-converting enzyme ECE2, a metalloprotease with a role in neuropeptide processing. Deficiency in the highly homologous ECE1 has earlier been linked to increased levels of AD-related β-amyloid peptide in mice, consistent with a role for ECE in the degradation of that peptide. Initially, ECE2 was presumed to resemble ECE1 in that it comprises a single transmembrane region of ~20 residues flanked by a small amino-terminal cytosolic segment and a carboxy-terminal lumenal peptidase domain. The carboxy-terminal domain has significant sequence similarity both to neutral endopeptidase, for which an X-ray structure has been determined, and to Kell blood group protein. Multiple isoforms of ECE1 and ECE2, generated by alternative splicing of multiple exons, were subsequently discovered. The originally described ece2 transcript, RefSeq NM_174046, contains the amino-terminal cytosolic portion followed by the transmembrane region and peptidase domain (Fig. 1, isoform B). Another ece2 transcript, available from the Mammalian Gene Collection under MGC2408 (Fig. 1, isoform C; RefSeq accession NM_032331), is predicted to be translated into a 255-residue peptide with low but detectable sequence similarity to known S-adenosyl-L-methionine (SAM)-dependent methyltransferases (SAM-MTs), such as the hypothetical protein TT1324 from Thermus thermophilus (PDB code 2GS9), which shares 30% amino acid sequence identity with ECE2 over 138 residues. Intriguingly, another 'elongated' ece2 transcript (Fig. 1, isoform A; RefSeq NM_014693) contains an amino-terminal portion of the putative SAM-MT domain, the transmembrane domain, and the protease domain.
This suggests that the putative SAM-MT and protease domains may coexist in a single polypeptide and interact across the membrane. Although sequence conservation across the SAM-MT family is weak, the structural fold is highly conserved. The most conserved part of this fold is the SAM-binding subdomain, which is shared between MGC2408 and hypothetical protein TT1324. Typically, the SAM-binding subdomain is flanked by a variable N-terminal extension and, at the C-terminus, by a substrate-binding subdomain, which varies enormously in size but preserves a conserved topology with three antiparallel β-strands. The 'elongated' transcript of ece2 lacks this substrate-binding subdomain. To test the hypothesis that the 255-residue ece2 gene product MGC2408 represents a complete SAM-MT fold, we have determined a crystal structure of this protein in the presence of SAH.

  17. On the Formation and Properties of Interstrand DNA-DNA Cross-links Forged by Reaction of an Abasic Site With the Opposing Guanine Residue of 5′-CAp Sequences in Duplex DNA

    PubMed Central

    Johnson, Kevin M.; Price, Nathan E.; Wang, Jin; Fekry, Mostafa I.; Dutta, Sanjay; Seiner, Derrick R.; Wang, Yinsheng; Gates, Kent S.

    2014-01-01

    We recently reported that the aldehyde residue of an abasic (Ap) site in duplex DNA can generate an interstrand cross-link via reaction with a guanine residue on the opposing strand. This finding is intriguing because the highly deleterious nature of interstrand cross-links suggests that even small amounts of Ap-derived cross-links could make a significant contribution to the biological consequences stemming from the generation of Ap sites in cellular DNA. Incubation of 21-bp duplexes containing a central 5′-CAp sequence under conditions of reductive amination (NaCNBH3, pH 5.2) generated much higher yields of cross-linked DNA than reported previously. At pH 7, in the absence of reducing agents, these Ap-containing duplexes also produced cross-linked duplexes that were readily detected on denaturing polyacrylamide gels. Cross-link formation was not highly sensitive to reaction conditions and, once formed, the cross-link was stable to a variety of work-up conditions. Results of multiple experiments, including MALDI-TOF mass spectrometry, gel mobility, methoxyamine capping of the Ap aldehyde, inosine-for-guanine replacement, hydroxyl radical footprinting, and LC-MS/MS, were consistent with a cross-linking mechanism involving reversible reaction of the Ap aldehyde residue with the N2-amino group of the opposing guanine residue in 5′-CAp sequences to generate hemiaminal, imine, or cyclic hemiaminal cross-links (7-10) that were irreversibly converted under conditions of reductive amination (NaCNBH3/pH 5.2) to a stable amine linkage. Further support for the importance of the exocyclic N2-amino group in this reaction was provided by an experiment showing that installation of a 2-aminopurine-thymine base pair at the cross-linking site produced high yields (15-30%) of a cross-linked duplex at neutral pH, in the absence of NaCNBH3. PMID:23215239

  18. Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information.

    PubMed

    Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas

    2006-03-09

    The majority of peptide bonds in proteins occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization would therefore have many important applications to understanding protein structure and function. In this paper, we propose a new approach to predicting proline cis/trans isomerization in proteins using a support vector machine (SVM). Preliminary results indicated that Radial Basis Function (RBF) kernels could lead to better prediction performance than polynomial and linear kernel functions. We used single-sequence information from local windows of different sizes, amino acid compositions of different local sequences, multiple sequence alignment profiles obtained from PSI-BLAST, and secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes to investigate their effects on prediction performance. The approach was trained and tested by 5-fold cross-validation on a newly enlarged dataset of 2424 non-homologous proteins whose structures were determined by X-ray diffraction. A window size of 11 provided the best performance for predicting proline cis/trans isomerization from the single amino acid sequence. Using multiple sequence alignments in the form of PSI-BLAST profiles significantly improved prediction performance: accuracy increased from 62.8% with the single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 to 0.40.
Furthermore, when coupled with secondary structure predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, about 9 percentage points and 0.17 higher, respectively, than those achieved with single-sequence information alone. A new method has been developed to predict proline cis/trans isomerization in proteins based on support vector machines, using the single amino acid sequence with different local window sizes, the amino acid composition of the local sequence flanking the central proline residue, position-specific scoring matrices (PSSMs) extracted by PSI-BLAST, and secondary structures predicted by PSIPRED. The successful application of the SVM approach in this study reinforces that SVMs are powerful tools for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
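    The single-sequence encoding described above centres a fixed-length window on each proline. A minimal sketch of that windowing step follows, assuming one-hot encoding and an 'X' padding symbol at the chain ends (both illustrative choices; the abstract does not give the exact encoding). Replacing each one-hot row with the matching PSSM row from PSI-BLAST would give a profile-based variant.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for window positions beyond the chain ends

def proline_windows(sequence, window=11):
    """Return (position, window_string) for every proline, with the proline at the centre."""
    if window % 2 == 0:
        raise ValueError("window size must be odd")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    # residue i of the chain sits at padded[i + half], so padded[i:i + window] is centred on it
    return [(i, padded[i:i + window]) for i, aa in enumerate(sequence) if aa == "P"]

def one_hot(window_str):
    """Flattened one-hot matrix; padding and unknown residues become all-zero rows."""
    vec = np.zeros((len(window_str), len(AMINO_ACIDS)))
    for k, aa in enumerate(window_str):
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:
            vec[k, idx] = 1.0
    return vec.ravel()
```

    The resulting vectors could be passed to an RBF-kernel SVM, for example scikit-learn's `sklearn.svm.SVC(kernel="rbf")`, with cis/trans labels taken from solved structures.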

  19. Protein Sectors: Statistical Coupling Analysis versus Conservation

    PubMed Central

    Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

    2015-01-01

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535
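    SCA's conservation term, and the conservation-only alternative discussed here, can both be built from a per-position relative entropy between observed amino acid frequencies and a background distribution. The sketch below uses a uniform background purely for simplicity; SCA itself uses background amino acid frequencies and sequence weighting, so this illustrates the idea rather than reimplementing the method.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def column_conservation(column, background=None):
    """Relative entropy D = sum_a f(a) * ln(f(a) / q(a)) for one alignment column (gaps ignored)."""
    residues = [aa for aa in column if aa in ALPHABET]
    if not residues:
        return 0.0
    if background is None:
        # assumption: uniform background; real analyses use database amino acid frequencies
        background = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
    counts = Counter(residues)
    n = len(residues)
    return sum((c / n) * math.log((c / n) / background[aa]) for aa, c in counts.items())

def conservation_profile(msa):
    """Conservation of every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*msa)]
```

    Ranking positions by this profile alone is the kind of conservation-only predictor the authors compare against SCA sectors.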

  20. Large diversity of the piggyBac-like elements in the genome of Tribolium castaneum

    PubMed Central

    Wang, Jianjun; Du, Yuzhou; Wang, Suzhi; Brown, Sue; Park, Yoonseong

    2011-01-01

    The piggyBac transposable element, originally discovered in the cabbage looper, Trichoplusia ni, has been widely used in insect transgenesis, including in the red flour beetle Tribolium castaneum. We surveyed piggyBac-like (PLE) sequences in the genome of Tribolium castaneum by homology searches using as queries the diverse PLE sequences that have been described previously. The search yielded a total of 32 piggyBac-like elements (TcPLEs), which were classified into 14 distinct groups. Most of the TcPLEs contain defective functional motifs in that they lack inverted terminal repeats or have disrupted open reading frames. Only a single copy of TcPLE1 appears to be intact, with imperfect 16 bp inverted terminal repeats flanking an open reading frame encoding a transposase of 571 amino acid residues. Many copies of TcPLEs were found to be inserted into or close to other transposon-like sequences. This large diversity of TcPLEs with generally low copy numbers suggests multiple invasions of TcPLEs over a long evolutionary time without extensive multiplication, or the occurrence of rapid loss of TcPLE copies. PMID:18342253
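    Intact elements are recognized partly by their inverted terminal repeats (ITRs). As a rough sketch, the imperfection of a pair of ITRs can be scored by comparing the left terminus with the reverse complement of the right terminus. The function below assumes the ITRs sit exactly at the element ends and defaults to the 16 bp length reported for TcPLE1; a real survey would also tolerate indels and locate the repeat boundaries.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]

def itr_mismatches(element, itr_length=16):
    """Mismatches between the left terminus and the reverse complement of the right terminus."""
    if len(element) < 2 * itr_length:
        raise ValueError("element is shorter than two ITRs")
    left = element[:itr_length].upper()
    right_rc = reverse_complement(element[-itr_length:]).upper()
    # perfect ITRs give 0; a small nonzero count indicates imperfect repeats
    return sum(a != b for a, b in zip(left, right_rc))
```

    Any cutoff separating "imperfect" from "missing" ITRs would be a hypothetical choice; the abstract does not give one.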

  1. Three acidic residues are at the active site of a beta-propeller architecture in glycoside hydrolase families 32, 43, 62, and 68.

    PubMed

    Pons, Tirso; Naumoff, Daniil G; Martínez-Fleites, Carlos; Hernández, Lázaro

    2004-02-15

    Multiple-sequence alignment of glycoside hydrolase (GH) families 32, 43, 62, and 68 revealed three conserved blocks, each containing an acidic residue at an equivalent position in all the enzymes. A detailed analysis of the site-directed mutations so far performed on invertases (GH32), arabinanases (GH43), and bacterial fructosyltransferases (GH68) indicated a direct involvement of the conserved residues Asp/Glu (block I), Asp (block II), and Glu (block III) in substrate binding and hydrolysis. These residues are close in space in the 5-bladed beta-propeller fold determined for Cellvibrio japonicus alpha-L-arabinanase Arb43A [Nurizzo et al., Nat Struct Biol 2002;9:665-668] and Bacillus subtilis endo-1,5-alpha-L-arabinanase. A sequence-structure compatibility search using the 3D-PSSM, mGenTHREADER, INBGU, and SAM-T02 programs predicted, without distinguishing between them, the 5-bladed beta-propeller fold of Arb43A and the 6-bladed beta-propeller fold of sialidase/neuraminidase (GH33, GH34, and GH83) as the most reliable topologies for GH families 32, 62, and 68. We conclude that the identified acidic residues are located at the active site of a beta-propeller architecture in GH32, GH43, GH62, and GH68, operating with a canonical reaction mechanism of either inversion (GH43 and likely GH62) or retention (GH32 and GH68) of the anomeric configuration. We also propose that the beta-propeller architecture accommodates distinct binding sites for the acceptor saccharide in the glycosyl transfer reaction. Copyright 2003 Wiley-Liss, Inc.
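    Locating the three conserved blocks amounts, at its simplest, to finding alignment columns occupied by an acidic residue in (almost) every sequence. A minimal sketch, assuming a gapped alignment given as equal-length strings with '-' for gaps and a hypothetical `min_fraction` tolerance parameter:

```python
ACIDIC = set("DE")  # Asp and Glu

def conserved_acidic_columns(msa, min_fraction=1.0):
    """Indices of columns whose non-gap residues are Asp/Glu in at least min_fraction of cases."""
    if len({len(s) for s in msa}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for idx, column in enumerate(zip(*msa)):
        residues = [aa for aa in column if aa != "-"]
        if residues and sum(aa in ACIDIC for aa in residues) / len(residues) >= min_fraction:
            hits.append(idx)
    return hits
```

    Columns flagged this way are only candidates; as in the study, mutational data and structural proximity are needed before calling them catalytic.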

  2. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed Central

    2015-01-01

    Trees make an enormous contribution to plant oil reserves because many trees contain 50%–80% oil (triacylglycerols, TAGs) in their fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. Comprehensive sequence analysis and structural information on OLEs from diverse trees are lacking. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures of tree OLEs and OLE subfamilies. Data mining identified 65 OLEs from 19 trees with perfect conservation of the “proline knot” motif (PX5SPX3P). These OLEs contained >40% hydrophobic amino acid residues and displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected by short coils without any β-strand, and that they exhibited distinct 3D structures and ligand-binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs. PMID:26258573
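    The "proline knot" motif PX5SPX3P translates directly into a regular expression: P, any five residues, S, P, any three residues, P. The sketch below searches for it and also computes a hydrophobic-residue fraction; the hydrophobic set is an assumption, since the abstract does not say which residues were counted toward the >40% figure.

```python
import re

# (?=...) makes finditer report overlapping occurrences of the 12-residue motif
PROLINE_KNOT = re.compile(r"(?=(P.{5}SP.{3}P))")
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set; published scales differ

def proline_knots(protein):
    """Return (0-based start, matched motif) for every PX5SPX3P occurrence."""
    return [(m.start(), m.group(1)) for m in PROLINE_KNOT.finditer(protein)]

def hydrophobic_fraction(protein):
    """Fraction of residues that fall in the assumed hydrophobic set."""
    return sum(aa in HYDROPHOBIC for aa in protein) / len(protein)

print(proline_knots("MAPAAAAASPGGGPLL"))  # one motif starting at index 2
```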

  3. Genome-Wide Analysis of Oleosin Gene Family in 22 Tree Species: An Accelerator for Metabolic Engineering of BioFuel Crops and Agrigenomics Industrial Applications?

    PubMed

    Cao, Heping

    2015-09-01

    Trees make an enormous contribution to plant oil reserves because many trees contain 50%-80% oil (triacylglycerols, TAGs) in their fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. Comprehensive sequence analysis and structural information on OLEs from diverse trees are lacking. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures of tree OLEs and OLE subfamilies. Data mining identified 65 OLEs from 19 trees with perfect conservation of the "proline knot" motif (PX5SPX3P). These OLEs contained >40% hydrophobic amino acid residues and displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helices connected by short coils without any β-strand, and that they exhibited distinct 3D structures and ligand-binding sites. These analyses provide fundamental information on the similarity and specificity of diverse OLE isoforms within the same subfamily and among different species, which should facilitate studying the structure-function relationship and identifying critical amino acid residues in OLEs for metabolic engineering of tree TAGs.

  4. Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

    2018-01-01

    This paper uses a BP neural network and a grey prediction algorithm to forecast and study the radar wind field. To reduce the residual error of wind field prediction, the residual error function is minimized: the residuals of the grey algorithm are used to train the BP neural network, the trained network forecasts the residual sequence, and the predicted residual sequence is then used to correct the forecast sequence of the grey algorithm. Test data show that the grey algorithm corrected by the BP neural network effectively reduces the residual values and improves prediction precision.
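    The grey part of such hybrid models is often the GM(1,1) model; the abstract does not name the variant, so that choice is an assumption here. The sketch below fits GM(1,1) by least squares and returns fitted values plus forecasts. It does not reproduce the BP-network residual correction, which would train a network on the residual sequence `x0 - fitted` and add its predicted residuals back to the grey forecast.

```python
import numpy as np

def gm11_fit(x0):
    """Least-squares estimate of the GM(1,1) development coefficient a and grey input b."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                        # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])             # background (mean) sequence
    B = np.column_stack((-z1, np.ones_like(z1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    return a, b                               # a must be nonzero for gm11_predict

def gm11_predict(x0, a, b, steps=0):
    """Fitted values for the observed series followed by `steps` forecasts."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # inverse AGO: keep the first value, then take first differences
    return np.concatenate(([x0[0]], np.diff(x1_hat)))

series = 2 * 1.1 ** np.arange(10)             # synthetic, smoothly growing series
a, b = gm11_fit(series)
residuals = series - gm11_predict(series, a, b)
print(gm11_predict(series, a, b, steps=2)[-2:])
```

    GM(1,1) captures a smooth exponential trend; the residual sequence is what a second model such as the paper's BP network would be asked to learn.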

  5. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments.

    PubMed

    Dietrich, Susanne; Borst, Nadine; Schlee, Sandra; Schneider, Daniel; Janda, Jan-Oliver; Sterner, Reinhard; Merkl, Rainer

    2012-07-17

    The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (kcat/KM) were largely unaltered. In contrast, the apparent thermal unfolding temperature (TM,app) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔTM,app = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The kcat/KM values of four H2r mutant proteins were reduced between 13- and 120-fold, and their TM,app values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe.
Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
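    conn(k) is simply the number of significant pair correlations in which position k takes part. H2r uses its own entropy-based pair statistic and significance test; as a stand-in, the sketch below scores column pairs by plain mutual information and counts pairs above a user-chosen threshold. Raw mutual information is biased by conservation and phylogeny, so this illustrates only the counting step, not H2r.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_i, col_j):
    """Mutual information (nats) between two alignment columns, skipping rows with gaps."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    fi = Counter(a for a, _ in pairs)
    fj = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c / n) / ((fi[a] / n) * (fj[b] / n)))
               for (a, b), c in joint.items())

def conn_values(msa, threshold):
    """conn(k): number of column pairs involving k whose score exceeds `threshold`."""
    columns = list(zip(*msa))
    conn = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if mutual_information(columns[i], columns[j]) > threshold:
            conn[i] += 1
            conn[j] += 1
    return conn
```

    Positions with the largest conn(k) would then be the candidates for mutagenesis, mirroring the selection logic of the study.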

  6. Active Site Characterization of Proteases Sequences from Different Species of Aspergillus.

    PubMed

    Morya, V K; Yadav, Virendra K; Yadav, Sangeeta; Yadav, Dinesh

    2016-09-01

    A total of 129 protease sequences, comprising 43 serine proteases, 36 aspartic proteases, 24 cysteine proteases, 21 metalloproteases, and 5 neutral proteases from different Aspergillus species, were analyzed for catalytically active-site residues using the MEROPS database and various bioinformatics tools. Different protease classes showed different predominant active-site residues. For the 24 cysteine proteases of Aspergilli, the predominant active-site residues were Gln193, Cys199, His364 and Asn384, while for the 43 serine proteases, the active-site residue sets Asp164, His193, Asn284, Ser349 and Asp325, His357, Asn454, Ser519 were frequently observed. Analysis of the 21 metalloproteases of Aspergilli revealed Glu298, Glu388 and Tyr476 as predominant active-site residues. In general, Aspergillus species-specific active-site residues were observed for the different types of protease sequences analyzed. Phylogenetic analysis of these 129 protease sequences revealed 14 different clans representing different types of proteases with diverse active-site residues.

  7. Protein-RNA interface residue prediction using machine learning: an assessment of the state of the art.

    PubMed

    Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2012-05-10

    RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. 
Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. 
These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
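    The comparison above rests on sensitivity, specificity and MCC computed from a confusion matrix, and on the difference between pooling residues across proteins and averaging per-protein scores. A minimal sketch of both, assuming binary labels with 1 marking an interface residue (function names are illustrative):

```python
import math

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and Matthews correlation coefficient for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}

def residue_level_mcc(proteins):
    """Pool every residue of every (labels, predictions) pair into one confusion matrix."""
    y_true = [t for labels, _ in proteins for t in labels]
    y_pred = [p for _, preds in proteins for p in preds]
    return confusion_metrics(y_true, y_pred)["mcc"]

def protein_level_mcc(proteins):
    """Average of per-protein MCC values."""
    scores = [confusion_metrics(labels, preds)["mcc"] for labels, preds in proteins]
    return sum(scores) / len(scores)
```

    On the same predictions the two aggregates can disagree substantially, for example when one small protein is predicted perfectly and another poorly, which is why the residue-level and protein-level results above differ.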

  8. Isolation, identification and synthesis of four novel antioxidant peptides from rice residue protein hydrolyzed by multiple proteases.

    PubMed

    Yan, Qiao-Juan; Huang, Lin-Hua; Sun, Qian; Jiang, Zheng-Qiang; Wu, Xia

    2015-07-15

    Hydrolysis of rice residue protein (RRP) by multiple proteases was optimized to produce novel antioxidant peptides. An antioxidant peptide fraction (RRPB3) with an IC50 of 0.25 mg/ml was purified from the RRP hydrolysate using membrane ultrafiltration followed by size-exclusion chromatography and reversed-phase FPLC. RRPB3 was found to contain four peptides (RRPB3 I-IV), whose amino acid sequences were RPNYTDA (835.9 Da), TSQLLSDQ (891.0 Da), TRTGDPFF (940.0 Da) and NFHPQ (641.7 Da), respectively. The four peptides were chemically synthesized, and their antioxidant activities were assessed by DPPH radical scavenging, ABTS radical scavenging and FRAP (Fe3+ reducing) assays. RRPB3 I and III in combination showed synergistic antioxidant activity compared with either peptide alone. All four synthetic peptides showed excellent stability against simulated gastrointestinal proteases. Therefore, the peptides isolated from RRP may be useful as antioxidants in the food and drug industries. Copyright © 2015 Elsevier Ltd. All rights reserved.
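    The reported peptide masses can be checked from the sequences by summing average amino acid residue masses and adding one water. The sketch below does this with standard average residue masses; the abstract does not state how the masses were obtained, but the values computed this way agree with the reported ones to within about 0.1 Da.

```python
# Standard average residue masses in daltons (free amino acid mass minus one water)
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

def average_peptide_mass(sequence):
    """Average (not monoisotopic) mass of a linear, unmodified peptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER

for peptide in ("RPNYTDA", "TSQLLSDQ", "TRTGDPFF", "NFHPQ"):
    print(peptide, round(average_peptide_mass(peptide), 1))
```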

  9. Molecular dynamics simulations of certain RGD-based peptides from Kistrin provide insight into the higher activity of REI-RGD34 protein at higher temperature.

    PubMed

    Upadhyay, Sanjay K

    2014-05-01

    To determine the bioactive conformation required for binding to the receptor αIIbβ3, the peptide sequence RIPRGDMP from Kistrin was inserted into the CDR1 loop region of the REI protein, resulting in REI-RGD34. The activity of REI-RGD34 towards the receptor αIIbβ3 was observed to increase at higher temperature. This could be explained in either of two ways: the modified complex forces the restricted peptide to adopt the bioactive conformation, or it unfolds the peptide in a way that exposes a binding surface with high affinity for the receptor. Here, we model the conformational preference of the RGD sequence in RIPRGDMP at 25 and 42 °C using multiple MD simulations. Further, we model the peptide sequences RGD, PRGD and PRGDMP from Kistrin to observe the effect of flanking residues on the conformational sampling of RGD. The presence of flanking residues around the RGD peptide greatly influenced the conformational sampling. A transition from bend to turn conformation was observed for the RGD sequence at 42 °C. The turn conformation shows the pharmacophoric parameters required for recognition by the receptor αIIbβ3. Thus, the temperature-dependent activity of RIPRGDMP inserted into the loop region of REI can be explained by the presence of the turn conformation. This study should help in designing potential antagonists of the receptor αIIbβ3.
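    A common geometric way to call a turn in an MD trajectory is to test whether the Cα atoms of residues i and i+3 lie closer than about 7 Å. The sketch below applies that criterion frame by frame; it is an assumption that this matches the study's bend/turn assignments, which may instead have used DSSP-style definitions.

```python
import numpy as np

def turn_flags(ca_coords, cutoff=7.0):
    """Boolean per residue i: True when the CA(i)-CA(i+3) distance is below cutoff (angstrom)."""
    ca = np.asarray(ca_coords, dtype=float)
    distances = np.linalg.norm(ca[3:] - ca[:-3], axis=1)
    return distances < cutoff

def turn_fraction(frames, start, cutoff=7.0):
    """Fraction of trajectory frames in which the segment beginning at residue `start` forms a turn."""
    hits = [bool(turn_flags(frame, cutoff)[start]) for frame in frames]
    return sum(hits) / len(hits)
```

    Comparing `turn_fraction` between trajectories run at 25 and 42 °C would quantify the temperature-dependent bend-to-turn shift described above.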

  10. Naturally selected hepatitis C virus polymorphisms confer broad neutralizing antibody resistance.

    PubMed

    Bailey, Justin R; Wasilewski, Lisa N; Snider, Anna E; El-Diwany, Ramy; Osburn, William O; Keck, Zhenyong; Foung, Steven K H; Ray, Stuart C

    2015-01-01

    For hepatitis C virus (HCV) and other highly variable viruses, broadly neutralizing mAbs are an important guide for vaccine development. The development of resistance to anti-HCV mAbs is poorly understood, in part due to a lack of neutralization testing against diverse, representative panels of HCV variants. Here, we developed a neutralization panel expressing diverse, naturally occurring HCV envelopes (E1E2s) and used this panel to characterize neutralizing breadth and resistance mechanisms of 18 previously described broadly neutralizing anti-HCV human mAbs. The observed mAb resistance could not be attributed to polymorphisms in E1E2 at known mAb-binding residues. Additionally, hierarchical clustering analysis of neutralization resistance patterns revealed relationships between mAbs that were not predicted by prior epitope mapping, identifying 3 distinct neutralization clusters. Using this clustering analysis and envelope sequence data, we identified polymorphisms in E2 that confer resistance to multiple broadly neutralizing mAbs. These polymorphisms, which are not at mAb contact residues, also conferred resistance to neutralization by plasma from HCV-infected subjects. Together, our method of neutralization clustering with sequence analysis reveals that polymorphisms at noncontact residues may be a major immune evasion mechanism for HCV, facilitating viral persistence and presenting a challenge for HCV vaccine development.

  11. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    PubMed Central

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria responsible for extreme acid sensitivity, especially in Escherichia coli and Shigella flexneri. Here we found molecular evidence supporting the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC-encoded acid-sensitive antiporter showed many conserved residue patterns at regular intervals in the N-terminal region. Toward the C-terminal end of the alignment, the number of conserved residues decreases, suggesting that the N-terminal region of this protein plays a more active role than the C-terminal region. The motif FHLVFFLLLGG, located at the amino terminus, is well conserved across the gadC-encoded proteins. The motif is also partially conserved among other antiporters that are not encoded by gadC but are involved in acid sensitivity/resistance. Phylogenetic cluster analysis supports the relationship between Escherichia coli and Shigella flexneri. The gadC-encoded proteins converge as a clade and diverge from other antiporters belonging to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792

  12. Cyclic mu-opioid receptor ligands containing multiple N-methylated amino acid residues.

    PubMed

    Adamska-Bartłomiejczyk, Anna; Janecka, Anna; Szabó, Márton Richárd; Cerlesi, Maria Camilla; Calo, Girolamo; Kluczyk, Alicja; Tömböly, Csaba; Borics, Attila

    2017-04-15

    In this study we report the in vitro activities of four cyclic opioid peptides with varying sequence length/macrocycle size and N-methylamino acid residue content. N-Methylated amino acids were incorporated, and cyclization was employed, to enhance conformational rigidity to various extents. The effects of these modifications on ligand structure and binding properties were studied. The pentapeptide containing one endocyclic and one exocyclic N-methylated amino acid displayed the highest affinity for the mu-opioid receptor. This peptide was also shown to be a full agonist, whereas the other analogs failed to activate the mu-opioid receptor. Molecular docking studies provided a structural rationale for the observed binding properties. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Structural and sequence features of two residue turns in beta-hairpins.

    PubMed

    Madan, Bharat; Seo, Sung Yong; Lee, Sun-Gu

    2014-09-01

    Beta-turns in beta-hairpins have been implicated as important sites in protein folding. In particular, two residue β-turns, the most abundant connecting elements in beta-hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two residue turns in beta-hairpins with a large data set. For this, 3977 beta-turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta-hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two-residue turns, and their distributions were examined. From this study, we could identify code-like sequence motifs for the two residue beta-turn types. Finally, structural and sequence properties of beta-strands in the beta-hairpins were analyzed, which revealed that the beta-strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta-hairpin turn structures and sequences. © 2014 Wiley Periodicals, Inc.

  14. Residues at a Single Site Differentiate Animal Cryptochromes from Cyclobutane Pyrimidine Dimer Photolyases by Affecting the Proteins' Preferences for Reduced FAD.

    PubMed

    Xu, Lei; Wen, Bin; Wang, Yuan; Tian, Changqing; Wu, Mingcai; Zhu, Guoping

    2017-06-19

    Cryptochromes (CRYs) and photolyases belong to the cryptochrome/photolyase family (CPF). Reduced FAD is essential for photolyases to photorepair UV-induced cyclobutane pyrimidine dimers (CPDs) or 6-4 photoproducts in DNA. In Drosophila CRY (dCRY, a type I animal CRY), FAD is converted to the anionic radical but not to the reduced state upon illumination, which might induce a conformational change in the protein to relay the light signal downstream. To explore the foundation of these differences, multiple sequence alignment of 650 CPF protein sequences was performed. We identified a site facing FAD (Ala377 in Escherichia coli CPD photolyase and Val415 in dCRY), hereafter referred to as "site 377", that was distinctly conserved across these sequences: CPD photolyases often had Ala, Ser, or Asn at this site, whereas animal CRYs had Ile, Leu, or Val. The binding affinity for reduced FAD, but not the photorepair activity of E. coli photolyase, was dramatically impaired when replacing Ala377 with any of the three CRY residues. Conversely, in V415S and V415N mutants of dCRY, FAD was photoreduced to its fully reduced state after prolonged illumination, and light-dependent conformational changes of these mutants were severely inhibited. We speculate that the residues at site 377 play a key role in the different preferences of CPF proteins for reduced FAD, which differentiate animal CRYs from CPD photolyases. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. A new earthworm cellulase and its possible role in the innate immunity.

    PubMed

    Park, In Yong; Cha, Ju Roung; Ok, Suk-Mi; Shin, Chuog; Kim, Jin-Se; Kwak, Hee-Jin; Yu, Yun-Sang; Kim, Yu-Kyung; Medina, Brenda; Cho, Sung-Jin; Park, Soon Cheol

    2017-02-01

    A new endogenous cellulase (Ean-EG) from the earthworm Eisenia andrei and its expression pattern are reported. Based on the deduced amino acid sequence, the open reading frame (ORF) of Ean-EG consists of 1368 bp, corresponding to a polypeptide of 456 amino acid residues that contains the GHF9-specific conserved region with the amino acid residues essential for enzyme activity. In multiple alignment and phylogenetic analyses, the deduced amino acid sequence of Ean-EG showed the highest sequence similarity (about 79%) to that of another annelid (Pheretima hilgendorfi) and clustered with other GHF9 cellulases, indicating that Ean-EG belongs to GHF9, the family to which most animal cellulases belong. In situ hybridization of Ean-EG mRNA showed the most distinct expression in epithelial cells, with positive hybridization signals in the epidermis, chloragogen tissue cells, coelomic cell aggregates, and even blood vessels. This suggests that, at least in Eisenia andrei, cellulase function may not be limited to digestion but may extend to innate immunity. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Sequence composition and environment effects on residue fluctuations in protein structures

    NASA Astrophysics Data System (ADS)

    Ruvinsky, Anatoly M.; Vakser, Ilya A.

    2010-10-01

    Structural fluctuations in proteins affect a broad range of cell phenomena, including the stability of proteins and their fragments, allosteric transitions, and energy transfer. This study presents a statistical-thermodynamic analysis of the relationship between sequence composition and the distribution of residue fluctuations in protein-protein complexes. A one-node-per-residue elastic network model is developed that accounts for the nonhomogeneous protein mass distribution and for interatomic interactions through a renormalized inter-residue potential. Two factors, the protein mass distribution and the residue environment, were found to determine the scale of residue fluctuations. Surface residues undergo larger fluctuations than core residues, in agreement with experimental observations. Ranking residues on the normalized scale of fluctuations yields a distinct classification of amino acids into three groups: (i) highly fluctuating: Gly, Ala, Ser, Pro, and Asp; (ii) moderately fluctuating: Thr, Asn, Gln, Lys, Glu, Arg, Val, and Cys; and (iii) weakly fluctuating: Ile, Leu, Met, Phe, Tyr, Trp, and His. Structural instability in proteins possibly relates to a high content of highly fluctuating residues and a deficiency of weakly fluctuating residues in irregular secondary structure elements (loops), chameleon sequences, and disordered proteins. A strong correlation between residue fluctuations and the sequence composition of protein loops supports this hypothesis. Comparing the fluctuations of binding-site (interface) residues with those of other surface residues shows that, on average, the interface is more rigid than the rest of the protein surface, and that Gly, Ala, Ser, Cys, Leu, and Trp have a propensity to form more stable docking patches on the interface. The findings have broad implications for understanding mechanisms of protein association and the stability of protein structures.
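    The one-node-per-residue idea can be illustrated with a plain Gaussian network model (GNM), a simpler relative of the model described above: it ignores the mass weighting and the renormalized inter-residue potential and instead joins every pair of Cα nodes closer than a cutoff with an identical spring. The 7 Å cutoff, the uniform spring constant, and the function name `gnm_fluctuations` are assumptions for illustration. Relative mean-square fluctuations are read from the diagonal of the pseudo-inverse of the Kirchhoff (connectivity) matrix.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.0, gamma=1.0):
    """Relative mean-square residue fluctuations from a plain Gaussian network model.

    coords: (N, 3) array-like of C-alpha coordinates, one node per residue.
    cutoff: contact cutoff in angstroms (7 A is an assumed, commonly used value).
    gamma:  uniform spring constant in arbitrary units.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all residue nodes
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Off-diagonal Kirchhoff entries: -gamma for every pair within the cutoff
    kirchhoff = np.where((dist < cutoff) & (dist > 0), -gamma, 0.0)
    # Diagonal entries make each row sum to zero (node degree times gamma)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # The matrix has one zero mode (rigid translation), so use the pseudo-inverse
    inverse = np.linalg.pinv(kirchhoff)
    # Diagonal of the inverse is proportional to <dR_i^2>
    return np.diag(inverse)
```

    For a short straight chain with 3.8 Å spacing, only sequence neighbours fall within the cutoff, and the terminal residues come out more mobile than the central one because they have fewer contacts.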

  17. Elman RNN based classification of proteins sequences on account of their mutual information.

    PubMed

    Mishra, Pooja; Nath Pandey, Paras

    2012-10-21

    In the present work we estimate residue correlation within protein sequences using the mutual information (MI) of adjacent residues, based on the structural and solvent accessibility properties of amino acids. Long-range correlation between nonadjacent residues is better captured by constructing a mutual information vector (MIV) for each protein sequence, so that every sequence is associated with its corresponding MIV. These MIVs are given to an Elman RNN to classify the protein sequences. The modeling power of the MIV was shown to be significantly better, offering a new approach to alignment-free classification of protein sequences. We also conclude that MIVs based on structural and solvent accessibility properties are better predictors. Copyright © 2012 Elsevier Ltd. All rights reserved.
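    A minimal sketch of an MI-based feature vector, assuming residues are first mapped to a hypothetical three-class hydropathy alphabet (the abstract does not list its structural and solvent-accessibility classes) and that the vector collects the MI between classes at positions i and i + lag for lags 1 to `max_lag`. This is one plausible reading of an MIV, not the authors' construction; the names `CLASSES`, `lagged_mi`, and `mi_vector` are illustrative.

```python
from collections import Counter
from math import log2

# Hypothetical 3-class grouping by hydropathy (not the paper's property classes)
CLASSES = {
    **{aa: "H" for aa in "AVLIMFWC"},  # hydrophobic
    **{aa: "P" for aa in "GSTYNQHP"},  # polar / neutral
    **{aa: "C" for aa in "DEKR"},      # charged
}


def lagged_mi(seq, lag=1, classes=CLASSES):
    """Mutual information (bits) between residue classes at positions i and i + lag."""
    labels = [classes[aa] for aa in seq.upper() if aa in classes]
    pairs = list(zip(labels, labels[lag:]))
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum of p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed class pairs
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


def mi_vector(seq, max_lag=5, classes=CLASSES):
    """Fixed-length feature vector: MI at lags 1..max_lag (an illustrative MIV)."""
    return [lagged_mi(seq, lag, classes) for lag in range(1, max_lag + 1)]
```

    Because every sequence maps to a vector of the same length regardless of sequence length, the vectors can be passed to any classifier, such as the Elman RNN used in the paper.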

  18. An Adaptive Kalman Filter Using a Simple Residual Tuning Method

    NASA Technical Reports Server (NTRS)

    Harman, Richard R.

    1999-01-01

    One difficulty in using Kalman filters in real-world situations is selecting the correct process noise, measurement noise, and initial state estimate and covariance. These parameters are commonly referred to as tuning parameters. Multiple methods have been developed to estimate them. Most of these methods, such as maximum likelihood, subspace methods, and observer/Kalman filter identification, require extensive offline processing and are not suitable for real-time processing. One technique that is suitable for real-time processing is the residual tuning method. Any mismodeling of the filter tuning parameters will result in a non-white sequence of filter measurement residuals. The residual tuning technique uses this information to estimate corrections to those tuning parameters. The actual implementation results in a set of sequential equations that run in parallel with the Kalman filter. A. H. Jazwinski developed a specialized version of this technique for estimating process noise. Equations for estimating the measurement noise have also been developed. These algorithms are used to estimate the process noise and measurement noise for the Wide Field Infrared Explorer star tracker and gyro.
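    The residual-tuning idea can be sketched for a scalar random-walk model with a covariance-matching rule: for a correctly tuned filter, the expected squared innovation equals the predicted state variance plus R, so averaging squared innovations over a window yields an estimate of R. This is a generic textbook variant, not the specific equations developed for the Wide Field Infrared Explorer; the model, the window length, and the helper names `adaptive_kf` and `simulate_random_walk` are assumptions.

```python
import random


def simulate_random_walk(n, q, r, seed=0):
    """Noisy measurements of a random walk with process variance q and measurement variance r."""
    rng = random.Random(seed)
    truth, zs = 0.0, []
    for _ in range(n):
        truth += rng.gauss(0.0, q ** 0.5)
        zs.append(truth + rng.gauss(0.0, r ** 0.5))
    return zs


def adaptive_kf(measurements, q, r0, window=50, r_min=1e-6):
    """Scalar Kalman filter that re-estimates the measurement variance R
    from its own residuals (innovations) every `window` steps.

    Assumed model: x_k = x_{k-1} + w_k, w_k ~ N(0, q); z_k = x_k + v_k, v_k ~ N(0, R).
    """
    x, p, r = measurements[0], 1.0, r0
    innov_sq, p_preds, r_history = [], [], []
    for z in measurements:
        p_pred = p + q             # predicted state variance
        nu = z - x                 # innovation (measurement residual)
        k = p_pred / (p_pred + r)  # Kalman gain using the current R estimate
        x += k * nu
        p = (1.0 - k) * p_pred
        innov_sq.append(nu * nu)
        p_preds.append(p_pred)
        if len(innov_sq) == window:
            # E[nu^2] = p_pred + R for a tuned filter, so R ~ mean(nu^2) - mean(p_pred)
            r = max(sum(innov_sq) / window - sum(p_preds) / window, r_min)
            innov_sq.clear()
            p_preds.clear()
        r_history.append(r)
    return x, r, r_history
```

    Starting from a deliberately wrong guess (R = 10 instead of 1), the estimate should settle near the true value after a few windows. A whiteness test on the innovations (their sample autocorrelation at nonzero lags) is the complementary diagnostic the abstract refers to.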

  19. Rapid NMR Assignments of Proteins by Using Optimized Combinatorial Selective Unlabeling.

    PubMed

    Dubey, Abhinav; Kadumuri, Rajashekar Varma; Jaipuria, Garima; Vadrevu, Ramakrishna; Atreya, Hanudatta S

    2016-02-15

    A new approach for rapid resonance assignment in proteins based on amino acid selective unlabeling is presented. The method involves choosing a set of multiple amino acid types for selective unlabeling and identifying specific tripeptides surrounding the labeled residues from specific 2D NMR spectra in a combinatorial manner. The methodology directly yields sequence-specific assignments without requiring a contiguous stretch of amino acid residues to be linked, and it is applicable to deuterated proteins. We show that a 2D [¹⁵N,¹H] HSQC spectrum combined with two additional 2D spectra can yield ∼50% of the assignments. The methodology was applied to two proteins: an intrinsically disordered protein (12 kDa) and the 29 kDa (268-residue) α-subunit of Escherichia coli tryptophan synthase, which presents a challenging case with spectral overlap and missing peaks. The method can augment existing approaches and will be useful for applications such as identifying active-site residues involved in ligand binding, phosphorylation, or protein-protein interactions, even before resonance assignment is complete. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
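    The combinatorial part of the approach can be illustrated with a simplified coding scheme: each amino acid type is given a distinct pattern of samples in which it is unlabeled, so the set of samples in which a residue's peak disappears identifies its type. With n samples, up to 2ⁿ − 1 types can be distinguished. This sketch ignores the tripeptide identification from specific 2D spectra described above; `assign_codes` and `decode` are hypothetical helpers, not part of the published method.

```python
from itertools import product


def assign_codes(residue_types, n_samples):
    """Give each residue type a distinct, non-empty unlabeling pattern.

    A code is a tuple with one entry per sample: 1 = the type is unlabeled
    (NMR-invisible) in that sample, 0 = labeled. The all-zero pattern is skipped
    so that every coded type disappears in at least one sample.
    """
    codes = [c for c in product((0, 1), repeat=n_samples) if any(c)]
    if len(residue_types) > len(codes):
        raise ValueError("not enough samples to encode all residue types")
    return dict(zip(residue_types, codes))


def decode(peak_presence, codes):
    """peak_presence[s] is True if the residue's peak is observed in sample s.

    Returns the residue type whose unlabeling pattern matches, or None.
    """
    observed = tuple(0 if seen else 1 for seen in peak_presence)
    matches = [t for t, c in codes.items() if c == observed]
    return matches[0] if matches else None
```

    Two samples already separate three residue types; a peak that never disappears matches no coded type and is left unassigned.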

  20. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes has been poorly explored and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse possible set of the most structurally and functionally homogeneous protein sequences. SAMMI was tested by selecting MSAs for functional site residue prediction, through analysis of conservation patterns, on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
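    A toy version of selection by maximal mutual information, assuming the score is simply the mean pairwise MI between alignment columns. SAMMI's actual scoring, gap handling, and any sequence weighting or finite-sample correction are not described in the abstract and are not reproduced here; `column_mi`, `mean_pairwise_mi`, and `select_msa` are illustrative names.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mi(col_a, col_b):
    """Mutual information (bits) between two alignment columns (gaps treated as a symbol)."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    pa, pb = Counter(col_a), Counter(col_b)
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in joint.items()
    )


def mean_pairwise_mi(msa):
    """Average MI over all column pairs of an MSA given as equal-length strings."""
    columns = list(zip(*msa))
    pairs = list(combinations(range(len(columns)), 2))
    if not pairs:
        return 0.0
    return sum(column_mi(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def select_msa(candidates):
    """Pick the candidate alignment with the highest mean pairwise column MI."""
    return max(candidates, key=mean_pairwise_mi)
```

    A fully conserved alignment scores zero, while one whose columns vary together scores higher, which reflects the diversity-plus-coherence trade-off stated in the hypothesis above.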

  1. MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins.

    PubMed

    Jones, David T; Singh, Tanya; Kosciolek, Tomasz; Tetchner, Stuart

    2015-04-01

    Recent developments in statistical techniques to infer direct evolutionary couplings between residue pairs have rendered covariation-based contact prediction a viable means for accurate 3D modelling of proteins, with no information other than the sequence required. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Furthermore, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long range network of hydrogen bonds, including correctly assigning the donor and acceptor residues. Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV. MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
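    The top-L precision quoted above can be computed roughly as follows, assuming the common convention that long-range contacts are residue pairs separated by at least 24 positions in sequence; `fraction=0.1` gives the top-L/10 variant used for the hydrogen-bond results. The function name and the input format are assumptions for illustration.

```python
def top_l_precision(predictions, true_contacts, length, min_sep=24, fraction=1.0):
    """Precision of the top (fraction * L) long-range predicted contacts.

    predictions:   iterable of (i, j, score) tuples.
    true_contacts: set of (i, j) tuples with i < j.
    length:        sequence length L.
    min_sep:       minimum sequence separation for a "long-range" pair (assumed 24).
    """
    long_range = [
        (min(i, j), max(i, j), s)
        for i, j, s in predictions
        if abs(i - j) >= min_sep
    ]
    # Highest-scoring predictions first
    long_range.sort(key=lambda t: t[2], reverse=True)
    k = max(1, int(length * fraction))
    top = long_range[:k]
    if not top:
        return 0.0
    hits = sum((i, j) in true_contacts for i, j, _ in top)
    return hits / len(top)
```

    Short-range pairs are filtered out before ranking, so a high-scoring local contact cannot inflate the long-range precision.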

  2. Efficient Covalent Bond Formation in Gas-Phase Peptide-Peptide Ion Complexes with the Photoleucine Stapler

    NASA Astrophysics Data System (ADS)

    Shaffer, Christopher J.; Andrikopoulos, Prokopis C.; Řezáč, Jan; Rulíšek, Lubomír; Tureček, František

    2016-04-01

    Noncovalent complexes of hydrophobic peptides GLLLG and GLLLK with photoleucine (L*) tagged peptides G(L*ₙLₘ)K (n = 1,3, m = 2,0) were generated as singly charged ions in the gas phase and probed by photodissociation at 355 nm. Carbene intermediates produced by photodissociative loss of N₂ from the L* diazirine rings underwent insertion into X-H bonds of the target peptide moiety, forming covalent adducts with yields reaching 30%. Gas-phase sequencing of the covalent adducts revealed preferred bond formation at the C-terminal residue of the target peptide. Site-selective carbene insertion was achieved by placing the L* residue in different positions along the photopeptide chain, and the residues in the target peptide undergoing carbene insertion were identified by gas-phase ion sequencing that was aided by specific ¹³C labeling. Density functional theory calculations indicated that noncovalent binding to GL*L*L*K resulted in substantial changes of the (GLLLK + H)+ ground state conformation. The peptide moieties in [GL*L*LK + GLLLK + H]+ ion complexes were held together by hydrogen bonds, whereas dispersion interactions of the nonpolar groups were only secondary in ground-state 0 K structures. Born-Oppenheimer molecular dynamics for 100 ps trajectories of several different conformers at the 310 K laboratory temperature showed that noncovalent complexes developed multiple, residue-specific contacts between the diazirine carbons and GLLLK residues. The calculations pointed to the substantial fluidity of the nonpolar side chains in the complexes. Diazirine photochemistry in combination with Born-Oppenheimer molecular dynamics is a promising tool for investigations of peptide-peptide ion interactions in the gas phase.

  3. Study of Binding Interaction between Pif80 Protein Fragment and Aragonite

    NASA Astrophysics Data System (ADS)

    Du, Yuan-Peng; Chang, Hsun-Hui; Yang, Sheng-Yu; Huang, Shing-Jong; Tsai, Yu-Ju; Huang, Joseph Jen-Tse; Chan, Jerry Chun Chung

    2016-08-01

    Pif is a crucial protein for the formation of the nacreous layer in Pinctada fucata. Three non-acidic peptide fragments of the aragonite-binding domain (Pif80), which contain multiple copies of the repeat sequence DDRK, were selected to study the interaction between non-acidic peptides and aragonite. The polypeptides DDRKDDRKGGK (Pif80-11) and DDRKDDRKGGKDDRKDDRKGGK (Pif80-22) have similar binding affinities for aragonite. Solid-state NMR data indicate that the backbones of the Pif80-11 and Pif80-22 peptides bound to aragonite adopt a random-coil conformation. Pif80-11 is considerably more effective than Pif80-22 in promoting the nucleation of aragonite on a β-chitin substrate. Our results suggest that the structural arrangement at a protein-mineral interface depends on the surface structure of the mineral substrate and on the protein sequence. The side chains of the basic residues, which function as anchors to the aragonite surface, have uniform structures. This anchoring role of basic residues in protein-mineral interactions may be important in biomineralization.

  4. The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

    PubMed Central

    Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

    1988-01-01

    The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497

  5. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    PubMed Central

    2012-01-01

    Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is at its maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions: Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
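    To make the spatial-statistics idea concrete, the sketch below scores how strongly per-residue conservation values cluster in 3D using Moran's I, a standard spatial autocorrelation statistic, with binary weights for residue pairs within a distance cutoff. Moran's I is swapped in for illustration only; the abstract does not state which statistic its index uses, and the 8 Å cutoff and the name `morans_i` are assumptions.

```python
from math import dist


def morans_i(scores, coords, cutoff=8.0):
    """Moran's I of per-residue scores with binary distance-cutoff weights.

    scores: conservation score per residue.
    coords: (x, y, z) position per residue, e.g. C-alpha coordinates.
    Values near +1 mean similar scores sit next to each other in space
    (clustering); values near -1 mean neighbours tend to differ.
    """
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and dist(coords[i], coords[j]) <= cutoff:
                num += dev[i] * dev[j]  # weight w_ij = 1 for spatial neighbours
                w_sum += 1.0
    den = sum(d * d for d in dev)
    if w_sum == 0 or den == 0:
        return 0.0
    return (n / w_sum) * (num / den)
```

    Under this kind of score, a homologue set whose conserved positions form a compact surface patch would rank above one whose conserved positions are scattered, which matches the selection rule described in the abstract.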

  6. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryotes, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and to build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and that proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have lower fluctuations than other residues. In addition, the fluctuation of conserved residues in each functional group was strongly correlated with the corresponding biological function. This research suggests a direct connection through which protein sequences are related to various functions via structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
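    Selecting sequence-conserved residues from an alignment is often done with per-column Shannon entropy. A minimal sketch, assuming gaps are ignored and a column counts as conserved when its entropy is at most 0.5 bits; the threshold and the name `conserved_positions` are arbitrary choices for illustration, not the criterion used in the study.

```python
from collections import Counter
from math import log2


def conserved_positions(msa, max_entropy=0.5):
    """Indices of alignment columns whose Shannon entropy (bits) is <= max_entropy.

    msa: list of equal-length aligned sequences; '-' marks a gap and is ignored.
    Gap-only columns are skipped.
    """
    result = []
    for idx, column in enumerate(zip(*msa)):
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        n = len(residues)
        entropy = -sum((k / n) * log2(k / n) for k in Counter(residues).values())
        if entropy <= max_entropy:
            result.append(idx)
    return result
```

    The returned column indices could then be mapped onto each structure to compare the fluctuations of conserved and non-conserved residues, as in the analysis above.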

  7. A Modified LS+AR Model to Improve the Accuracy of the Short-term Polar Motion Prediction

    NASA Astrophysics Data System (ADS)

    Wang, Z. W.; Wang, Q. X.; Ding, Y. Q.; Zhang, J. J.; Liu, S. S.

    2017-03-01

    There are two problems with the LS (Least Squares)+AR (AutoRegressive) model in polar motion forecasting: the internal residuals of the LS fit are reasonable, but the residuals of the LS extrapolation are poor; and the LS fitting residual sequence is nonlinear. It is therefore unsuitable to build an AR model for the residual sequence to be forecast from the residual sequence before the forecast epoch. In this paper, we solve these two problems in two steps. First, constraints are added at the two endpoints of the LS fitting data to fix them on the LS fitting curve, so that the fitted values next to the two endpoints are very close to the observed values. Second, we select the interpolation residual sequence of an inward LS fitting curve, which has a variation trend similar to that of the LS extrapolation residual sequence, as the object of AR modeling for the residual forecast. Calculation examples show that this solution can effectively improve the short-term polar motion prediction accuracy of the LS+AR model. In addition, comparisons with the RLS (Robustified Least Squares)+AR, RLS+ARIMA (AutoRegressive Integrated Moving Average), and LS+ANN (Artificial Neural Network) forecast models confirm the feasibility and effectiveness of the solution for polar motion forecasting. The results, especially for polar motion forecasts 1-10 days ahead, show that the forecast accuracy of the proposed model reaches an internationally competitive level.
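    A bare-bones LS+AR forecast, assuming an LS model with a linear trend plus annual (365.25 d) and approximate Chandler (433 d) harmonics, and an AR model fitted to the LS residuals by ordinary least squares. The paper's hard endpoint constraints are approximated here by giving the two endpoints a large weight, and its second step (modeling the inward interpolation residuals) is not reproduced; the periods, the AR order, the weight, and the name `ls_ar_forecast` are assumptions.

```python
import numpy as np

PERIODS = (365.25, 433.0)  # annual and approximate Chandler periods in days (assumed)


def design(t, periods=PERIODS):
    """LS design matrix: constant, linear trend, and a cos/sin pair per period."""
    cols = [np.ones_like(t), t]
    for p in periods:
        w = 2 * np.pi / p
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)


def ls_ar_forecast(t, y, horizon, order=10, endpoint_weight=1e3):
    """Forecast `horizon` evenly spaced steps ahead as LS extrapolation + AR residual forecast."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    a = design(t)
    # Heavy weights on both endpoints pull the LS curve through them
    # (a soft stand-in for the paper's endpoint constraints)
    w = np.ones_like(t)
    w[[0, -1]] = endpoint_weight
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(a * sw[:, None], y * sw, rcond=None)
    resid = y - a @ coef
    # AR(order) by ordinary least squares: column k holds lag k + 1
    lags = np.column_stack(
        [resid[order - k - 1: len(resid) - k - 1] for k in range(order)]
    )
    phi, *_ = np.linalg.lstsq(lags, resid[order:], rcond=None)
    # Recursive AR forecast of the residuals
    hist = list(resid[-order:])
    ar_pred = []
    for _ in range(horizon):
        nxt = float(np.dot(phi, hist[::-1][:order]))  # most recent value first
        ar_pred.append(nxt)
        hist.append(nxt)
    dt = t[1] - t[0]
    t_future = t[-1] + dt * np.arange(1, horizon + 1)
    return design(t_future) @ coef + np.array(ar_pred)
```

    On noise-free data built from the same trend and harmonics, the LS part alone reproduces the series and the AR term contributes essentially nothing; on real polar motion data the AR term carries the short-term residual signal.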

  8. Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions

    PubMed Central

    Laine, Elodie; Carbone, Alessandra

    2015-01-01

    Protein-protein interactions (PPIs) are essential to all biological processes, and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces and for understanding their properties, origins and binding to multiple partners. Unlike machine learning approaches, our method combines, in a rational and very straightforward way, three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach makes it possible to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPI modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684

  9. Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

    PubMed Central

    Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

    2014-01-01

    An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of the human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. AR content in the protein sequence decreased as protein disorder decreased. A probability density distribution analysis and a discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs and that these regions were on average 8 residues long. The residues in ARs were high in sequence complexity, and ARs seldom overlapped with low-complexity regions (LCRs), which are abundant in disordered proteins. The sequences in ARs showed mixed conformational adaptability toward α-helix, β-sheet/strand and coil conformations. PMID:24594841
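    Sequence complexity in analyses like this one is commonly measured as the Shannon entropy of residue composition in a sliding window, with low-complexity regions flagged where the entropy falls below a threshold. A minimal sketch; the window of 12 residues, the 2.2-bit threshold, and the names `window_entropy` and `low_complexity_mask` are illustrative assumptions, not the exact procedure used in the study.

```python
from collections import Counter
from math import log2


def window_entropy(seq, window=12):
    """Shannon entropy (bits) of residue composition for each window start position."""
    scores = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        scores.append(-sum((c / window) * log2(c / window) for c in counts.values()))
    return scores


def low_complexity_mask(seq, window=12, threshold=2.2):
    """True for every residue covered by at least one window below the threshold."""
    mask = [False] * len(seq)
    for start, h in enumerate(window_entropy(seq, window)):
        if h < threshold:
            for i in range(start, start + window):
                mask[i] = True
    return mask
```

    A polyglutamine stretch scores 0 bits and is masked, while a window of 12 distinct residues reaches the maximum of log2(12) ≈ 3.58 bits; an AR that "seldom overlaps with LCRs" would mostly fall outside the masked positions.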

  10. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position-specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination: once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequences, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
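    The query-seeding step described above can be sketched as a simple string operation on the query-anchored MSA. This is a minimal illustration, not psisearch's code: it assumes each aligned subject row already holds exactly one character per query position, with `-` marking gaps, and the function name `seed_gaps_with_query` is hypothetical.

```python
def seed_gaps_with_query(query: str, aligned_subjects: list[str], gap: str = "-") -> list[str]:
    """Replace each gap in a query-anchored subject row with the query residue in that column.

    Each row is assumed to hold exactly one character per query position
    (subject residues opposite query insertions already removed).
    """
    seeded = []
    for row in aligned_subjects:
        if len(row) != len(query):
            raise ValueError("aligned subject length must equal query length")
        # Keep subject residues; fill only the gapped columns from the query.
        seeded.append("".join(q if s == gap else s for q, s in zip(query, row)))
    return seeded


query = "MKTAYIAK"
rows = ["MKT--IAK", "--SAYLAR"]
print(seed_gaps_with_query(query, rows))  # ['MKTAYIAK', 'MKSAYLAR']
```

    The seeded rows would then feed the usual column-frequency step of PSSM construction; because every gap now carries the query residue, each column is pulled slightly towards the query, which is the anchoring effect the abstract describes.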

  11. Unified Deep Learning Architecture for Modeling Biology Sequence.

    PubMed

    Wu, Hongjie; Cao, Chengyuan; Xia, Xiaoyan; Lu, Qiang

    2017-10-09

    Prediction of the spatial structure or function of biological macromolecules from their sequence remains an important challenge in bioinformatics. When biological sequences are modeled with traditional sequence models, characteristics such as long-range interactions between basic units, the complicated and variable output of labeled structures, and the variable length of biological sequences usually lead to different solutions on a case-by-case basis. This study proposes bidirectional recurrent neural networks based on long short-term memory (LSTM) or gated recurrent units (GRU) to capture long-range interactions, an optional reshape operator to adapt to the diversity of output labels, and a training algorithm that supports sequence models processing variable-length sequences. Additionally, merge and pooling operators enhance the ability to capture short-range interactions between basic units of biological sequences. The proposed deep-learning model and its training algorithm might be capable of solving currently known biological sequence-modeling problems within a unified framework. We validated the model on one of the most difficult biological sequence-modeling problems currently known; our results indicate that its predictions of protein residue interactions exceeded the accuracy of current popular approaches by 10% on multiple benchmarks.

  12. Zinc-binding Domain of the Bacteriophage T7 DNA Primase Modulates Binding to the DNA Template

    PubMed Central

    Lee, Seung-Joo; Zhu, Bin; Akabayov, Barak; Richardson, Charles C.

    2012-01-01

    The zinc-binding domain (ZBD) of prokaryotic DNA primases has been postulated to be crucial for recognition of specific sequences in the single-stranded DNA template. To determine the molecular basis for this role in recognition, we carried out homolog-scanning mutagenesis of the zinc-binding domain of DNA primase of bacteriophage T7 using a bacterial homolog from Geobacillus stearothermophilus. The ability of T7 DNA primase to catalyze template-directed oligoribonucleotide synthesis is eliminated by substitution of any five-residue segment within the ZBD. The most significant defect occurs upon substitution of a region (Pro-16 to Cys-20) spanning two cysteines that coordinate the zinc ion. The role of this region in primase function was further investigated by generating a protein library composed of multiple amino acid substitutions for Pro-16, Asp-18, and Asn-19, followed by genetic screening for functional proteins. Examination of proteins selected from the screening reveals no change in sequence-specific recognition. However, the more positively charged residues in the region facilitate DNA binding, leading to more efficient oligoribonucleotide synthesis on short templates. The results suggest that the zinc-binding mode alone is not responsible for sequence recognition, but rather its interaction with the RNA polymerase domain is critical for DNA binding and for sequence recognition. Consequently, any alteration in the ZBD that disturbs its conformation leads to loss of DNA-dependent oligoribonucleotide synthesis. PMID:23024359

  13. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequences are becoming increasingly important. The majority of current algorithms rely on feature-based prediction, but their accuracy leaves room for improvement. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), which merges a feature predictor, SNBRFinderF, and a template predictor, SNBRFinderT. SNBRFinderF was established using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT can achieve performance comparable to structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached performance competitive with our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has clear advantages over existing sequence-based prediction algorithms. The value of our algorithm is highlighted by an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  14. Active site characterization and molecular cloning of Tenebrio molitor midgut trehalase and comments on their insect homologs.

    PubMed

    Gomez, Ana; Cardoso, Christiane; Genta, Fernando A; Terra, Walter R; Ferreira, Clélia

    2013-08-01

    The soluble midgut trehalase from Tenebrio molitor (TmTre1) was purified after several chromatographic steps, resulting in a 58-kDa enzyme with a pH optimum of 5.3 (ionizing active groups in the free enzyme: pK(e1) = 3.8 ± 0.2; pK(e2) = 7.4 ± 0.2). The purified enzyme corresponds to the deduced amino acid sequence of a cloned cDNA (TmTre1-cDNA), because a single cDNA coding for a soluble trehalase was found in the T. molitor midgut transcriptome. Furthermore, the mass of the protein predicted to be coded by TmTre1-cDNA agrees with that of the purified enzyme. TmTre1 has the essential catalytic groups Asp 315 and Glu 513 and the essential Arg residues R164, R217 and R282. Carbodiimide inactivation of the purified enzyme at different pH values reveals an essential carboxyl group with pKa = 3.5 ± 0.3. Phenylglyoxal modified a single Arg residue with pKa = 7.5 ± 0.2, as observed in the soluble trehalase from Spodoptera frugiperda (SfTre1). Diethylpyrocarbonate modified a His residue, yielding a less active enzyme with pK(e1) shifted to 4.8 ± 0.2. In TmTre1 the modified His residue (putatively His 336) is more exposed than the His modified in SfTre1 (putatively His 210), and that affects the ionization of an Arg residue. The architecture of the active sites of TmTre1 and SfTre1 is different, as shown by multiple inhibition analysis, the meaning of which demands further research. Trehalase sequences obtained from midgut transcriptomes (pyrosequencing and Illumina data) of 8 insects belonging to 5 different orders were used in a cladogram, together with other representative sequences. The data suggest that the trehalase gene underwent duplication and divergence before the separation of the paraneopteran and holometabolan orders, and that the soluble trehalase derived from the membrane-bound one by losing the C-terminal transmembrane loop. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Sequence and structural characterization of Trx-Grx type of monothiol glutaredoxins from Ashbya gossypii.

    PubMed

    Yadav, Saurabh; Kumari, Pragati; Kushwaha, Hemant Ritturaj

    2013-01-01

    Glutaredoxins are small, ubiquitous, glutathione-dependent enzymatic antioxidants that belong to the thioredoxin-fold superfamily. Glutaredoxins are classified into two types: dithiol and monothiol. Monothiol glutaredoxins, which carry the signature "CGFS" as a redox-active motif, are known for their role in the cellular response to oxidative stress. In the present analysis, the 138-amino-acid monothiol glutaredoxin AgGRX1 from Ashbya gossypii was identified and analyzed. Multiple sequence alignment of the AgGRX1 protein sequence revealed the characteristic motif of typical monothiol glutaredoxins, as observed in various other organisms. The proposed structure of the AgGRX1 protein was used to analyze signature folds related to the thioredoxin superfamily. Further, the study highlighted the structural features pertaining to the complex mechanism of glutathione docking and the interacting residues.

  16. Acetylation of the RhoA GEF Net1A controls its subcellular localization and activity

    PubMed Central

    Song, Eun Hyeon; Oh, Wonkyung; Ulu, Arzu; Carr, Heather S.; Zuo, Yan; Frost, Jeffrey A.

    2015-01-01

    Net1 isoform A (Net1A) is a RhoA GEF that is required for cell motility and invasion in multiple cancers. Nuclear localization of Net1A negatively regulates its activity, and we have recently shown that Rac1 stimulates Net1A relocalization to the plasma membrane to promote RhoA activation and cytoskeletal reorganization. However, the mechanisms controlling the subcellular localization of Net1A are not well understood. Here, we show that Net1A contains two nuclear localization signal (NLS) sequences within its N-terminus and that residues surrounding the second NLS sequence are acetylated. Treatment of cells with deacetylase inhibitors or expression of active Rac1 promotes Net1A acetylation. Deacetylase inhibition is sufficient for Net1A relocalization outside the nucleus, and replacement of the N-terminal acetylation sites with arginine residues prevents cytoplasmic accumulation of Net1A caused by deacetylase inhibition or EGF stimulation. By contrast, replacement of these sites with glutamine residues is sufficient for Net1A relocalization, RhoA activation and downstream signaling. Moreover, the N-terminal acetylation sites are required for rescue of F-actin accumulation and focal adhesion maturation in Net1 knockout MEFs. These data indicate that Net1A acetylation regulates its subcellular localization to affect RhoA activity and actin cytoskeletal organization. PMID:25588829

  17. Protein consensus-based surface engineering (ProCoS): a computer-assisted method for directed protein evolution.

    PubMed

    Shivange, Amol V; Hoeffken, Hans Wolfgang; Haefner, Stefan; Schwaneberg, Ulrich

    2016-12-01

    Protein consensus-based surface engineering (ProCoS) is a simple and efficient method for directed protein evolution that combines computational analysis and molecular biology tools to engineer protein surfaces. ProCoS is based on the hypothesis that conserved residues originated from a common ancestor and are crucial for the function of a protein, whereas highly variable regions (situated on the surface of a protein) can be targeted for surface engineering to maximize performance. ProCoS comprises four main steps: (i) identification of conserved and highly variable regions; (ii) protein sequence design by substituting residues in the highly variable regions, and gene synthesis; (iii) in vitro DNA recombination of synthetic genes; and (iv) screening for active variants. ProCoS is a simple method for surface mutagenesis in which multiple sequence alignment is used to select surface residues based on a structural model. To demonstrate the technique's utility for directed evolution, the surface of a phytase enzyme from Yersinia mollaretii (Ymphytase) was subjected to ProCoS. Screening just 1050 clones from ProCoS-guided mutant libraries yielded an enzyme with 34 amino acid substitutions. The surface-engineered Ymphytase exhibited 3.8-fold higher pH stability (at pH 2.8 for 3 h) and retained 40% of the enzyme's specific activity (400 U/mg) compared with the wild-type Ymphytase. The improved pH stability might be attributed to a significant increase (20 percentage points, from 9% to 29%) in the proportion of negatively charged amino acids on the surface of the engineered phytase.
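    Step (i) of ProCoS, separating conserved from highly variable alignment columns, can be illustrated with a simple majority-residue conservation score. This is a minimal sketch under stated assumptions: the scoring rule, the cutoffs and the function names are hypothetical, and the structural-model filter that ProCoS uses to keep only surface positions is not modeled here.

```python
from collections import Counter


def column_conservation(msa: list[str], gap: str = "-") -> list[float]:
    """Per-column fraction of non-gap residues equal to the column's most common residue."""
    scores = []
    for col in range(len(msa[0])):
        residues = [seq[col] for seq in msa if seq[col] != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as unconserved
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def classify_positions(msa: list[str], conserved_cutoff: float = 0.9,
                       variable_cutoff: float = 0.5) -> dict[str, list[int]]:
    """Split 0-based alignment columns into conserved and highly variable sets.

    Both cutoffs are hypothetical; a structural model would still be needed
    to keep only surface-exposed variable positions.
    """
    scores = column_conservation(msa)
    return {
        "conserved": [i for i, s in enumerate(scores) if s >= conserved_cutoff],
        "variable": [i for i, s in enumerate(scores) if s <= variable_cutoff],
    }


msa = ["ACDEF", "ACDKF", "ACNRF", "GCQWF"]
print(classify_positions(msa))  # {'conserved': [1, 4], 'variable': [2, 3]}
```

    Columns that fall between the two cutoffs (column 0 here, at 0.75) are left alone, which mirrors the idea of mutating only clearly variable positions while protecting clearly conserved ones.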

  18. Re-Introduction of Transmembrane Serine Residues Reduce the Minimum Pore Diameter of Channelrhodopsin-2

    PubMed Central

    Richards, Ryan; Dempski, Robert E.

    2012-01-01

    Channelrhodopsin-2 (ChR2) is a microbial-type rhodopsin found in the green alga Chlamydomonas reinhardtii. Under physiological conditions, ChR2 is an inwardly rectifying cation channel that permeates a wide range of mono- and divalent cations. Although this protein shares high sequence homology with other microbial-type rhodopsins, which are ion pumps, ChR2 is an ion channel. A sequence alignment of ChR2 with bacteriorhodopsin, a proton pump, reveals that ChR2 lacks specific motifs and residues, such as serine and threonine, known to contribute to non-covalent interactions within transmembrane domains. We hypothesized that reintroducing the eight transmembrane serine residues present in bacteriorhodopsin, but not in ChR2, would restrict the conformational flexibility and reduce the pore diameter of ChR2. In this work, eight single serine mutations were created at homologous positions in ChR2. Additionally, an endogenous transmembrane serine was replaced with alanine. We measured kinetics, changes in reversal potential, and permeability ratios in different alkali metal solutions using two-electrode voltage clamp. Applying excluded volume theory, we calculated the minimum pore diameter of the ChR2 constructs. Our results show that reintroducing serine residues into the transmembrane domain of ChR2 can restrict the minimum pore diameter through inter- and intrahelical hydrogen bonds, while removal of a transmembrane serine results in a larger pore diameter. Therefore, multiple positions along the intracellular side of the transmembrane domains contribute to the cation permeability of ChR2. PMID:23185520

  19. Improving protein-protein interaction prediction using evolutionary information from low-quality MSAs.

    PubMed

    Várnai, Csilla; Burkoff, Nikolas S; Wild, David L

    2017-01-01

    Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy-related correlated mutation measures (CMMs), such as direct information, which decouple direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy-based CMM at the residue level, we develop an interface-level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface-level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with the co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and will likely also improve interface prediction for complexes with large and good-quality MSAs.
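    The idea of turning residue-level co-mutation into an interface-level score can be illustrated with plain mutual information (MI) between alignment columns. Note that MI is a deliberately simpler stand-in: unlike the maximum entropy CMM used in the study, it does not separate direct from indirect couplings. The paired-alignment layout, the contact list and both function names are hypothetical assumptions for this sketch.

```python
import math
from collections import Counter


def column_mutual_information(msa: list[str], i: int, j: int, gap: str = "-") -> float:
    """Mutual information (bits) between alignment columns i and j, ignoring rows with gaps."""
    pairs = [(seq[i], seq[j]) for seq in msa if seq[i] != gap and seq[j] != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def interface_score(paired_msa: list[str], contacts: list[tuple[int, int]]) -> float:
    """Hypothetical interface-level score: mean MI over candidate cross-interface column pairs.

    paired_msa rows are assumed to be chain A and chain B alignments concatenated
    per organism; contacts are (column in A part, column in B part) indices.
    """
    if not contacts:
        return 0.0
    return sum(column_mutual_information(paired_msa, i, j) for i, j in contacts) / len(contacts)


covarying = ["AK", "AK", "DR", "DR"]    # A always pairs with K, D with R
independent = ["AK", "AR", "DK", "DR"]  # every combination equally frequent
print(interface_score(covarying, [(0, 1)]))    # 1.0
print(interface_score(independent, [(0, 1)]))  # 0.0
```

    In a decoy re-ranking setting, each docking decoy would supply its own contact list, and decoys whose contacts carry more co-mutation signal would score higher.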

  20. Occurrence of C-Terminal Residue Exclusion in Peptide Fragmentation by ESI and MALDI Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Dupré, Mathieu; Cantel, Sonia; Martinez, Jean; Enjalbal, Christine

    2012-02-01

    By screening a data set of MS/MS spectra from 392 synthetic peptides, we found that a known C-terminal rearrangement occurred unexpectedly often from monoprotonated molecular ions in both ESI and MALDI tandem mass spectrometry, upon low- and high-energy collision-activated dissociation with QqTOF and TOF/TOF mass analyzer configurations, respectively. Any residue located at the C-terminal carboxylic acid end, even a basic one, was lost, provided that a basic amino acid such as arginine (and, to a lesser extent, histidine or lysine) was present in the sequence. This led to a fragment ion, usually depicted as the (bn-1 + H2O) ion, corresponding to a shortened, non-scrambled peptide chain. Far from being an epiphenomenon, this exclusion of the C-terminal residue gave a fragment ion that was, in certain cases, the base peak of the MS/MS spectrum. Within the framework of the mobile proton model, when the ionizing proton is sequestered on the basic amino acid side chain, the charge-directed fragmentation mechanism is known to involve the C-terminal carboxylic acid function, forming an anhydride intermediate. The same mechanism has also been demonstrated for cationized peptides. To confirm this assessment, we prepared some of the peptides that displayed C-terminal residue exclusion as C-terminal backbone amides. As expected, production of truncated chains was completely suppressed in this peptide amide series. Moreover, multiply charged molecular ions of all peptides recorded by ESI mass spectrometry did not undergo such fragmentation, confirming that any mobile ionizing proton prevents this competing C-terminal backbone rearrangement.
Among all the well-known non-direct sequence fragment ions arising from nonspecific loss of neutral molecules (mainly H2O and NH3) and multiple backbone amide ruptures (b-type internal ions), the described C-terminal residue exclusion is highly identifiable, giving rise to a single fragment ion in the high mass range of the MS/MS spectrum. The mass difference between this signal and the protonated molecular ion corresponds to the mass of the C-terminal residue, allowing straightforward identification of the amino acid at this end. It must be emphasized that such a neutral residue loss can be misattributed to the formation of a ym-1 ion, i.e., to the loss of the N-terminal residue through the a1-ym-1 fragmentation channel. Extreme caution must therefore be exercised when reading the direct sequence ions in the positive-ion MS/MS spectra of singly charged peptides, so as not to confuse the assignments of the N- and C-terminal amino acids. Although this peculiar fragmentation behavior is of obvious interest for de novo peptide sequencing, it can also be exploited in proteomics, especially in studies involving digestion protocols carried out with proteolytic enzymes other than trypsin (Lys-N, Glu-C, and Asp-N) that produce arginine-containing peptides.
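    Because the mass difference between the singly protonated molecular ion and the truncated fragment equals one residue mass, identifying the lost residue reduces to a lookup against standard monoisotopic residue masses. The sketch below assumes singly charged ions, unmodified residues and a hypothetical 0.02 Da matching tolerance; the function name is illustrative, not from the paper.

```python
# Monoisotopic residue masses (Da) of the 20 standard, unmodified amino acids.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def lost_residue_candidates(precursor_mz: float, fragment_mz: float,
                            tol: float = 0.02) -> list[str]:
    """Residues whose mass matches [M+H]+ minus the truncated fragment (both singly charged)."""
    delta = precursor_mz - fragment_mz
    return sorted(aa for aa, mass in RESIDUE_MASSES.items() if abs(mass - delta) <= tol)


print(lost_residue_candidates(1000.5, 1000.5 - 156.10111))  # ['R']
print(lost_residue_candidates(800.0, 800.0 - 113.08406))    # ['I', 'L'] (isobaric)
```

    As the abstract warns, the same mass difference would also match loss of the N-terminal residue via the ym-1 ion, so a match only tells which residue was lost, not from which end; Leu/Ile remain indistinguishable by mass, and Gln/Lys separate only at tolerances tighter than about 0.036 Da.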

  1. Increasing Prion Propensity by Hydrophobic Insertion

    PubMed Central

    Petri, Michelina; Flores, Noe; Rogge, Ryan A.; Cascarina, Sean M.; Ross, Eric D.

    2014-01-01

    Prion formation involves the conversion of proteins from a soluble form into an infectious amyloid form. Most yeast prion proteins contain glutamine/asparagine-rich regions that are responsible for prion aggregation. Prion formation by these domains is driven primarily by amino acid composition, not primary sequence, yet there is a surprising disconnect between the amino acids thought to have the highest aggregation propensity and those that are actually found in yeast prion domains. Specifically, a recent mutagenic screen suggested that both aromatic and non-aromatic hydrophobic residues strongly promote prion formation. However, while aromatic residues are common in yeast prion domains, non-aromatic hydrophobic residues are strongly under-represented. Here, we directly test the effects of hydrophobic and aromatic residues on prion formation. Remarkably, we found that insertion of as few as two hydrophobic residues increased prion formation by several orders of magnitude and significantly accelerated in vitro amyloid formation. Thus, insertion or deletion of hydrophobic residues provides a simple tool to control the prion activity of a protein. These data, combined with bioinformatics analysis, suggest a limit on the number of strongly prion-promoting residues tolerated in glutamine/asparagine-rich domains. This limit may explain the under-representation of non-aromatic hydrophobic residues in yeast prion domains. Prion activity requires not only that a protein be able to form prion fibers, but also that these fibers be cleaved to generate new independently segregating aggregates to offset dilution by cell division. Recent studies suggest that aromatic residues, but not non-aromatic hydrophobic residues, support the fiber cleavage step.
Therefore, we propose that while both aromatic and non-aromatic hydrophobic residues promote prion formation, aromatic residues are favored in yeast prion domains because they serve a dual function, promoting both prion formation and chaperone-dependent prion propagation. PMID:24586661

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tai, Lin-Ru; Chou, Chang-Wei; Lee, I-Fang

    In this study, we used a multiple-copy (EGFP)₃ reporter system to establish a numeric nuclear index system to assess the degree of nuclear import. The system was first validated by a FRAP assay and then applied to evaluate the essential and multifaceted nature of basic amino acid clusters during the nuclear import of ribosomal protein L7. The results indicate that the sequence context of the basic cluster determines the degree of nuclear import, and that the number of basic residues in the cluster is irrelevant; rather, the position of the pertinent basic residues is crucial. Moreover, it was also found that the type of carrier protein used by a basic cluster has a great impact on the degree of nuclear import. In the case of L7, importin β2 or importin β3 is preferentially used by clusters with high import efficiency, although other importins are also used by clusters with a weaker level of nuclear import. Such preferential use of multiple basic clusters and importins to gain nuclear entry would seem to be a common practice among ribosomal proteins to ensure their full participation in high-rate ribosome synthesis. - Highlights: ► We introduce a numeric index system that represents the degree of nuclear import. ► The rate of nuclear import is dictated by the sequence context of the basic cluster. ► Importin β2 and β3 were mainly responsible for the N4-mediated nuclear import.

  3. Engineering a horseradish peroxidase C stable to radical attacks by mutating multiple radical coupling sites.

    PubMed

    Kim, Su Jin; Joo, Jeong Chan; Song, Bong Keun; Yoo, Young Je; Kim, Yong Hwan

    2015-04-01

    Peroxidases have great potential as industrial biocatalysts. In particular, the oxidative polymerization of phenolic compounds catalyzed by peroxidases has been extensively examined because of the advantages of this method over conventional chemical methods. However, the industrial application of peroxidases is often limited by their rapid inactivation by phenoxyl radicals during oxidative polymerization. In this work, we report a novel protein engineering approach to improve the radical stability of horseradish peroxidase isozyme C (HRPC). Phenylalanine residues that are vulnerable to modification by phenoxyl radicals were identified using mass spectrometry analysis. UV-Vis and CD spectra showed that radical coupling did not change the secondary structure or the active site of HRPC. Four phenylalanine (Phe) residues (F68, F142, F143, and F179) were each mutated to alanine to generate single mutants and examine the role of these sites in radical coupling. Despite a marginal improvement in radical stability, each single mutant still exhibited rapid radical inactivation. To further reduce inactivation by radical coupling, the four substitutions were combined in F68A/F142A/F143A/F179A. This mutant showed dramatically enhanced radical stability, retaining 41% of its initial activity, whereas the wild type was completely inactivated. Structure and sequence alignment revealed that the radical-vulnerable Phe residues of HRPC are conserved in homologous peroxidases, which showed the same rapid inactivation tendency as HRPC. Based on our site-directed mutagenesis and biochemical characterization, we have shown that engineering radical-vulnerable residues to eliminate multiple radical coupling sites can be a good strategy to improve the stability of peroxidases against radical attack. © 2014 Wiley Periodicals, Inc.

  4. Proteolytic interconversion and N-terminal sequences of the Citrobacter diversus major beta-lactamases.

    PubMed Central

    Franceschini, N; Amicosante, G; Perilli, M; Maccarrone, M; Oratore, A; van Beeumen, J; Frère, J M

    1991-01-01

    The N-terminal sequences of the two major beta-lactamases produced by Citrobacter diversus differed only by the absence of the first residue in form II and the loss of five amino acid residues at the C-terminal end. Limited proteolysis of the homogeneous form I protein yielded a variety of enzymatically active products. In the major product obtained after the action of papain, the first three N-terminal residues of form I had been cleaved, whereas at the C-terminal end the treated enzyme lacked five residues. However, this cannot explain the different behaviours of form I, form II and the papain digestion product upon chromatofocusing. Form I, which was sequenced up to position 56, exhibited a very high degree of similarity with a Klebsiella oxytoca beta-lactamase. The determined sequence, which contained the active serine residue, demonstrated that the chromosome-encoded beta-lactamase of Citrobacter diversus belongs to class A. PMID:2039443

  5. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active-site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits α-chymotrypsin, but inhibits elastase and subtilisin less effectively. SETI was expressed in E. coli harboring a PelB vector carrying the SETI-encoding cDNA, with an expression yield of 0.68 mg/l. The phylogenetic relationship between SETI and the other BPTI-like trypsin inhibitors was analyzed using maximum likelihood inference methods.

  6. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background: Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results: We studied more than 300,000 pairwise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified the sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers, including those that exploit the structure of the query proteins.
On a large dataset of transient complexes, the partner-specific classifier PS-HomPPI can predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions: Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
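    The CC, sensitivity and specificity figures above come from a per-residue confusion matrix. The abstract does not define them, so this is a sketch under assumptions: CC is taken to be the Matthews correlation coefficient, and because interface-prediction papers sometimes use "specificity" for TP/(TP+FP) rather than TN/(TN+FP), both quantities are computed. The function name and the toy labels are hypothetical.

```python
import math


def interface_metrics(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Confusion-matrix metrics for per-residue interface predictions (True = interface)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on interface residues
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # TP/(TP+FP)
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # TN/(TN+FP)
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy labels for six residues of a query protein.
predicted = [True, True, False, False, True, False]
actual = [True, False, False, False, True, True]
print(interface_metrics(predicted, actual))
```

    Here TP = 2, FP = 1, FN = 1 and TN = 2, so sensitivity, precision and specificity are all 2/3 and the MCC is 1/3; checking which specificity definition a paper uses matters when comparing its numbers with other servers.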

  7. Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

    PubMed

    Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.

  8. Immediate-Early Transactivator Rta of Epstein-Barr Virus (EBV) Shows Multiple Epitopes Recognized by EBV-Specific Cytotoxic T Lymphocytes

    PubMed Central

    Pepperl, Sandra; Benninger-Döring, Gerlinde; Modrow, Susanne; Wolf, Hans; Jilg, Wolfgang

    1998-01-01

    We analyzed the immediate-early transactivator Rta of Epstein-Barr virus (EBV) for its role as a target for specific cytotoxic T lymphocytes (CTL). Panels of overlapping peptides covering the entire amino acid sequence of Rta were synthesized and used to induce and analyze specific CTL responses in EBV-positive donors. Using peptide-pulsed target cells, we found nine different CTL epitopes that are distributed over the entire protein sequence. One epitope restricted by HLA-A24 could be mapped to the decameric sequence DYCNVLNKEF between amino acid positions 28 and 37 of the Rta protein. A second epitope could be assigned to the same region of Rta (residues 25 to 39) and was shown to be restricted by HLA-B18. A further minimal epitope could be mapped to the nonameric sequence ATIGTAMYK between amino acid positions 134 and 142; this peptide was restricted by HLA-A11. Four more epitopes were shown to be restricted by HLA-A2, -A3, -B61, and -Cw4 and were located between Rta residues 225 and 239, 145 and 159, 529 and 543, and 393 and 407, respectively. For two other epitopes, only the location within the Rta protein is known so far (residues 121 to 135 and 441 to 455); their exact HLA restriction patterns have not yet been identified. Using target cells infected with recombinant vaccinia virus containing the gene for Rta, we showed that six of eight Rta-specific CTL lines also recognized the corresponding peptides after endogenous processing. These data suggest that Rta is an important target for EBV-specific cellular cytotoxicity. Together with recent findings of other immediate-early and early proteins also acting as CTL targets, they reveal the role of proteins of the lytic cycle in the immune recognition of EBV-infected cells. PMID:9765404
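An overlapping-peptide panel like the one described can be laid out programmatically. The sketch below is a hypothetical design aid, not the authors' protocol. The abstract does not state the peptide length or the overlap; a 15-residue window is assumed because most reported epitope ranges span 15 residues (e.g. 25 to 39), and the step of 8 residues is purely an assumption. A second helper checks that a mapped peptide's length agrees with its inclusive residue range (for example, DYCNVLNKEF at 28 to 37 is 37 - 28 + 1 = 10 residues).

```python
def overlapping_windows(protein_length: int, window: int = 15,
                        step: int = 8) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) ranges of overlapping peptides that
    together cover residues 1..protein_length. window/step are assumptions."""
    if window <= 0 or step <= 0 or step > window:
        raise ValueError("need 0 < step <= window so consecutive peptides overlap or abut")
    if protein_length <= window:
        return [(1, protein_length)]
    ranges = []
    start = 1
    while start + window - 1 <= protein_length:
        ranges.append((start, start + window - 1))
        start += step
    # Anchor a final peptide at the C-terminus if the last window fell short of it.
    if ranges[-1][1] < protein_length:
        ranges.append((protein_length - window + 1, protein_length))
    return ranges


def span_matches(peptide: str, start: int, end: int) -> bool:
    """True if the peptide length agrees with its inclusive 1-based residue range."""
    return len(peptide) == end - start + 1


print(overlapping_windows(40))  # toy protein length, not Rta
print(span_matches("DYCNVLNKEF", 28, 37), span_matches("ATIGTAMYK", 134, 142))
```

Keeping the step smaller than the window ensures that any short epitope lying across a peptide boundary is still fully contained in at least one neighboring peptide.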

  9. Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters

    PubMed Central

    Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

    2017-01-01

    Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed comprehensive sequence-based analyses of SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified, and ≈60% of them were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to form SWEETs were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEETs can fuse directly in prokaryotes. 15-TMH extraSWEETs and 25-TMH superSWEETs were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics simulations. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results showed that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles in substrate recognition and transport. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs. PMID:29326750

  10. Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters.

    PubMed

    Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

    2017-01-01

    Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed comprehensive sequence-based analyses of SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified, and ≈60% of them were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to form SWEETs were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEETs can fuse directly in prokaryotes. 15-TMH extraSWEETs and 25-TMH superSWEETs were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics simulations. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results showed that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles in substrate recognition and transport. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs.

  11. Complexity in the cattle CD94/NKG2 gene families.

    PubMed

    Birch, James; Ellis, Shirley A

    2007-04-01

    Natural killer cell responses are controlled to a large extent by the interaction of an array of inhibitory and activating receptors with their ligands. The mostly nonpolymorphic CD94/NKG2 receptors in both humans and mice were shown to recognize a single nonclassical MHC class I molecule in each case. In this paper, we describe the CD94/NKG2 gene family in cattle. NKG2 and CD94 sequences were amplified from cDNA derived from four animals. Four CD94 sequences, ten NKG2A sequences, and three NKG2C sequences were identified in total. In contrast to humans, we show that cattle have multiple distinct NKG2A genes, some of which show minor allelic variation. All of the sequences designated NKG2A have two tyrosine-based inhibitory motifs in the cytoplasmic domain, and one putative gene has, in addition, a charged residue in the transmembrane domain. NKG2C appears to be essentially monomorphic in cattle. All of the NKG2A sequences are similar apart from NKG2A-01, which, in contrast, shares the majority of its carbohydrate recognition domain with NKG2C. Most of the genes appear to generate multiple alternatively spliced forms. These findings suggest that the CD94/NKG2A heterodimers in cattle, in contrast to those of other species, bind several different ligands. Because NKG2C is not polymorphic, this raises questions as to the combined functional capacity of the CD94/NKG2 gene families in cattle.

  12. SEQATOMS: a web tool for identifying missing regions in PDB in sequence context.

    PubMed

    Brandt, Bernd W; Heringa, Jaap; Leunissen, Jack A M

    2008-07-01

    With over 46 000 proteins, the Protein Data Bank (PDB) is the most important database of structural information on biological macromolecules. PDB files contain sequence and coordinate information. Residues present in the sequence can be absent from the coordinate section, which means their position in space is unknown. Similarity searches are routinely carried out against sequences taken from PDB SEQRES records. However, these sequences make no distinction between residues that have a known or an unknown position in the 3D protein structure. We present a FASTA sequence database that is produced by combining the sequence and coordinate information. All residues absent from the PDB coordinate section are masked with lower-case letters, thereby providing a view of these residues in the context of the entire protein sequence, which facilitates inspecting 'missing' regions. We also provide a masked version of the CATH domain database. A user-friendly BLAST interface is available for similarity searching. In contrast to standard (stand-alone) BLAST output, which only contains upper-case letters, our output retains the lower-case letters of the masked regions. Thus, our server can be used to perform BLAST searching case-sensitively. Here, we have applied it to the study of missing regions in their sequence context. SEQATOMS is available at http://www.bioinformatics.nl/tools/seqatoms/.
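The lower-case masking convention described above is easy to reproduce. The sketch below assumes the hard part has already been done: mapping SEQRES positions to the residues that actually have ATOM coordinates (which in practice must handle author residue numbering, insertion codes, and alternate locations). `mask_unobserved` and `missing_regions` are hypothetical helper names, not SEQATOMS functions.

```python
def mask_unobserved(seqres: str, observed: set[int]) -> str:
    """Lower-case every residue whose 1-based SEQRES position has no coordinates."""
    return "".join(
        aa.upper() if pos in observed else aa.lower()
        for pos, aa in enumerate(seqres, start=1)
    )


def missing_regions(masked: str) -> list[tuple[int, int]]:
    """1-based inclusive runs of lower-case (unobserved) residues."""
    regions, start = [], None
    for pos, aa in enumerate(masked, start=1):
        if aa.islower() and start is None:
            start = pos                       # a missing run begins
        elif not aa.islower() and start is not None:
            regions.append((start, pos - 1))  # the run ended at the previous residue
            start = None
    if start is not None:                     # run extends to the C-terminus
        regions.append((start, len(masked)))
    return regions


masked = mask_unobserved("MKTAYIAKQR", observed=set(range(3, 9)))
print(masked)                   # mkTAYIAKqr
print(missing_regions(masked))  # [(1, 2), (9, 10)]
```

Encoding the missing/observed state in letter case keeps the output a valid FASTA sequence, so it stays usable by sequence tools that ignore case while remaining inspectable by tools that do not.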

  13. KinView: A visual comparative sequence analysis tool for integrated kinome research

    PubMed Central

    McSkimming, Daniel Ian; Dastgheib, Shima; Baffi, Timothy R.; Byrne, Dominic P.; Ferries, Samantha; Scott, Steven Thomas; Newton, Alexandra C.; Eyers, Claire E.; Kochut, Krzysztof J.; Eyers, Patrick A.

    2017-01-01

    Multiple sequence alignments (MSAs) are a fundamental analysis tool used throughout biology to investigate relationships between protein sequence, structure, function, evolutionary history, and patterns of disease-associated variants. However, their widespread application in systems biology research is currently hindered by the lack of user-friendly tools to simultaneously visualize, manipulate and query the information conceptualized in large sequence alignments, and the challenges in integrating MSAs with multiple orthogonal data such as cancer variants and post-translational modifications, which are often stored in heterogeneous data sources and formats. Here, we present the Multiple Sequence Alignment Ontology (MSAOnt), which represents a profile or consensus alignment in an ontological format. Subsets of the alignment are easily selected through the SPARQL Protocol and RDF Query Language for downstream statistical analysis or visualization. We have also created the Kinome Viewer (KinView), an interactive integrative visualization that places eukaryotic protein kinase cancer variants in the context of natural sequence variation and experimentally determined post-translational modifications, which play central roles in the regulation of cellular signaling pathways. Using KinView, we identified differential phosphorylation patterns between tyrosine and serine/threonine kinases in the activation segment, a major kinase regulatory region that is often mutated in proliferative diseases. We discuss cancer variants that disrupt phosphorylation sites in the activation segment, and show how KinView can be used as a comparative tool to identify differences and similarities in natural variation, cancer variants and post-translational modifications between kinase groups, families and subfamilies. 
Based on KinView comparisons, we identify and experimentally characterize a regulatory tyrosine (PLK4 Y177) in the C-terminal activation segment region of PLK4 termed the P+1 loop. To further demonstrate the application of KinView in hypothesis generation and testing, we formulate and validate a hypothesis explaining a novel predicted loss-of-function variant (PKCβ D523N) in the regulatory spine of PKCβ, a recently identified tumor suppressor kinase. KinView provides a novel, extensible interface for performing comparative analyses between subsets of kinases and for integrating multiple types of residue-specific annotations in user-friendly formats. PMID:27731453

  14. Phylotranscriptomic analysis uncovers a wealth of tissue inhibitor of metalloproteinases variants in echinoderms

    PubMed Central

    Clouse, Ronald M.; Linchangco, Gregorio V.; Kerr, Alexander M.; Reid, Robert W.; Janies, Daniel A.

    2015-01-01

    Tissue inhibitors of metalloproteinases (TIMPs) help regulate the extracellular matrix (ECM) in animals, mostly by inhibiting matrix metalloproteinases (MMPs). They are important activators of mutable collagenous tissue (MCT), which has been extensively studied in echinoderms, and the four TIMP copies in humans have been studied for their role in cancer. To understand the evolution of TIMPs, we combined 405 TIMPs from an echinoderm transcriptome dataset built from 41 specimens representing all five classes of echinoderms with variants from protostomes and chordates. We used multiple sequence alignment with various stringencies of alignment quality to cull highly divergent sequences and then conducted phylogenetic analyses using both nucleotide and amino acid sequences. Phylogenetic hypotheses consistently recovered TIMPs as diversifying in the ancestral deuterostome, with these early lineages continuing to diversify in echinoderms. The four vertebrate TIMPs diversified from a single copy in the ancestral chordate, all other copies being lost. Consistent with greater MCT needs owing to body wall liquefaction, evisceration, autotomy and reproduction by fission, holothuroids had significantly more TIMPs and higher read depths per contig. Ten cysteine residues, an HPQ binding site and several other residues were conserved in at least 70% of all TIMPs. The conservation of binding sites and the placement of echinoderm TIMPs involved in MCT modification suggest that ECM regulation remains the primary function of TIMP genes, although within this role there are a large number of specialized copies. PMID:27017967
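A criterion such as "conserved in at least 70% of all TIMPs" can be applied column by column to a multiple sequence alignment. The following is a minimal sketch under stated assumptions: the alignment is a list of equal-length gapped strings, gaps ('-') count against conservation (the denominator is all sequences, not just the ungapped ones), and the toy alignment is invented. The abstract does not say how the authors treated gaps, so that choice is an assumption.

```python
from collections import Counter


def conserved_columns(alignment: list[str],
                      threshold: float = 0.70) -> list[tuple[int, str, float]]:
    """(1-based column, residue, fraction) for every column whose most common
    residue occurs in at least `threshold` of all aligned sequences."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    hits = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # all-gap column
        residue, count = counts.most_common(1)[0]
        fraction = count / n  # gaps count against conservation
        if fraction >= threshold:
            hits.append((col + 1, residue, fraction))
    return hits


toy_alignment = ["CAH-", "CAH-", "CSHQ", "CGPQ"]  # invented sequences
print(conserved_columns(toy_alignment))  # [(1, 'C', 1.0), (3, 'H', 0.75)]
```

Dividing by the total number of sequences is the stricter choice: a column that is invariant only among the few sequences that cover it will not be reported as conserved.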

  15. Genomic Microbial Epidemiology Is Needed to Comprehend the Global Problem of Antibiotic Resistance and to Improve Pathogen Diagnosis.

    PubMed

    Wyrsch, Ethan R; Roy Chowdhury, Piklu; Chapman, Toni A; Charles, Ian G; Hammond, Jeffrey M; Djordjevic, Steven P

    2016-01-01

    Contamination of waste effluent from hospitals and intensive food animal production with antimicrobial residues is an immense global problem. Antimicrobial residues exert selection pressures that influence the acquisition of antimicrobial resistance and virulence genes in diverse microbial populations. Despite these concerns, there is only a limited understanding of how antimicrobial residues contribute to the global problem of antimicrobial resistance. Furthermore, rapid detection of emerging bacterial pathogens and strains with resistance to more than one antibiotic class remains a challenge. A comprehensive, sequence-based genomic epidemiological surveillance model that captures essential microbial metadata is needed, both to improve surveillance for antimicrobial resistance and to monitor pathogen evolution. Escherichia coli is an important pathogen causing both intestinal [intestinal pathogenic E. coli (IPEC)] and extraintestinal [extraintestinal pathogenic E. coli (ExPEC)] disease in humans and food animals. ExPEC are the most frequently isolated Gram-negative pathogens affecting human health; they are linked to food production practices and are often resistant to multiple antibiotics. Cattle are a known reservoir of IPEC, but they are not recognized as a source of ExPEC that impact human or animal health. In contrast, poultry are a recognized source of multiple-antibiotic-resistant ExPEC, while swine have received comparatively less attention in this regard. Here, we review what is known about ExPEC in swine and how pig production contributes to the problem of antibiotic resistance.

  16. Identification and Application of Neutralizing Epitopes of Human Adenovirus Type 55 Hexon Protein

    PubMed Central

    Tian, Xingui; Ma, Qiang; Jiang, Zaixue; Huang, Junfeng; Liu, Qian; Lu, Xiaomei; Luo, Qingming; Zhou, Rong

    2015-01-01

    Human adenovirus type 55 (HAdV55) is a newly identified, re-emergent acute respiratory disease (ARD) pathogen whose hexon gene is proposed to have arisen by recombination between HAdV11 and HAdV14 strains. The identification of neutralizing epitopes is important for surveillance of, and vaccine development against, HAdV55 infection. In this study, four type-specific epitope peptides of the HAdV55 hexon protein, A55R1 (residues 138 to 152), A55R2 (residues 179 to 187), A55R4 (residues 247 to 259) and A55R7 (residues 429 to 443), were predicted by multiple sequence alignment and homology modeling methods and then confirmed with synthetic peptides by enzyme-linked immunosorbent assay (ELISA) and neutralization tests (NT). Finally, A55R2 was incorporated into human adenovirus type 3 (HAdV3), and a chimeric adenovirus, rAd3A55R2, was successfully obtained. The chimeric rAd3A55R2 could induce neutralizing antibodies against both HAdV3 and HAdV55. This study will contribute to the development of novel adenovirus vaccine candidates and to adenovirus structural analysis. PMID:26516903

  17. A conserved mechanism for replication origin recognition and binding in archaea.

    PubMed

    Majerník, Alan I; Chong, James P J

    2008-01-15

    To date, methanogens are the only group within the archaea where firing DNA replication origins have not been demonstrated in vivo. In the present study we show that a previously identified cluster of ORB (origin recognition box) sequences do indeed function as an origin of replication in vivo in the archaeon Methanothermobacter thermautotrophicus. Although the consensus sequence of ORBs in M. thermautotrophicus is somewhat conserved when compared with ORB sequences in other archaea, the Cdc6-1 protein from M. thermautotrophicus (termed MthCdc6-1) displays sequence-specific binding that is selective for the MthORB sequence and does not recognize ORBs from other archaeal species. Stabilization of in vitro MthORB DNA binding by MthCdc6-1 requires additional conserved sequences 3' to those originally described for M. thermautotrophicus. By testing synthetic sequences bearing mutations in the MthORB consensus sequence, we show that Cdc6/ORB binding is critically dependent on the presence of an invariant guanine found in all archaeal ORB sequences. Mutation of a universally conserved arginine residue in the recognition helix of the winged helix domain of archaeal Cdc6-1 shows that specific origin sequence recognition is dependent on the interaction of this arginine residue with the invariant guanine. Recognition of a mutated origin sequence can be achieved by mutation of the conserved arginine residue to a lysine or glutamine residue. Thus despite a number of differences in protein and DNA sequences between species, the mechanism of origin recognition and binding appears to be conserved throughout the archaea.

  18. Identification, expression, and taxonomic distribution of alternative oxidases in non-angiosperm plants.

    PubMed

    Neimanis, Karina; Staples, James F; Hüner, Norman P A; McDonald, Allison E

    2013-09-10

    Alternative oxidase (AOX) is a terminal ubiquinol oxidase present in the respiratory chain of all angiosperms investigated to date, but AOX distribution in other members of the Viridiplantae is less clear. We assessed the taxonomic distribution of AOX using bioinformatics. Multiple sequence alignments compared AOX proteins and examined amino acid residues involved in AOX catalytic function and post-translational regulation. Novel AOX sequences were found in both Chlorophytes and Streptophytes and we conclude that AOX is widespread in the Viridiplantae. AOX multigene families are common in non-angiosperm plants and the appearance of AOX1 and AOX2 subtypes pre-dates the divergence of the Coniferophyta and Magnoliophyta. Residues involved in AOX catalytic function are highly conserved between Chlorophytes and Streptophytes, while AOX post-translational regulation likely differs in these two lineages. We demonstrate experimentally that an AOX gene is present in the moss Physcomitrella patens and that the gene is transcribed. Our findings suggest that AOX will likely exert an influence on plant respiration and carbon metabolism in non-angiosperms such as green algae, bryophytes, liverworts, lycopods, ferns, gnetophytes, and gymnosperms and that further research in these systems is required. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Characterization and application of enterocin RM6, a bacteriocin from Enterococcus faecalis.

    PubMed

    Huang, En; Zhang, Liwen; Chung, Yoon-Kyung; Zheng, Zuoxing; Yousef, Ahmed E

    2013-01-01

    Use of bacteriocins in food preservation has received great attention in recent years. The goal of this study is to characterize enterocin RM6 from Enterococcus faecalis OSY-RM6 and investigate its efficacy against Listeria monocytogenes in cottage cheese. Enterocin RM6 was purified from E. faecalis culture supernatant using an ion-exchange column and multiple C18-silica cartridges, followed by reverse-phase high-performance liquid chromatography. The molecular weight of enterocin RM6 is 7145.0823 Da as determined by mass spectrometry (MS). Tandem mass spectrometry (MS/MS) analysis revealed that enterocin RM6 is a 70-residue cyclic peptide with a head-to-tail linkage between methionine and tryptophan residues. The peptide sequence of enterocin RM6 was further confirmed by sequencing the structural gene of the peptide. Enterocin RM6 is active against Gram-positive bacteria, including L. monocytogenes, Bacillus cereus, and methicillin-resistant Staphylococcus aureus (MRSA). Enterocin RM6 (final concentration in cottage cheese, 80 AU/mL) caused a 4-log reduction in the population of L. monocytogenes inoculated in cottage cheese within 30 min of treatment. Therefore, enterocin RM6 has potential applications as a potent antimicrobial peptide against foodborne pathogens in food.
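A "4-log reduction" is the base-10 logarithm of the ratio of viable counts before and after treatment, so it corresponds to a 10,000-fold drop, or 99.99% of the population inactivated. The short sketch below makes that arithmetic explicit. The counts are hypothetical illustrations, not data from the study, and the helper names are invented.

```python
import math


def log_reduction(initial_count: float, final_count: float) -> float:
    """log10 reduction of a microbial population (e.g. CFU/g before vs. after)."""
    if initial_count <= 0 or final_count <= 0:
        raise ValueError("counts must be positive; use the detection limit for zero counts")
    return math.log10(initial_count / final_count)


def percent_killed(log_red: float) -> float:
    """Percentage of the starting population inactivated by a given log reduction."""
    return 100.0 * (1.0 - 10.0 ** (-log_red))


# Hypothetical counts: 10^6 CFU/g reduced to 10^2 CFU/g.
print(log_reduction(1e6, 1e2))  # 4.0
print(percent_killed(4.0))      # approximately 99.99
```

Expressing efficacy on a log scale keeps results comparable across experiments that start from different inoculum levels.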

  20. Characterization and Application of Enterocin RM6, a Bacteriocin from Enterococcus faecalis

    PubMed Central

    Chung, Yoon-Kyung; Yousef, Ahmed E.

    2013-01-01

    Use of bacteriocins in food preservation has received great attention in recent years. The goal of this study is to characterize enterocin RM6 from Enterococcus faecalis OSY-RM6 and investigate its efficacy against Listeria monocytogenes in cottage cheese. Enterocin RM6 was purified from E. faecalis culture supernatant using an ion-exchange column and multiple C18-silica cartridges, followed by reverse-phase high-performance liquid chromatography. The molecular weight of enterocin RM6 is 7145.0823 Da as determined by mass spectrometry (MS). Tandem mass spectrometry (MS/MS) analysis revealed that enterocin RM6 is a 70-residue cyclic peptide with a head-to-tail linkage between methionine and tryptophan residues. The peptide sequence of enterocin RM6 was further confirmed by sequencing the structural gene of the peptide. Enterocin RM6 is active against Gram-positive bacteria, including L. monocytogenes, Bacillus cereus, and methicillin-resistant Staphylococcus aureus (MRSA). Enterocin RM6 (final concentration in cottage cheese, 80 AU/mL) caused a 4-log reduction in the population of L. monocytogenes inoculated in cottage cheese within 30 min of treatment. Therefore, enterocin RM6 has potential applications as a potent antimicrobial peptide against foodborne pathogens in food. PMID:23844357

  1. Incorporating a guanidine-modified cytosine base into triplex-forming PNAs for the recognition of a C-G pyrimidine–purine inversion site of an RNA duplex

    PubMed Central

    Toh, Desiree-Faye Kaixin; Devi, Gitali; Patil, Kiran M.; Qu, Qiuyu; Maraswami, Manikantha; Xiao, Yunyun; Loh, Teck Peng; Zhao, Yanli; Chen, Gang

    2016-01-01

    RNA duplex regions are often involved in tertiary interactions and protein binding and thus there is great potential in developing ligands that sequence-specifically bind to RNA duplexes. We have developed a convenient synthesis method for a modified peptide nucleic acid (PNA) monomer with a guanidine-modified 5-methyl cytosine base. We demonstrated by gel electrophoresis, fluorescence and thermal melting experiments that short PNAs incorporating the modified residue show high binding affinity and sequence specificity in the recognition of an RNA duplex containing an internal inverted Watson-Crick C-G base pair. Remarkably, the relatively short PNAs show no appreciable binding to DNA duplexes or single-stranded RNAs. The attached guanidine group stabilizes the base triple through hydrogen bonding with the G base in a C-G pair. Selective binding towards an RNA duplex over a single-stranded RNA can be rationalized by the fact that alkylation of the amine of a 5-methyl C base blocks the Watson–Crick edge. PNAs incorporating multiple guanidine-modified cytosine residues are able to enter HeLa cells without any transfection agent. PMID:27596599

  2. Structural studies of polypeptides: Mechanism of immunoglobin catalysis and helix propagation in hybrid sequence, disulfide containing peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storrs, Richard Wood

    1992-08-01

    Catalytic immunoglobulin fragments were studied by nuclear magnetic resonance (NMR) spectroscopy to identify amino acid residues responsible for the catalytic activity. Small, hybrid sequence peptides were analyzed for helix propagation following covalent initiation and for activity related to the protein from which the helical sequence was derived. Hydrolysis of p-nitrophenyl carbonates and esters by specific immunoglobulins is thought to involve charge complementarity. The pK of the transition state analog p-nitrophenyl phosphate bound to the immunoglobulin fragment was determined by 31P-NMR to verify the juxtaposition of a positively charged amino acid to the binding/catalytic site. Optical studies of immunoglobulin-mediated photoreversal of cis,syn cyclobutane thymine dimers implicated tryptophan as the photosensitizing chromophore. These studies showed that the chemical environment of a single tryptophan residue is altered upon binding of the thymine dimer. This tryptophan residue was localized to within 20 Å of the binding site through the use of a nitroxide paramagnetic species covalently attached to the thymine dimer. A hybrid sequence peptide was synthesized based on the bee venom peptide apamin in which the helical residues of apamin were replaced with those from the recognition helix of the bacteriophage 434 repressor protein. Oxidation of the disulfide bonds occurred uniformly in the proper 1-11, 3-15 orientation, stabilizing the 434 sequence in an α-helix. The glycine residue stopped helix propagation. Helix propagation in 2,2,2-trifluoroethanol mixtures was investigated in a second hybrid sequence peptide using the apamin-derived disulfide scaffold and the S-peptide sequence. The helix-stop signal previously observed was not observed in the NMR NOESY spectrum. Helical connectivities were seen throughout the S-peptide sequence. The apamin/S-peptide hybrid bound to the S-protein (residues 21-166 of ribonuclease A) and reconstituted enzymatic activity.

  3. Structural studies of polypeptides: Mechanism of immunoglobin catalysis and helix propagation in hybrid sequence, disulfide containing peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storrs, R.W.

    1992-08-01

    Catalytic immunoglobulin fragments were studied by nuclear magnetic resonance (NMR) spectroscopy to identify amino acid residues responsible for the catalytic activity. Small, hybrid sequence peptides were analyzed for helix propagation following covalent initiation and for activity related to the protein from which the helical sequence was derived. Hydrolysis of p-nitrophenyl carbonates and esters by specific immunoglobulins is thought to involve charge complementarity. The pK of the transition state analog p-nitrophenyl phosphate bound to the immunoglobulin fragment was determined by 31P-NMR to verify the juxtaposition of a positively charged amino acid to the binding/catalytic site. Optical studies of immunoglobulin-mediated photoreversal of cis,syn cyclobutane thymine dimers implicated tryptophan as the photosensitizing chromophore. These studies showed that the chemical environment of a single tryptophan residue is altered upon binding of the thymine dimer. This tryptophan residue was localized to within 20 Å of the binding site through the use of a nitroxide paramagnetic species covalently attached to the thymine dimer. A hybrid sequence peptide was synthesized based on the bee venom peptide apamin in which the helical residues of apamin were replaced with those from the recognition helix of the bacteriophage 434 repressor protein. Oxidation of the disulfide bonds occurred uniformly in the proper 1-11, 3-15 orientation, stabilizing the 434 sequence in an α-helix. The glycine residue stopped helix propagation. Helix propagation in 2,2,2-trifluoroethanol mixtures was investigated in a second hybrid sequence peptide using the apamin-derived disulfide scaffold and the S-peptide sequence. The helix-stop signal previously observed was not observed in the NMR NOESY spectrum. Helical connectivities were seen throughout the S-peptide sequence. The apamin/S-peptide hybrid bound to the S-protein (residues 21-166 of ribonuclease A) and reconstituted enzymatic activity.

  4. The cDNA sequence of mouse Pgp-1 and homology to human CD44 cell surface antigen and proteoglycan core/link proteins.

    PubMed

    Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T

    1990-01-05

    We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
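The domain boundaries quoted above can be checked against the stated protein length with a few lines of arithmetic. The sketch below uses only numbers from the abstract; the signal peptide is taken as residues 1 to 13 because it is described as 13 amino acids followed by the extracellular domain starting at residue 14. It confirms that the three mature domains add up to the 345-residue core and that the full translated sequence is 358 residues. Dividing the reported Mr by the core length gives roughly 110 per residue, in line with the commonly used average residue mass.

```python
domains = {
    "signal peptide": (1, 13),
    "extracellular": (14, 265),
    "transmembrane": (266, 286),
    "cytoplasmic": (287, 358),
}


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


# Consecutive domains should abut with no gaps or overlaps.
spans = list(domains.values())
for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
    assert next_start == prev_end + 1, "domain boundaries are not contiguous"

lengths = {name: span_length(*rng) for name, rng in domains.items()}
mature_core = sum(n for name, n in lengths.items() if name != "signal peptide")
total = sum(lengths.values())

print(lengths)      # signal 13, extracellular 252, transmembrane 21, cytoplasmic 72
print(mature_core)  # 345, the polypeptide core reported in the abstract
print(total)        # 358, i.e. 13 + 345
print(round(37_800 / mature_core, 1))  # mean mass per residue implied by Mr 37,800
```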

  5. A tool for calculating binding-site residues on proteins from PDB structures.

    PubMed

    Hu, Jing; Yan, Changhui

    2009-08-03

    In research on protein functional sites, researchers often need to identify the binding-site residues of a protein. A commonly used strategy is to find a complex structure in the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and to calculate binding-site residues from that complex structure. However, because a protein may participate in multiple interactions, the binding-site residues calculated from one complex structure usually do not reveal all binding sites on the protein. Researchers therefore need to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming; in particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that contain the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The tool should be useful for research on protein binding-site analysis and prediction.
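The two steps described above, defining binding-site residues in one complex and merging them across complexes by interaction type, can be sketched as follows. This is a hypothetical illustration, not TCBRP's code or API. It assumes a distance-cutoff definition (a residue binds if any of its atoms lies within `cutoff` Å of any partner atom), uses a brute-force distance check that is fine only for small inputs, and assumes residue numbers have already been mapped onto a single reference sequence, which is the alignment step the abstract describes as tedious.

```python
import math
from collections import defaultdict

Atom = tuple[int, tuple[float, float, float]]  # (residue number, xyz in Å)


def binding_residues(protein_atoms: list[Atom], partner_atoms: list[Atom],
                     cutoff: float) -> set[int]:
    """Residues with any atom within `cutoff` Å of any partner atom (brute force)."""
    partner_xyz = [xyz for _, xyz in partner_atoms]
    return {res for res, xyz in protein_atoms
            if any(math.dist(xyz, p) <= cutoff for p in partner_xyz)}


def combine_binding_sites(complexes: list[tuple[str, set[int]]]) -> dict[str, set[int]]:
    """Union binding-site residues from many complexes, per interaction type
    and overall ("all"). Residue numbers must share one reference numbering."""
    combined: dict[str, set[int]] = defaultdict(set)
    for interaction_type, residues in complexes:
        combined[interaction_type] |= residues
        combined["all"] |= residues
    return dict(combined)


# Toy coordinates: residue 2 binds only under the looser cutoff.
prot = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0))]
partner = [(100, (3.0, 0.0, 0.0))]
print(binding_residues(prot, partner, 4.0), binding_residues(prot, partner, 8.0))
print(combine_binding_sites([("protein", {1, 2}), ("DNA", {5}), ("protein", {2, 3})]))
```

Making the cutoff a parameter mirrors the abstract's point that users may set their own definition of a binding-site residue; the union then reveals sites that no single complex shows.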

  6. Conformational dynamics of a short antigenic peptide in its free and antibody bound forms gives insight into the role of β-turns in peptide immunogenicity.

    PubMed

    Shukla, Rashmi Tambe; Sasidhar, Yellamraju U

    2015-07-01

    Earlier immunological experiments showed that a synthetic 36-residue peptide (residues 75-110) from influenza hemagglutinin elicits anti-peptide antibodies (Ab) that cross-react with the parent protein. In this article, we have studied the conformational features of a short antigenic (Ag) peptide from influenza hemagglutinin (residues 98-110, YPYDVPDYASLRS) in its free and Ab-bound forms with molecular dynamics simulations using the GROMACS package and the OPLS-AA/L all-atom force field at two different temperatures (293 K and 310 K). Multiple simulations of the free Ag peptide show sampling of ordered conformations and suggest different conformational preferences of the peptide at the two temperatures. The free Ag samples a conformation crucial for Ab binding (a β-turn formed by the "DYAS" sequence) with greater preference at 310 K, while it samples a native-like conformation with relatively greater propensity at 293 K. Within the intact hemagglutinin protein, the "DYAS" sequence likewise samples the β-turn conformation with greater propensity at 310 K. The bound Ag also samples the β-turn involving the "DYAS" sequence and, in addition, a β-turn formed by the sequence "YPYD" at its N-terminus, which seems to be induced upon binding to the Ab. Further, the bound Ag displays conformational flexibility at both 293 K and 310 K, particularly at the terminal residues. The implications of these results for peptide immunogenicity and Ag-Ab recognition are discussed. © 2015 Wiley Periodicals, Inc.

  7. The zinc fingers of YY1 bind single-stranded RNA with low sequence specificity.

    PubMed

    Wai, Dorothy C C; Shihab, Manar; Low, Jason K K; Mackay, Joel P

    2016-11-02

    Classical zinc fingers (ZFs) are traditionally considered to act as sequence-specific DNA-binding domains. More recently, classical ZFs have been recognised as potential RNA-binding modules, raising the intriguing possibility that classical-ZF transcription factors are involved in post-transcriptional gene regulation via direct RNA binding. To date, however, only one classical ZF-RNA complex, that involving TFIIIA, has been structurally characterised. Yin Yang-1 (YY1) is a multi-functional transcription factor involved in many regulatory processes, and binds DNA via four classical ZFs. Recent evidence suggests that YY1 also interacts with RNA, but the molecular nature of the interaction remains unknown. In the present work, we directly assess the ability of YY1 to bind RNA using in vitro assays. Systematic Evolution of Ligands by EXponential enrichment (SELEX) was used to identify preferred RNA sequences bound by the YY1 ZFs from a randomised library over multiple rounds of selection. However, a strong motif was not consistently recovered, suggesting that the RNA sequence selectivity of these domains is modest. YY1 ZF residues involved in binding to single-stranded RNA were identified by NMR spectroscopy and found to be largely distinct from the set of residues involved in DNA binding, suggesting that interactions between YY1 and ssRNA constitute a separate mode of nucleic acid binding. Our data are consistent with recent reports that YY1 can bind to RNA in a low-specificity, yet physiologically relevant manner. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Sequence search on a supercomputer.

    PubMed

    Gotoh, O; Tagashira, Y

    1986-01-10

    A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.
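
    The reported timings can be turned into a rough throughput figure. The sketch below assumes, beyond what the abstract states, that every query residue is compared with every database residue (as in an exhaustive dynamic-programming scan); under that assumption both searches run at several million residue comparisons per CPU second.

```python
def comparisons_per_second(query_len, db_len, cpu_seconds):
    """Residue-pair comparisons per CPU second, assuming an exhaustive
    all-against-all comparison of query and database residues."""
    return query_len * db_len / cpu_seconds

protein = comparisons_per_second(500, 0.5e6, 45)          # PIR search, 45 s
nucleotide = comparisons_per_second(1500, 1.2e6, 4 * 60)  # GenBank search, about 4 min
print(f"{protein:.2e} {nucleotide:.2e}")  # 5.56e+06 7.50e+06
```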

  9. Self-Complementarity within Proteins: Bridging the Gap between Binding and Folding

    PubMed Central

    Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul

    2012-01-01

    Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable to, if not better than, most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. PMID:22713576
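
    A minimal sketch of how a correlation-based Em score could be computed for one buried residue. It assumes, following the convention used for interface electrostatic complementarity, that Em is the negated Pearson correlation between the potential generated by the residue itself and the potential generated by the rest of the protein, sampled at the same surface points; the abstract says only "correlation", and the potential values below are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def electrostatic_complementarity(phi_residue, phi_rest):
    """Assumed Em: negated correlation of the two potentials at shared surface points.
    +1 means perfectly opposite potentials (ideal complementarity); -1 means identical trends."""
    return -pearson(phi_residue, phi_rest)

# Hypothetical potentials at four surface points of one buried residue.
phi_residue = [1.0, 0.5, -0.2, -1.0]
phi_rest = [-0.9, -0.4, 0.1, 1.1]
print(round(electrostatic_complementarity(phi_residue, phi_rest), 3))
```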

  10. Novel Rhizosphere Soil Alleles for the Enzyme 1-Aminocyclopropane-1-Carboxylate Deaminase Queried for Function with an In Vivo Competition Assay

    PubMed Central

    Jin, Zhao; Di Rienzi, Sara C.; Janzon, Anders; Werner, Jeff J.; Angenent, Largus T.; Dangl, Jeffrey L.; Fowler, Douglas M.

    2015-01-01

    Metagenomes derived from environmental microbiota encode a vast diversity of protein homologs. How this diversity affects protein function can be explored through selection assays aimed at optimizing function. While artificially generated gene sequence pools are typically used in selection assays, their use may be limited for technical or ethical reasons. Here, we investigate an alternative strategy: using soil microbial DNA as a starting point. We demonstrate this approach by optimizing the function of a widely occurring soil bacterial enzyme, 1-aminocyclopropane-1-carboxylate (ACC) deaminase. We identified a specific ACC deaminase domain region (ACCD-DR) that, when PCR amplified from soil, produced a variant pool that we could swap into functional plasmids carrying ACC deaminase-encoding genes. Functional ACC deaminase clones were selected in a competition assay based on their capacity to provide nitrogen to Escherichia coli in vitro. After multiple rounds of selection, the most successful ACCD-DR variants were identified by sequence analysis. We observed that previously identified essential active-site residues were fixed in the original unselected library and that additional residues went to fixation after selection. We identified a divergent essential residue whose presence hints at the possible use of alternative substrates, and a cluster of neutral residues that did not influence ACCD performance. Using an artificial ACCD-DR variant library generated by DNA oligomer synthesis, we validated the same fixation patterns. Our study demonstrates that soil metagenomes are useful starting pools of protein-coding-gene diversity that can be utilized for protein optimization and functional characterization when synthetic libraries are not appropriate. PMID:26637602
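
    The fixation analysis can be illustrated with a short sketch: a position counts as fixed when every sequenced variant carries the same residue there, and positions fixed after selection but not before are the newly fixed ones. The sequences are hypothetical toy strings, not ACCD-DR data, and are assumed to be aligned to equal length.

```python
def fixed_positions(variants):
    """Return 1-based positions where every aligned variant carries the same residue."""
    if not variants:
        return []
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be aligned to equal length")
    return [i + 1 for i in range(length) if len({v[i] for v in variants}) == 1]

before = ["MKTAYG", "MKSAYG", "MRTAHG"]  # hypothetical unselected pool
after = ["MKTAYG", "MKTAYG", "MKTAHG"]   # hypothetical pool after selection
print(fixed_positions(before))           # [1, 4, 6]
newly_fixed = sorted(set(fixed_positions(after)) - set(fixed_positions(before)))
print(newly_fixed)                       # [2, 3]
```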

  11. Self-complementarity within proteins: bridging the gap between binding and folding.

    PubMed

    Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul

    2012-06-06

    Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable to, if not better than, most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  12. Coiled-coil destabilizing residues in the group A Streptococcus M1 protein are required for functional interaction.

    PubMed

    Stewart, Chelsea M; Buffalo, Cosmo Z; Valderrama, J Andrés; Henningham, Anna; Cole, Jason N; Nizet, Victor; Ghosh, Partho

    2016-08-23

    The sequences of M proteins, the major surface-associated virulence factors of the widespread bacterial pathogen group A Streptococcus, are antigenically variable but have in common a strong propensity to form coiled coils. Paradoxically, these sequences are also replete with coiled-coil destabilizing residues. These features are evident in the irregular coiled-coil structure and thermal instability of M proteins. We present an explanation for this paradox through studies of the B repeats of the medically important M1 protein. The B repeats are required for interaction of M1 with fibrinogen (Fg) and consequent proinflammatory activation. The B repeats sample multiple conformations, including intrinsically disordered and dissociated states as well as two alternate coiled-coil conformations: a Fg-nonbinding register 1 and a Fg-binding register 2. Stabilization of M1 in the Fg-nonbinding register 1 resulted in attenuation of Fg binding as expected, but counterintuitively, so did stabilization in the Fg-binding register 2. Strikingly, these register-stabilized M1 proteins gained the ability to bind Fg when they were destabilized by a chaotrope. These results indicate that M1 stability is antithetical to Fg interaction and that M1 conformational dynamics, as specified by destabilizing residues, are essential for interaction. A "capture-and-collapse" model of association accounts for these observations, in which M1 captures Fg through a dynamic conformation and then collapses into a register 2-coiled coil as a result of stabilization provided by binding energy. Our results support the general conclusion that destabilizing residues are evolutionarily conserved in M proteins to enable functional interactions necessary for pathogenesis.

  13. Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

    PubMed

    Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

    1987-08-01

    To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared the deduced amino acid sequence with those of other proteins of known function. The deduced protein contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of the 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded.

  14. Isolation and characterization of full-length putative alcohol dehydrogenase genes from polygonum minus

    NASA Astrophysics Data System (ADS)

    Hamid, Nur Athirah Abd; Ismail, Ismanizan

    2013-11-01

    Polygonum minus, locally known as Kesum, is an aromatic herb with a high secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible oxidation of alcohols to aldehydes using NAD(P)(H) as a cofactor. The main focus of this research is to identify the ADH gene. Total RNA was extracted from leaves of P. minus plants treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp, which encodes 266 amino acid residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that both sequences are highly similar in the ORF region but divergent in the 3' untranslated region (UTR). The amino acid sequences differ at residue 107: PmADH1 contains a Gly (G) residue, whereas PmADH2 contains a Cys (C) residue. The intron-exon organization of both sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short-chain alcohol dehydrogenase/reductase ((c)SDR) conserved domain. The results suggest that both sequences are members of the short-chain alcohol dehydrogenase family.
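
    As a consistency check on the reported numbers: an 801-bp ORF contains 267 codons, and if the ORF length includes the stop codon (an assumption; the abstract does not say), it encodes 267 - 1 = 266 residues, matching the stated length. A small helper makes the arithmetic explicit.

```python
def encoded_residues(orf_bp, includes_stop_codon=True):
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # A stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop_codon else codons

print(encoded_residues(801))  # 266
```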

  15. The primary structure of stinging nettle (Urtica dioica) agglutinin. A two-domain member of the hevein family.

    PubMed

    Beintema, J J; Peumans, W J

    1992-03-09

    The primary structure of stinging nettle (Urtica dioica) agglutinin has been determined by sequence analysis of peptides obtained from three overlapping proteolytic digests. The sequence of 80 residues consists of two hevein-like domains with the same spacing of half-cystine residues and several other conserved residues as observed earlier in other proteins with hevein-like domains. The hinge region between the two domains is four residues longer than those between the four domains in cereal lectins like wheat germ agglutinin.

  16. The SUPERFAMILY database in 2004: additions and improvements.

    PubMed

    Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian

    2004-01-01

    The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.

  17. A Quantitative Tool to Distinguish Isobaric Leucine and Isoleucine Residues for Mass Spectrometry-Based De Novo Monoclonal Antibody Sequencing

    NASA Astrophysics Data System (ADS)

    Poston, Chloe N.; Higgs, Richard E.; You, Jinsam; Gelfanova, Valentina; Hale, John E.; Knierman, Michael D.; Siegel, Robert; Gutierrez, Jesus A.

    2014-07-01

    De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols that experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity of leucine aminopeptidase for N-terminal Leu residues and the cleavage preference of chymotrypsin for Leu. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS), to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. Using PXleS avoids the need to express several possible sequences and therefore reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.
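
    PXleS combines biochemical observations with germline frequencies in a logistic regression. The sketch below shows only the generic logistic form; the three feature names, the weights and the bias are hypothetical placeholders, not the fitted PXleS model.

```python
import math

def leu_probability(features, weights, bias):
    """Logistic model: P(Leu) = 1 / (1 + exp(-(bias + sum_i w_i * x_i)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs for one ambiguous Xle site:
#   x1 = normalized leucine-aminopeptidase release signal
#   x2 = normalized chymotrypsin cleavage signal
#   x3 = germline frequency of Leu at this position
weights = [2.0, 1.5, 3.0]  # hypothetical, not fitted values
bias = -3.0                # hypothetical
p = leu_probability([0.8, 0.6, 0.7], weights, bias)
print("Leu" if p >= 0.5 else "Ile", round(p, 3))  # Leu 0.832
```

    The logistic link keeps the output in (0, 1), so it can be read directly as the likelihood that the ambiguous site is Leu.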

  18. A quantitative tool to distinguish isobaric leucine and isoleucine residues for mass spectrometry-based de novo monoclonal antibody sequencing.

    PubMed

    Poston, Chloe N; Higgs, Richard E; You, Jinsam; Gelfanova, Valentina; Hale, John E; Knierman, Michael D; Siegel, Robert; Gutierrez, Jesus A

    2014-07-01

    De novo sequencing by mass spectrometry (MS) allows for the determination of the complete amino acid (AA) sequence of a given protein based on the mass difference of detected ions from MS/MS fragmentation spectra. The technique relies on obtaining specific masses that can be attributed to characteristic theoretical masses of AAs. A major limitation of de novo sequencing by MS is the inability to distinguish between the isobaric residues leucine (Leu) and isoleucine (Ile). Incorrect identification of Ile as Leu or vice versa often results in loss of activity in recombinant antibodies. This functional ambiguity is commonly resolved with costly and time-consuming AA mutation and peptide sequencing experiments. Here, we describe a set of orthogonal biochemical protocols that experimentally determine the identity of Ile or Leu residues in monoclonal antibodies (mAb) based on the selectivity of leucine aminopeptidase for N-terminal Leu residues and the cleavage preference of chymotrypsin for Leu. The resulting observations are combined with germline frequencies and incorporated into a logistic regression model, called Predictor for Xle Sites (PXleS), to provide a statistical likelihood for the identity of Leu at an ambiguous site. We demonstrate that PXleS can generate a probability for an Xle site in mAbs with 96% accuracy. Using PXleS avoids the need to express several possible sequences and therefore reduces the overall time and resources required to go from spectra generation to a biologically active sequence for a mAb when an Ile or Leu residue is in question.

  19. Exploring substrate binding and discrimination in fructose 1,6-bisphosphate and tagatose 1,6-bisphosphate aldolases.

    PubMed

    Zgiby, S M; Thomson, G J; Qamar, S; Berry, A

    2000-03-01

    Fructose 1,6-bisphosphate aldolase catalyses the reversible condensation of glycerone-P and glyceraldehyde 3-phosphate into fructose 1,6-bisphosphate. A recent structure of the Escherichia coli Class II fructose 1,6-bisphosphate aldolase [Hall, D.R., Leonard, G.A., Reed, C.D., Watt, C.I., Berry, A. & Hunter, W.N. (1999) J. Mol. Biol. 287, 383-394] in the presence of the transition state analogue phosphoglycolohydroxamate delineated the roles of individual amino acids in binding glycerone-P and in the initial proton abstraction steps of the mechanism. The X-ray structure has now been used, together with sequence alignments, site-directed mutagenesis and steady-state enzyme kinetics, to extend these studies to map important residues in the binding of glyceraldehyde 3-phosphate. From these studies three residues (Asn35, Ser61 and Lys325) have been identified as important in catalysis. We show that mutation of Ser61 to alanine increases the Km value for fructose 1,6-bisphosphate 16-fold, and product inhibition studies indicate that this effect is manifested most strongly in the glyceraldehyde 3-phosphate binding pocket of the active site, demonstrating that Ser61 is involved in binding glyceraldehyde 3-phosphate. In contrast, an S61T mutant had no effect on catalysis, emphasizing the importance of a hydroxyl group for this role. Mutation of Asn35 (N35A) resulted in an enzyme with only 1.5% of the activity of the wild-type enzyme, and different partial reactions indicate that this residue affects the binding of both triose substrates. Finally, mutation of Lys325 has a greater effect on catalysis than on binding; however, given the magnitude of the effects, it is likely that it plays an indirect role in maintaining other critical residues in a catalytically competent conformation. Interestingly, despite its proximity to the active site and high sequence conservation, replacement of a fourth residue, Gln59 (Q59A), had no significant effect on the function of the enzyme.
    In a separate study to characterize the molecular basis of aldolase specificity, the agaY-encoded tagatose 1,6-bisphosphate aldolase of E. coli was cloned, expressed and kinetically characterized. Our studies showed that the two aldolases discriminate strongly between the diastereoisomers fructose bisphosphate and tagatose bisphosphate, each enzyme preferring its cognate substrate by a factor of 300 to 1500. This produces an overall discrimination factor of almost 5 × 10^5 between the two enzymes. Using the X-ray structure of the fructose 1,6-bisphosphate aldolase and multiple sequence alignments, several highly conserved residues in the vicinity of the active site were identified. These residues might be important in substrate recognition. Nine mutations were therefore made in attempts to switch the specificity of the fructose 1,6-bisphosphate aldolase to that of the tagatose 1,6-bisphosphate aldolase, and the effect on substrate discrimination was evaluated. Surprisingly, despite multiple changes in the active site, many of which abolished fructose 1,6-bisphosphate aldolase activity, no switch in specificity was observed. This highlights the complexity of enzyme catalysis in this family of enzymes and points to the need for further structural studies before the subtleties of how the active site is shaped for complementarity to the cognate substrate are fully understood.
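
    If the overall discrimination factor is taken to be the product of the two enzymes' individual preferences for their cognate substrates (an assumption about how the figure was derived; the abstract also does not say which enzyme shows which factor, but the product is the same either way), the stated range gives:

```latex
D_{\text{overall}} = D_{\text{FBP aldolase}} \times D_{\text{TBP aldolase}}
  \approx 300 \times 1500 = 4.5 \times 10^{5} \approx 5 \times 10^{5}
```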

  20. Folding and Function of a T4 Lysozyme Containing 10 Consecutive Alanines Illustrate the Redundancy of Information in an Amino Acid Sequence

    NASA Astrophysics Data System (ADS)

    Heinz, Dirk W.; Baase, Walt A.; Matthews, Brian W.

    1992-05-01

    Single and multiple Xaa → Ala substitutions were constructed in the α-helix comprising residues 39-50 in bacteriophage T4 lysozyme. The variant with alanines at 10 consecutive positions (A40-49) folds normally and has activity essentially the same as wild type, although it is less stable. The crystal structure of this polyalanine mutant displays no significant change in the main-chain atoms of the helix when compared with the wild-type structure. The individual substitutions of the solvent-exposed residues Asn-40, Ser-44, and Glu-45 with alanine tend to increase the thermostability of the protein, whereas replacements of the buried or partially buried residues Lys-43 and Leu-46 are destabilizing. The melting temperature of the lysozyme in which Lys-43 and Leu-46 are retained and positions 40, 44, 45, 47, and 48 are substituted with alanine (i.e., A40-42/44-45/47-49) is increased by 3.1 °C relative to wild type at pH 3.0, but reduced by 1.6 °C at pH 6.7. In the case of the charged amino acids Glu-45 and Lys-48, the changes in melting temperature indicate that the putative salt bridge between these two residues contributes essentially nothing to the stability of the protein. The results clearly demonstrate that there is considerable redundancy in the sequence information in the polypeptide chain; not every amino acid is essential for folding. Also, further evidence is provided that the replacement of fully solvent-exposed residues within α-helices with alanines may be a general way to increase protein stability. The general approach may permit a simplification of the protein folding problem by retaining only amino acids proven to be essential for folding and replacing the remainder with alanine.

  1. Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis).

    PubMed

    Darville, Lancia N F; Merchant, Mark E; Maccha, Venkata; Siddavarapu, Vivekananda Reddy; Hasan, Azeem; Murray, Kermit K

    2012-02-01

    Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35 kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The N-terminal sequence of the protein was determined using Edman degradation, and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software, and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as a monomer and a dimer in vitro. The isolated 35 kDa protein was ~98% sequenced, found to have 313 amino acids and nine cysteine residues, and identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to human and mouse intelectin-1, respectively. The alligator lectin exhibited strong binding affinities toward mannan and mannose compared with other tested carbohydrates. Copyright © 2011 Elsevier Inc. All rights reserved.

  2. Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware.

    PubMed

    Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad; Shi, Lin; Li, Keqin

    2015-01-01

    Multiple sequence alignment (MSA) is an extremely powerful tool for many biological applications, including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures. Owing to the ever-increasing sizes of sequence databases, there is growing demand to accelerate this task. In this paper, we demonstrate how graphics processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm. To fully exploit the GPU's capabilities for accelerating MAFFT, we have optimized the sequence data organization to eliminate the bandwidth bottleneck of memory access, designed a memory allocation and reuse strategy to make full use of the limited memory of GPUs, proposed a new modified run-length encoding (MRLE) scheme to reduce memory consumption, and used high-performance shared memory to speed up I/O operations. Our implementation, tested on three NVIDIA GPUs, achieves a speedup of up to 11.28× on a Tesla K20m GPU compared with the sequential MAFFT 7.015.
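
    The abstract does not describe how MRLE modifies run-length encoding, so the sketch below shows only standard run-length encoding, which stores each run of a repeated symbol once together with its count. Gap-rich aligned sequences (runs of '-') are the kind of input where this saves memory; the example string is hypothetical.

```python
from itertools import groupby

def run_length_encode(seq):
    """Standard run-length encoding: collapse each run of a symbol into (symbol, count)."""
    return [(sym, len(list(group))) for sym, group in groupby(seq)]

def run_length_decode(runs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(sym * count for sym, count in runs)

aligned = "MK----TAAAAG--"  # hypothetical aligned sequence with gap runs
runs = run_length_encode(aligned)
print(runs)
# [('M', 1), ('K', 1), ('-', 4), ('T', 1), ('A', 4), ('G', 1), ('-', 2)]
assert run_length_decode(runs) == aligned
```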

  3. Identification of Mom12 and Mom13, two novel modifier loci of ApcMin-mediated intestinal tumorigenesis.

    PubMed

    Crist, Richard C; Roth, Jacquelyn J; Lisanti, Michael P; Siracusa, Linda D; Buchberg, Arthur M

    2011-04-01

    Colorectal cancer is a heterogeneous disease resulting from a combination of genetic and environmental factors. The C57BL/6J (B6) ApcMin/+ mouse develops polyps throughout the gastrointestinal tract and has been a valuable model for understanding the genetic basis of intestinal tumorigenesis. ApcMin/+ mice have been used to study known oncogenes and tumor suppressor genes on a controlled genetic background. These studies often utilize congenic knockout alleles, which can carry an unknown amount of residual donor DNA. The ApcMin model has also been used to identify modifier loci, known as Modifier of Min (Mom) loci, which alter ApcMin-mediated intestinal tumorigenesis. B6 mice carrying a knockout allele generated in WW6 embryonic stem cells were crossed to B6 ApcMin/+ mice to determine the effect on polyp multiplicity. The newly generated colony developed significantly more intestinal polyps than ApcMin/+ controls. Polyp multiplicity did not correlate with inheritance of the knockout allele, suggesting the presence of one or more modifier loci segregating in the colony. Genotyping of simple sequence length polymorphism (SSLP) markers revealed residual 129X1/SvJ genomic DNA within the congenic region of the parental knockout line. An analysis of polyp multiplicity data and SSLP genotyping indicated the presence of two Mom loci in the colony: 1) Mom12, a dominant modifier linked to the congenic region on chromosome 6, and 2) Mom13, which is unlinked to the congenic region and whose effect is masked by Mom12. The identification of Mom12 and Mom13 demonstrates the potential problems resulting from residual heterozygosity present in congenic lines.

  4. A proposed OB-fold with a protein-interaction surface in Candida albicans telomerase protein Est3

    PubMed Central

    Yu, Eun Young; Wang, Feng; Lei, Ming; Lue, Neal F

    2008-01-01

    Ever shorter telomeres 3 (Est3) is an essential telomerase regulatory subunit thought to be unique to budding yeasts. Here we use multiple sequence alignment and hidden Markov model–hidden Markov model (HMM-HMM) comparison to uncover potential similarities between Est3 and the mammalian telomeric protein Tpp1. Analysis of site-specific mutants of Candida albicans Est3 revealed functional distinctions between residues that are conserved between Est3 and Tpp1 and those that are unique to Est3. Although both types of residues are important for telomere maintenance in vivo, only the former contributes to telomerase activity in vitro and facilitates the association of Est3 with telomerase core components. Consistent with a function in protein-protein interaction, the residues common to Est3 and Tpp1 map to one face of an OB-fold model structure, away from the canonical nucleic acid binding surface. We propose that Est3 and the OB-fold domain of Tpp1 mediate a conserved function in telomerase regulation. PMID:19172753

  5. Protein structure based prediction of catalytic residues.

    PubMed

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be captured more simply and more effectively with the distance-to-GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
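    The reported figures follow from the counts in the abstract: sensitivity is 70/111, precision is 70/411, and enrichment is the precision divided by the base rate of catalytic residues in the input (111/9262). The sketch below reproduces that arithmetic and adds a relative-entropy conservation score for a single alignment column. The uniform 1/20 background is an assumption for illustration only; as the abstract notes, background frequencies depend on the reference database.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enrichment_metrics(true_hits, predicted, positives, total):
    """Sensitivity, precision and fold enrichment of a residue-level prediction."""
    sensitivity = true_hits / positives           # fraction of known catalytic residues recovered
    precision = true_hits / predicted             # fraction of predictions that are catalytic
    enrichment = precision / (positives / total)  # precision relative to the base rate
    return sensitivity, precision, enrichment


def relative_entropy(column, background=None):
    """Relative entropy (bits) of a column's residue frequencies against a background.

    Gaps and symbols missing from the background are ignored. The default
    background is uniform over the 20 amino acids (an illustrative assumption).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [r for r in column if r in background]
    counts = Counter(residues)
    total = len(residues)
    return sum(
        (n / total) * math.log2((n / total) / background[aa]) for aa, n in counts.items()
    )


sens, prec, enr = enrichment_metrics(70, 411, 111, 9262)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={enr:.1f}x")
print(relative_entropy("CCCC-C"))  # fully conserved column: about log2(20) bits
```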

  6. Modifications to the Foot-and-Mouth Disease Virus 2A Peptide: Influence on Polyprotein Processing and Virus Replication.

    PubMed

    Kjær, Jonas; Belsham, Graham J

    2018-04-15

    Foot-and-mouth disease virus (FMDV) has a positive-sense single-stranded RNA (ssRNA) genome that includes a single, large open reading frame encoding a polyprotein. The cotranslational "cleavage" of this polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues in length) using a nonproteolytic mechanism termed "ribosome skipping" or "StopGo." Among the picornaviruses, multiple 2A variants with this property share a conserved C-terminal motif [D(V/I)E(S/T)NPG↓P]. The impact of 2A modifications within this motif on FMDV protein synthesis, polyprotein processing, and virus viability was investigated. Amino acid substitutions are tolerated at residues E14, S15, and N16 within the 2A sequences of infectious FMDVs despite their reported "cleavage" efficiencies at the 2A/2B junction of only ca. 30 to 50% compared to that of the wild type (wt). In contrast, no viruses containing substitutions at residue P17, G18, or P19, which displayed little or no "cleavage" activity in vitro, were rescued, but wt revertants were obtained. The 2A substitutions impaired the replication of an FMDV replicon. Using transient-expression assays, it was shown that certain amino acid substitutions at residues E14, S15, N16, and P19 resulted in partial "cleavage" of a protease-free polyprotein, indicating that these specific residues are not essential for cotranslational "cleavage." Immunofluorescence studies, using full-length FMDV RNA transcripts encoding mutant 2A peptides, indicated that the 2A peptide remained attached to adjacent proteins, presumably 2B. These results show that efficient "cleavage" at the 2A/2B junction is required for optimal virus replication. However, maximal StopGo activity does not appear to be essential for the viability of FMDV.
    IMPORTANCE Foot-and-mouth disease virus (FMDV) causes one of the most economically important diseases of farm animals. Cotranslational "cleavage" of the FMDV polyprotein precursor at the 2A/2B junction, termed StopGo, is mediated by the short 2A peptide through a nonproteolytic mechanism which leads to release of the nascent protein and continued translation of the downstream sequence. Improved understanding of this process will not only give a better insight into how this peptide influences the FMDV replication cycle but may also assist the application of this sequence in biotechnology for the production of multiple proteins from a single mRNA. Our data show that single amino acid substitutions in the 2A peptide can have a major influence on viral protein synthesis, virus viability, and polyprotein processing. They also indicate that efficient "cleavage" at the 2A/2B junction is required for optimal virus replication. However, maximal StopGo activity is not essential for the viability of FMDV. Copyright © 2018 American Society for Microbiology.
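    The conserved motif D(V/I)E(S/T)NPG↓P can be written as a regular expression. The sketch below assumes that the motif alone marks where StopGo "cleavage" occurs; it reports the position just after the glycine, where the downstream product would begin, and splits a polyprotein there. The example sequences are made up. Because the abstract reports partial "cleavage" for some substituted variants, a strict pattern like this is only a first-pass screen, not a model of cleavage efficiency.

```python
import re

# D(V/I)E(S/T)NPG followed by P. The lookahead keeps the P out of the match,
# so match.end() is the index of the first residue of the downstream product.
MOTIF_2A = re.compile(r"D[VI]E[ST]NPG(?=P)")


def stopgo_sites(protein: str) -> list[int]:
    """0-based indices at which a new product would begin (the P after NPG)."""
    return [m.end() for m in MOTIF_2A.finditer(protein)]


def split_polyprotein(protein: str) -> list[str]:
    """Split a sequence at every motif site, keeping NPG on the upstream piece."""
    cuts = [0] + stopgo_sites(protein) + [len(protein)]
    return [protein[a:b] for a, b in zip(cuts, cuts[1:])]


print(split_polyprotein("ACDVESNPGPKL"))  # ['ACDVESNPG', 'PKL']
print(stopgo_sites("ACDVESNPAPKL"))       # G -> A substitution: no site, []
```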

  7. An Amino Acid Code for β-sheet Packing Structure

    PubMed Central

    Joo, Hyun; Tsai, Jerry

    2014-01-01

    To understand the relationship between protein sequence and structure, this work extends the knob-socket model in an investigation of β-sheet packing. Over a comprehensive set of β-sheet folds, the contacts between residues were used to identify packing cliques: sets of residues that all contact each other. These packing cliques were then classified based on size and contact order. From this analysis, the two types of four-residue packing cliques necessary to describe β-sheet packing were characterized. Both occur between two adjacent hydrogen-bonded β-strands. First, defining the secondary structure packing within β-sheets, the combined socket or XY:HG pocket consists of four residues, i,i+2 on one strand and j,j+2 on the other. Second, characterizing the tertiary packing between β-sheets, the knob-socket XY:H+B consists of a three-residue XY:H socket (i,i+2 on one strand and j on the other) packed against a knob B residue (residue k, distant in sequence). Depending on the packing depth of the knob B residue, two types of knob-sockets are found: side-chain and main-chain sockets. The amino acid composition of the pockets and knob-sockets reveals the sequence specificity of β-sheet packing. For β-sheet formation, the XY:HG pocket clearly shows sequence specificity of amino acids. For tertiary packing, the XY:H+B side-chain and main-chain sockets exhibit distinct amino acid preferences at each position. These relationships define an amino acid code for β-sheet structure and provide an intuitive topological mapping of β-sheet packing. PMID:24668690
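    A packing clique, as defined above, is a clique in a residue contact graph: every pair of residues in the set is in contact. The sketch below enumerates cliques of a given size by brute force. How contacts are defined (for example, an interatomic distance cutoff) is not specified in the abstract and is left to the caller; the residue numbers in the example are invented to mimic an XY:HG pocket (i, i+2 on one strand and j, j+2 on the other).

```python
from collections import defaultdict
from itertools import combinations


def build_contact_graph(pairs):
    """Undirected contact graph: residue -> set of residues it contacts."""
    graph = defaultdict(set)
    for i, j in pairs:
        graph[i].add(j)
        graph[j].add(i)
    return graph


def packing_cliques(graph, size=4):
    """All residue sets of the given size in which every pair is in contact.

    Brute force over combinations, so only suitable for small graphs.
    """
    cliques = []
    for group in combinations(sorted(graph), size):
        if all(b in graph[a] for a, b in combinations(group, 2)):
            cliques.append(group)
    return cliques


# Hypothetical contacts: residues 10/12 on one strand, 30/32 on the paired strand.
contacts = [(10, 12), (10, 30), (10, 32), (12, 30), (12, 32), (30, 32), (12, 14), (14, 32)]
print(packing_cliques(build_contact_graph(contacts)))  # [(10, 12, 30, 32)]
```

    Calling `packing_cliques(graph, 3)` on the same graph lists the three-residue sets, the analogue of XY:H sockets in this toy example.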

  8. Common Amino Acid Subsequences in a Universal Proteome—Relevance for Food Science

    PubMed Central

    Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Sokołowska, Jolanta; Starowicz, Piotr; Bucholska, Justyna; Hrynkiewicz, Monika

    2015-01-01

    A common subsequence is a fragment of the amino acid chain that occurs in more than one protein. Common subsequences may be an object of interest for food scientists as biologically active peptides, epitopes, and/or protein markers that are used in comparative proteomics. An individual bioactive fragment, in particular the shortest fragment containing two or three amino acid residues, may occur in many protein sequences. An individual linear epitope may also be present in multiple sequences of precursor proteins. Although recent recommendations for prediction of allergenicity and cross-reactivity include not only sequence identity, but also similarities in secondary and tertiary structures surrounding the common fragment, local sequence identity may be used to screen protein sequence databases for potential allergens in silico. The main weakness of the screening process is that it overlooks allergens and cross-reactivity cases without identical fragments corresponding to linear epitopes. A single peptide may also serve as a marker of a group of allergens that belong to the same family and, possibly, reveal cross-reactivity. This review article discusses the benefits for food scientists that follow from the common subsequences concept. PMID:26340620
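    Following the review's definition (a fragment that occurs in more than one protein), common subsequences of a fixed length k can be found by indexing every k-residue fragment and keeping those seen in two or more proteins. The sketch below uses exact matching only, which reflects the screening weakness the abstract points out: cases without an identical fragment are missed. The protein names and sequences are invented.

```python
from collections import defaultdict


def common_subsequences(proteins: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each length-k fragment found in more than one protein to those proteins' names."""
    occurrences = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            occurrences[seq[i : i + k]].add(name)
    return {frag: names for frag, names in occurrences.items() if len(names) > 1}


proteins = {"p1": "VPPVY", "p2": "AVPPA", "p3": "KLM"}
print(common_subsequences(proteins, 3))  # VPP is shared by p1 and p2
```

    Shorter fragments (k = 2 or 3) are shared far more often, matching the abstract's point that the shortest bioactive fragments may occur in many protein sequences.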

  9. Joint Frequency-Domain Equalization and Despreading for Multi-Code DS-CDMA Using Cyclic Delay Transmit Diversity

    NASA Astrophysics Data System (ADS)

    Yamamoto, Tetsuya; Takeda, Kazuki; Adachi, Fumiyuki

    Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can provide a better bit error rate (BER) performance than rake combining. To further improve the BER performance, cyclic delay transmit diversity (CDTD) can be used. CDTD simultaneously transmits the same signal from different antennas after adding different cyclic delays to increase the number of equivalent propagation paths. Although a joint use of CDTD and MMSE-FDE for direct sequence code division multiple access (DS-CDMA) achieves larger frequency diversity gain, the BER performance improvement is limited by the residual inter-chip interference (ICI) after FDE. In this paper, we propose joint FDE and despreading for DS-CDMA using CDTD. Equalization and despreading are simultaneously performed in the frequency-domain to suppress the residual ICI after FDE. A theoretical conditional BER analysis is presented for the given channel condition. The BER analysis is confirmed by computer simulation.
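    Two ingredients of the baseline system can be sketched directly. A cyclic delay of d samples on transmit antenna t multiplies that antenna's channel frequency response by exp(-j2πkd/N), so the receiver sees a single equivalent channel with more frequency selectivity; MMSE-FDE then weights each frequency bin by W(k) = H*(k) / (|H(k)|^2 + 1/SNR). The single-SNR weight, the equal power split across antennas, and the toy channels are simplifying assumptions. The paper's multi-code DS-CDMA weights and its proposed joint FDE and despreading are not reproduced here.

```python
import cmath


def dft(x, n):
    """N-point DFT of a sequence, zero-padded to length n."""
    x = list(x) + [0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def cdtd_equivalent_channel(impulse_responses, delays, n):
    """Equivalent frequency response when antenna t sends the block cyclically delayed by delays[t]."""
    scale = 1 / len(impulse_responses) ** 0.5  # equal transmit power split (assumption)
    responses = [dft(h, n) for h in impulse_responses]
    return [
        scale * sum(H[k] * cmath.exp(-2j * cmath.pi * k * d / n) for H, d in zip(responses, delays))
        for k in range(n)
    ]


def mmse_weights(channel, snr):
    """Per-bin MMSE-FDE weights W(k) = conj(H(k)) / (|H(k)|^2 + 1/snr), snr in linear units."""
    return [h.conjugate() / (abs(h) ** 2 + 1 / snr) for h in channel]


# Two flat single-path antennas, with a cyclic delay of 2 samples on the second (N = 4):
# the flat channels combine into one with alternating strong and null bins.
channel = cdtd_equivalent_channel([[1], [1]], [0, 2], 4)
print([round(abs(h), 3) for h in channel])
print([round(abs(w), 3) for w in mmse_weights(channel, snr=10.0)])
```

    The MMSE weight stays bounded in the null bins instead of inverting a near-zero gain, which is why it is preferred over zero-forcing; the interference left after this step is what the abstract calls residual ICI.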

  10. Improving membrane protein expression by optimizing integration efficiency

    PubMed Central

    2017-01-01

    The heterologous overexpression of integral membrane proteins in Escherichia coli often yields insufficient quantities of purifiable protein for applications of interest. The current study leverages a recently demonstrated link between co-translational membrane integration efficiency and protein expression levels to predict protein sequence modifications that improve expression. Membrane integration efficiencies, obtained using a coarse-grained simulation approach, robustly predicted effects on expression of the integral membrane protein TatC for a set of 140 sequence modifications, including loop-swap chimeras and single-residue mutations distributed throughout the protein sequence. Mutations that improve simulated integration efficiency were 4-fold enriched with respect to improved experimentally observed expression levels. Furthermore, the effects of double mutations on both simulated integration efficiency and experimentally observed expression levels were cumulative and largely independent, suggesting that multiple mutations can be introduced to yield higher levels of purifiable protein. This work provides a foundation for a general method for the rational overexpression of integral membrane proteins based on computationally simulated membrane integration efficiencies. PMID:28918393

  11. Adaptive Local Realignment of Protein Sequences.

    PubMed

    DeBlasio, Dan; Kececioglu, John

    2018-06-11

    While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that, in contrast, automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
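    The three steps described above (flag low-accuracy regions, realign each with several parameter settings, keep a candidate only if its estimated accuracy is higher) can be sketched as a loop over regions. Here `estimate` stands in for an accuracy estimator such as Facet and `realign` for an aligner run with one parameter setting, as in Opal; both are hypothetical callables supplied by the caller, not those tools' actual interfaces. The region boundaries and the accuracy threshold are assumed to be given. Regions are processed from right to left so that a realigned region with a different number of columns does not shift the boundaries of regions still waiting to be processed.

```python
from typing import Any, Callable, Sequence

Column = str  # one alignment column (illustrative representation)


def adaptive_local_realign(
    alignment: list[Column],
    regions: Sequence[tuple[int, int]],
    estimate: Callable[[list[Column]], float],
    realign: Callable[[list[Column], Any], list[Column]],
    parameter_sets: Sequence[Any],
    threshold: float,
) -> list[Column]:
    """Replace low-accuracy regions with their best-scoring realignment.

    `regions` are non-overlapping [start, end) column ranges; `estimate` and
    `realign` are hypothetical stand-ins for an accuracy estimator and an aligner.
    """
    result = list(alignment)
    for start, end in sorted(regions, reverse=True):  # right to left keeps earlier indices valid
        region = result[start:end]
        best, best_score = region, estimate(region)
        if best_score >= threshold:
            continue  # region already looks accurate enough
        for params in parameter_sets:
            candidate = realign(region, params)
            score = estimate(candidate)
            if score > best_score:
                best, best_score = candidate, score
        result[start:end] = best
    return result


# Toy stand-ins: a column is "good" or "bad"; setting "B" fixes a region, "A" changes nothing.
def toy_estimate(region):
    return sum(c == "good" for c in region) / len(region) if region else 0.0


def toy_realign(region, params):
    return ["good"] * len(region) if params == "B" else list(region)


cols = ["good", "good", "bad", "bad", "good", "good"]
print(adaptive_local_realign(cols, [(2, 4)], toy_estimate, toy_realign, ["A", "B"], 0.5))
```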

  12. Structure of a Trypanosoma Brucei Alpha/Beta-Hydrolase Fold Protein With Unknown Function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Merritt, E.A.; Holmes, M.; Buckner, F.S.

    2009-05-26

    The structure of a structural genomics target protein, Tbru020260AAA from Trypanosoma brucei, has been determined to a resolution of 2.2 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protein belongs to Pfam sequence family PF08538 and is only distantly related to previously studied members of the α/β-hydrolase fold family. Structural superposition onto representative α/β-hydrolase fold proteins of known function indicates that a possible catalytic nucleophile, Ser116 in the T. brucei protein, lies at the expected location. However, the present structure and by extension the other trypanosomatid members of this sequence family have neither sequence nor structural similarity at the location of other active-site residues typical for proteins with this fold. Together with the presence of an additional domain between strands β6 and β7 that is conserved in trypanosomatid genomes, this suggests that the function of these homologs has diverged from other members of the fold family.

  13. Deep Sequencing of Random Mutant Libraries Reveals the Active Site of the Narrow Specificity CphA Metallo-β-Lactamase is Fragile to Mutations.

    PubMed

    Sun, Zhizeng; Mehta, Shrenik C; Adamski, Carolyn J; Gibbs, Richard A; Palzkill, Timothy

    2016-09-12

    CphA is a Zn(2+)-dependent metallo-β-lactamase that efficiently hydrolyzes only carbapenem antibiotics. To understand the sequence requirements for CphA function, single codon random mutant libraries were constructed for residues in and near the active site and mutants were selected for E. coli growth on increasing concentrations of imipenem, a carbapenem antibiotic. At high concentrations of imipenem that select for phenotypically wild-type mutants, the active-site residues exhibit stringent sequence requirements in that nearly all residues in positions that contact zinc, the substrate, or the catalytic water do not tolerate amino acid substitutions. In addition, at high imipenem concentrations a number of residues that do not directly contact zinc or substrate are also essential and do not tolerate substitutions. Biochemical analysis confirmed that amino acid substitutions at essential positions decreased the stability or catalytic activity of the CphA enzyme. Therefore, the CphA active site is fragile to substitutions, suggesting active-site residues are optimized for imipenem hydrolysis. These results also suggest that resistance to inhibitors targeted to the CphA active site would be slow to develop because of the strong sequence constraints on function.

  14. Residues in the alternative reading frame tumor suppressor that influence its stability and p53-independent activities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tommaso, Anne di; Hagen, Jussara; Tompkins, Van

    2009-04-15

    The Alternative Reading Frame (ARF) protein suppresses tumorigenesis through p53-dependent and p53-independent pathways. Most of ARF's anti-proliferative activity is conferred by sequences in its first exon. Previous work showed specific amino acid changes occurred in that region during primate evolution, so we programmed those changes into human p14ARF to assay their functional impact. Two human p14ARF residues (Ala14 and Thr31) were found to destabilize the protein while two others (Val24 and Ala41) promoted more efficient p53 stabilization and activation. Despite those effects, all modified p14ARF forms displayed robust p53-dependent anti-proliferative activity demonstrating there are no significant biological differences in p53-mediated growth suppression associated with simian versus human p14ARF residues. In contrast, p53-independent p14ARF function was considerably altered by several residue changes. Val24 was required for p53-independent growth suppression whereas multiple residues (Val24, Thr31, Ala41 and His60) enabled p14ARF to block or reverse the inherent chromosomal instability of p53-null MEFs. Together, these data pinpoint specific residues outside of established p14ARF functional domains that influence its expression and signaling activities. Most intriguingly, this work reveals a novel and direct role for p14ARF in the p53-independent maintenance of genomic stability.

  15. Mechanism of endonuclease cleavage by the HigB toxin

    PubMed Central

    Schureck, Marc A.; Repack, Adrienne; Miles, Stacey J.; Marquez, Jhomar; Dunham, Christine M.

    2016-01-01

    Bacteria encode multiple type II toxin–antitoxin modules that cleave ribosome-bound mRNAs in response to stress. All ribosome-dependent toxin family members structurally characterized to date adopt similar microbial RNase architectures despite possessing low sequence identities. Therefore, determining which residues are catalytically important in this specialized RNase family has been a challenge in the field. Structural studies of RelE and YoeB toxins bound to the ribosome provided significant insights, but biochemical experiments with RelE were required to clearly demonstrate which residues are critical for acid-base catalysis of mRNA cleavage. Here, we solved an X-ray crystal structure of the wild-type, ribosome-dependent toxin HigB bound to the ribosome, revealing potential catalytic residues proximal to the mRNA substrate. Using cell-based and biochemical assays, we further determined that HigB residues His54, Asp90, Tyr91 and His92 are critical for activity in vivo, while HigB H54A and Y91A variants have the largest effect on mRNA cleavage in vitro. Comparison of X-ray crystal structures of two catalytically inactive HigB variants with 70S-HigB bound structures reveals that HigB active site residues undergo conformational rearrangements likely required for recognition of its mRNA substrate. These data support the emerging concept that ribosome-dependent toxins have diverse modes of mRNA recognition. PMID:27378776

  16. Characterizing protein conformations by correlation analysis of coarse-grained contact matrices.

    PubMed

    Lindsay, Richard J; Siess, Jan; Lohry, David P; McGee, Trevor S; Ritchie, Jordan S; Johnson, Quentin R; Shen, Tongye

    2018-01-14

    We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.
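    The coarse-graining step described above can be sketched as follows: consecutive residues are grouped into segments, each segment-segment entry is derived from the residue-residue contacts between the two segments, and a covariance matrix is computed across trajectory frames. The abstract only says the segment entry is "a function of" the residue contacts; summing them is an assumption made here, as are the segment boundaries and frames in the example. Principal components would then come from an eigendecomposition of the covariance (for instance with NumPy's `numpy.linalg.eigh`), which is omitted to keep the sketch dependency-free.

```python
def coarse_grain(contact, boundaries):
    """Sum a residue-residue contact matrix into a segment-segment matrix.

    boundaries = [0, b1, ..., n] splits residues 0..n-1 into consecutive segments.
    """
    segments = list(zip(boundaries, boundaries[1:]))
    return [
        [sum(contact[i][j] for i in range(a0, a1) for j in range(b0, b1)) for b0, b1 in segments]
        for a0, a1 in segments
    ]


def contact_covariance(frames):
    """Sample covariance of flattened contact matrices across (at least two) frames."""
    vectors = [[x for row in m for x in row] for m in frames]
    n = len(vectors)
    means = [sum(col) / n for col in zip(*vectors)]
    dim = len(means)
    return [
        [sum((v[p] - means[p]) * (v[q] - means[q]) for v in vectors) / (n - 1) for q in range(dim)]
        for p in range(dim)
    ]


# Two hypothetical 4-residue frames, coarse-grained into two 2-residue segments:
# frame1 has only inter-segment contacts, frame2 only intra-segment contacts.
frame1 = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
frame2 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
cg = [coarse_grain(f, [0, 2, 4]) for f in (frame1, frame2)]
print(cg)
print(contact_covariance(cg))
```

    The covariance matrix here is 4 × 4 rather than 16 × 16, which illustrates how grouping residues reduces the degrees of freedom entering the principal component analysis.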

  17. Characterizing protein conformations by correlation analysis of coarse-grained contact matrices

    NASA Astrophysics Data System (ADS)

    Lindsay, Richard J.; Siess, Jan; Lohry, David P.; McGee, Trevor S.; Ritchie, Jordan S.; Johnson, Quentin R.; Shen, Tongye

    2018-01-01

    We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.

  18. Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

    PubMed Central

    Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

    1987-01-01

    To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The amino acid sequence contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded. PMID:2823109

  19. Predicting protein β-sheet contacts using a maximum entropy-based correlated mutation measure.

    PubMed

    Burkoff, Nikolas S; Várnai, Csilla; Wild, David L

    2013-03-01

    The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large, good-quality multiple sequence alignments (MSAs), because the power of correlated mutation analysis falls as the size of the MSA decreases. Nevertheless, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but on contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM. Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods, achieving a precision of 61.0% and a recall of 55.4%, while not requiring residue solvent accessibility as an input. http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/
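    Correlated mutation measures score pairs of alignment columns. The sketch below uses mutual information (MI), a simpler measure than the maximum entropy-based CMM described in the abstract; MI does not separate direct couplings from indirect, transitive ones, which is the motivation for maximum-entropy measures such as direct information. The toy alignment is invented, and sequence weighting and pseudocounts, which real analyses typically need, are omitted.

```python
import math
from collections import Counter


def column(msa, i):
    """Residues at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def mutual_information(col_i, col_j):
    """Plug-in mutual information (bits) between two alignment columns."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )


# Hypothetical MSA: positions 0 and 2 co-vary (A with D, K with E); position 1 is conserved.
msa = ["ALD", "ALD", "KLE", "KLE"]
print(mutual_information(column(msa, 0), column(msa, 2)))  # 1.0
print(mutual_information(column(msa, 0), column(msa, 1)))  # 0.0
```

    With only four sequences, a perfect co-variation could also arise by chance, which illustrates why small MSAs give the noisy signal the abstract describes.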

  20. Roles of the C-terminal domains of human dihydrodiol dehydrogenase isoforms in the binding of substrates and modulators: probing with chimaeric enzymes.

    PubMed Central

    Matsuura, K; Hara, A; Deyashiki, Y; Iwasa, H; Kume, T; Ishikura, S; Shiraishi, H; Katagiri, Y

    1998-01-01

    Human liver dihydrodiol dehydrogenase (DD; EC 1.3.1.20) exists in isoforms (DD1, DD2 and DD4) composed of 323 amino acids. DD1 and DD2 share 98% amino acid sequence identity, but show lower identities (approx. 83%) with DD4, in which a marked difference is seen in the C-terminal ten amino acids. DD4 exhibits unique catalytic properties, such as the ability to oxidize both (R)- and (S)-alicyclic alcohols equally, high dehydrogenase activity for bile acids, potent inhibition by steroidal anti-inflammatory drugs and activation by sulphobromophthalein and clofibric acid derivatives. In this study, we have prepared chimaeric enzymes, in which we exchanged the C-terminal 39 residues between the two enzymes. Compared with DD1, CDD1-4 (DD1 with the C-terminal sequence of DD4) had increased kcat/Km values for 3alpha-hydroxy-5beta-androstanes and bile acids of 3-9-fold and decreased values for the other substrates by 5-100-fold. It also became highly sensitive to DD4 inhibitors such as phenolphthalein and hexoestrol. Another chimaeric enzyme, CDD4-1 (DD4 with the C-terminal sequence of DD1), showed the same (S)-stereospecificity for the alicyclic alcohols as DD1, had decreased kcat/Km values for bile acids with 7beta- or 12alpha-hydroxy groups by more than 120-fold and was resistant to inhibition by betamethasone. In addition, the activation effects of sulphobromophthalein and bezafibrate decreased or disappeared for CDD4-1. The recombinant DD4 with the His314-->Pro (the corresponding residue of DD1) mutation showed intermediate changes in the properties between those of wild-type DD4 and CDD4-1. The results indicate that the binding of substrates, inhibitors and activators to the enzymes is controlled by residues in their C-terminal domains; multiple residues co-ordinately act as determinants for substrate specificity and inhibitor sensitivity. PMID:9820821

  1. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deutscher, J.; Pevec, B.; Beyreuther, K.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolytic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S. faecalis has now been determined. (³²P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.

  2. Residual Stresses and Critical Initial Flaw Size Analyses of Welds

    NASA Technical Reports Server (NTRS)

    Brust, Frederick W.; Raju, Ivatury S.; Dawocke, David S.; Cheston, Derrick

    2009-01-01

    An independent assessment was conducted to determine the critical initial flaw size (CIFS) for the flange-to-skin weld in the Ares I-X Upper Stage Simulator (USS). A series of weld analyses was performed to determine the residual stresses in a critical region of the USS. Weld residual stresses increase both constraint and mean stress, thereby having an important effect on fatigue life. The purpose of the weld analyses was to model the weld process using a variety of sequences to determine the 'best' sequence in terms of weld residual stresses and distortions. The factors examined in this study include weld design (single-V and double-V groove), weld sequence, boundary conditions, and material properties, among others. The results of this weld analysis are combined with service loads to perform a fatigue and critical initial flaw size evaluation.

  3. Genomic Microbial Epidemiology Is Needed to Comprehend the Global Problem of Antibiotic Resistance and to Improve Pathogen Diagnosis

    PubMed Central

    Wyrsch, Ethan R.; Roy Chowdhury, Piklu; Chapman, Toni A.; Charles, Ian G.; Hammond, Jeffrey M.; Djordjevic, Steven P.

    2016-01-01

    Contamination of waste effluent from hospitals and intensive food animal production with antimicrobial residues is an immense global problem. Antimicrobial residues exert selection pressures that influence the acquisition of antimicrobial resistance and virulence genes in diverse microbial populations. Despite these concerns, there is only a limited understanding of how antimicrobial residues contribute to the global problem of antimicrobial resistance. Furthermore, rapid detection of emerging bacterial pathogens and of strains resistant to more than one antibiotic class remains a challenge. A comprehensive, sequence-based genomic epidemiological surveillance model that captures essential microbial metadata is needed, both to improve surveillance for antimicrobial resistance and to monitor pathogen evolution. Escherichia coli is an important pathogen causing both intestinal [intestinal pathogenic E. coli (IPEC)] and extraintestinal [extraintestinal pathogenic E. coli (ExPEC)] disease in humans and food animals. ExPEC are the most frequently isolated Gram-negative pathogens affecting human health; they are linked to food production practices and are often resistant to multiple antibiotics. Cattle are a known reservoir of IPEC, but they are not recognized as a source of ExPEC that impact human or animal health. In contrast, poultry are a recognized source of multiple-antibiotic-resistant ExPEC, while swine have received comparatively less attention in this regard. Here, we review what is known about ExPEC in swine and how pig production contributes to the problem of antibiotic resistance. PMID:27379026

  4. Application of sorting and next generation sequencing to study 5′-UTR influence on translation efficiency in Escherichia coli

    PubMed Central

    Evfratov, Sergey A.; Osterman, Ilya A.; Komarova, Ekaterina S.; Pogorelskaya, Alexandra M.; Rubtsova, Maria P.; Zatsepin, Timofei S.; Semashko, Tatiana A.; Kostryukova, Elena S.; Mironov, Andrey A.; Burnaev, Evgeny; Krymova, Ekaterina; Gelfand, Mikhail S.; Govorun, Vadim M.; Bogdanov, Alexey A.; Dontsova, Olga A.

    2017-01-01

    Yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding the CER fluorescent protein preceded by randomized 5′ untranslated regions (5′-UTRs), together with red fluorescent protein (RFP) as an internal control. Each library was transformed into Escherichia coli cells, which were sorted by the efficiency of CER mRNA translation using a cell sorter and then subjected to next-generation sequencing. We tested the translation efficiency of the CER gene preceded by each of 48 natural 5′-UTR sequences, and introduced random and designed mutations into natural and artificially selected 5′-UTRs. Several distinct properties could be ascribed to the group of 5′-UTRs most efficient in translation. In addition to known features, several previously unrecognized features that contribute to translation enhancement were found, such as a low proportion of cytidine residues, multiple SD sequences, and AG repeats. In several natural 5′-UTRs, AG repeats could be identified as a translation enhancer, albeit a less efficient one than the SD sequence. PMID:27899632
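
    The three sequence features highlighted above can be scored on any candidate 5′-UTR. The following Python sketch is illustrative only: using `GGAG` as a stand-in Shine-Dalgarno core, and reporting the longest tandem run of AG dinucleotides, are assumptions rather than the definitions used in the study, and `utr_features` is a hypothetical helper.

```python
import re

SD_CORE = "GGAG"  # assumed Shine-Dalgarno core; published SD definitions vary


def utr_features(utr: str) -> dict:
    """Score a 5'-UTR (DNA or RNA alphabet) for cytidine content, SD-like motifs and AG repeats."""
    seq = utr.upper().replace("U", "T")
    c_fraction = seq.count("C") / len(seq) if seq else 0.0
    # Count SD-core occurrences, allowing overlaps (str.count would skip them).
    sd_hits = sum(
        1 for i in range(len(seq) - len(SD_CORE) + 1) if seq.startswith(SD_CORE, i)
    )
    # Longest uninterrupted (AG)n run, reported as the number of AG units.
    runs = re.findall(r"(?:AG)+", seq)
    longest_ag = max((len(run) // 2 for run in runs), default=0)
    return {"c_fraction": c_fraction, "sd_hits": sd_hits, "longest_ag_repeat": longest_ag}


print(utr_features("AAGGAGGAAGAGAGCT"))
# {'c_fraction': 0.0625, 'sd_hits': 1, 'longest_ag_repeat': 3}
```

    Under these assumptions, a low `c_fraction`, several `sd_hits`, and a long AG run would mark a UTR as resembling the efficient group; the abstract does not give the study's actual thresholds.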

  5. Amino-terminal sequence of glycoprotein D of herpes simplex virus types 1 and 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eisenberg, R.J.; Long, D.; Hogue-Angeletti, R.

    1984-01-01

    Glycoprotein D (gD) of herpes simplex virus is a structural component of the virion envelope that stimulates production of high titers of herpes simplex virus type-common neutralizing antibody. The authors carried out automated N-terminal amino acid sequencing studies on radiolabeled preparations of gD-1 (gD of herpes simplex virus type 1) and gD-2 (gD of herpes simplex virus type 2). Although some differences were noted, particularly in the methionine and alanine profiles for gD-1 and gD-2, the residues identified within the first 30 positions of the amino terminus of gD-1 and gD-2 appear to be quite similar. For both proteins, the first residue is a lysine. When the authors compared their sequence data for gD-1 with the sequence predicted by nucleic acid sequencing, the two sequences could be aligned (with one exception) starting at residue 26 (lysine) of the predicted sequence. Thus, the first 25 amino acids of the predicted sequence are absent from the polypeptides isolated from infected cells.
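
    The alignment step above, matching a partial N-terminal sequence against a sequence predicted from DNA while tolerating one mismatch, can be sketched as an ungapped scan. In this hedged Python example, `X` marks an unidentified Edman cycle, `find_start` is a hypothetical helper, and both sequences are invented for illustration; they are not the gD sequences.

```python
def find_start(observed: str, predicted: str, max_mismatches: int = 1, unknown: str = "X"):
    """Return the first 1-based start in `predicted` where `observed` fits, or None.

    Positions marked `unknown` (unidentified cycles) match anything; at most
    `max_mismatches` other positions may disagree.
    """
    for offset in range(len(predicted) - len(observed) + 1):
        window = predicted[offset:offset + len(observed)]
        mismatches = sum(
            1 for o, p in zip(observed, window) if o != unknown and o != p
        )
        if mismatches <= max_mismatches:
            return offset + 1
    return None


PREDICTED = "MSSAAVLLGKYALADPS"  # hypothetical predicted precursor
OBSERVED = "KYXLVD"              # hypothetical Edman result: one unknown, one mismatch
start = find_start(OBSERVED, PREDICTED)
print(start, "->", start - 1, "predicted residues absent from the mature protein")
# 10 -> 9 predicted residues absent from the mature protein
```

    Applied to gD-1, the abstract reports the corresponding start at residue 26 of the predicted sequence, which is why 25 predicted residues are inferred to be missing.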

  6. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    PubMed Central

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356
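
    A percent identity such as the 47% quoted above is the fraction of aligned positions at which two sequences carry the same residue. The abstract does not state how gaps or the denominator were handled, so the Python sketch below assumes one common convention (identical residues divided by the number of gap-free aligned columns). `percent_identity` is a hypothetical helper and the sequences are made up.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring columns that contain a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(round(percent_identity("ACDEFG-HIK", "ACDQFGWHIR"), 1))  # 77.8
```

    Dividing by the full alignment length (10 here) instead would give 70%, which is one reason identity figures reported in different papers are not always directly comparable.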

  7. Adenine specific DNA chemical sequencing reaction.

    PubMed Central

    Iverson, B L; Dervan, P B

    1987-01-01

    Reaction of DNA with K2PdCl4 at pH 2.0 followed by a piperidine workup produces specific cleavage at adenine (A) residues. Product analysis revealed that the K2PdCl4 reaction involves selective depurination at adenine, affording an excision reaction analogous to the other chemical DNA sequencing reactions. Adenine residues methylated at the exocyclic amine (N6) react with lower efficiency than unmethylated adenine in an identical sequence context. This simple protocol specific for A may be a useful addition to current chemical sequencing reactions. PMID:3671067
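
    How an A-specific cleavage reaction yields sequence information can be shown with an idealized model of a 5′-end-labeled strand: cleaving at an adenine at position p leaves a labeled fragment of p - 1 nucleotides, so band lengths read back directly as A positions. This Python sketch assumes single-hit cleavage and ignores gel mobility effects; `methyl_efficiency` is a hypothetical placeholder for the reduced reactivity of N6-methyladenine, not a value from the paper.

```python
def a_cleavage_ladder(seq: str, methylated=frozenset(), methyl_efficiency: float = 0.5) -> dict:
    """Map labeled-fragment length -> relative band intensity for A-specific cleavage.

    `seq` is read 5'->3' from the labeled end; `methylated` holds 1-based
    positions of N6-methyladenines, which get the assumed lower relative yield.
    """
    ladder = {}
    for index, base in enumerate(seq.upper()):  # index equals p - 1 for 1-based position p
        if base == "A" and index > 0:  # cleavage at position 1 leaves only the free label
            ladder[index] = methyl_efficiency if index + 1 in methylated else 1.0
    return ladder


def read_a_positions(ladder: dict) -> list:
    """Convert band lengths back into 1-based adenine positions."""
    return sorted(length + 1 for length in ladder)


bands = a_cleavage_ladder("GATTACAG", methylated={5})
print(bands)                    # {1: 1.0, 4: 0.5, 6: 1.0}
print(read_a_positions(bands))  # [2, 5, 7]
```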

  8. Identification of Specific DNA Binding Residues in the TCP Family of Transcription Factors in Arabidopsis

    PubMed Central

    Aggarwal, Pooja; Das Gupta, Mainak; Joseph, Agnel Praveen; Chatterjee, Nirmalya; Srinivasan, N.; Nath, Utpal

    2010-01-01

    The TCP transcription factors control multiple developmental traits in diverse plant species. Members of this family share an ∼60-residue-long TCP domain that binds to DNA. The TCP domain is predicted to form a basic helix-loop-helix (bHLH) structure but shares little sequence similarity with the canonical bHLH domain. This classifies the TCP domain as a novel class of DNA binding domain specific to the plant kingdom. Little is known about how the TCP domain interacts with its target DNA. We report the biochemical characterization and DNA binding properties of a TCP member in Arabidopsis thaliana, TCP4. We have shown that the 58-residue domain of TCP4 is essential and sufficient for binding to DNA and possesses DNA binding parameters comparable to those of canonical bHLH proteins. Using a yeast-based random mutagenesis screen and site-directed mutants, we identified the residues important for DNA binding and dimer formation. Mutants defective in binding and dimerization failed to rescue the phenotype of an Arabidopsis line lacking endogenous TCP4 activity. By combining structure prediction, functional characterization of the mutants, and molecular modeling, we suggest a possible DNA binding mechanism for this class of transcription factors. PMID:20363772

  9. A Next-Generation Sequencing Strategy for Evaluating the Most Common Genetic Abnormalities in Multiple Myeloma.

    PubMed

    Jiménez, Cristina; Jara-Acevedo, María; Corchete, Luis A; Castillo, David; Ordóñez, Gonzalo R; Sarasquete, María E; Puig, Noemí; Martínez-López, Joaquín; Prieto-Conde, María I; García-Álvarez, María; Chillón, María C; Balanzategui, Ana; Alcoceba, Miguel; Oriol, Albert; Rosiñol, Laura; Palomera, Luis; Teruel, Ana I; Lahuerta, Juan J; Bladé, Joan; Mateos, María V; Orfão, Alberto; San Miguel, Jesús F; González, Marcos; Gutiérrez, Norma C; García-Sanz, Ramón

    2017-01-01

    Identification and characterization of genetic alterations are essential for diagnosis of multiple myeloma and may guide therapeutic decisions. Currently, genomic analysis of myeloma to cover the diverse range of alterations with prognostic impact requires fluorescence in situ hybridization (FISH), single nucleotide polymorphism arrays, and sequencing techniques, which are costly and labor-intensive and require large numbers of plasma cells. To overcome these limitations, we designed a targeted-capture next-generation sequencing approach for one-step identification of IGH translocations, V(D)J clonal rearrangements, the IgH isotype, and somatic mutations to rapidly identify risk groups and specific targetable molecular lesions. Forty-eight newly diagnosed myeloma patients were tested with the panel, which included IGH and six genes that are recurrently mutated in myeloma: NRAS, KRAS, HRAS, TP53, MYC, and BRAF. We identified 14 of 17 IGH translocations previously detected by FISH and three confirmed translocations not detected by FISH, with the additional advantage of breakpoint identification, which can be used as a target for evaluating minimal residual disease. IgH subclass and V(D)J rearrangements were identified in 77% and 65% of patients, respectively. Mutation analysis revealed the presence of missense protein-coding alterations in at least one of the evaluated genes in 16 of 48 patients (33%). This approach may represent a time- and cost-effective diagnostic method for the molecular characterization of multiple myeloma. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  10. Chicken immunoglobulin gamma-heavy chains: limited VH gene repertoire, combinatorial diversification by D gene segments and evolution of the heavy chain locus.

    PubMed

    Parvari, R; Avivi, A; Lentner, F; Ziv, E; Tel-Or, S; Burstein, Y; Schechter, I

    1988-03-01

    cDNA clones encoding the variable and constant regions of chicken immunoglobulin (Ig) gamma-chains were obtained from spleen cDNA libraries. Southern blots of kidney DNA probed with the variable region sequences of eight cDNA clones reveal the same set of bands, corresponding to approximately 30 cross-hybridizing VH genes of one subgroup. Since the VH clones were randomly selected, it is likely that the bulk of chicken H-chains are encoded by a single VH subgroup. Nucleotide sequence determinations of two cDNA clones reveal VH, D, JH and the constant region. The VH segments are closely related to each other (83% homology), as expected for VH genes of the same subgroup. The JHs are 15 residues long and differ by one amino acid. The Ds differ markedly in sequence (20% homology) and size (10 and 20 residues). These findings strongly indicate multiple (at least two) D genes, which diversify the H-chains by a combinatorial joining mechanism that is not operative in the chicken L-chain locus. The most notable among the chicken Igs is the so-called 7S IgG, because its H-chain differs in many important aspects from any mammalian IgG. The sequence of the C gamma cDNA reported here resolves this issue. The chicken C gamma is 426 residues long with four CH domains (unlike mammalian C gamma, which has three CH domains), and it shows 25% homology to the chicken C mu. The chicken C gamma is most related to the mammalian C epsilon in length, the presence of four CH domains and the distribution of cysteines in the CH1 and CH2 domains. We propose that the unique chicken C gamma is the ancestor of the mammalian C epsilon and C gamma subclasses, and discuss the evolution of the H-chain locus from that of chicken with presumably three genes (mu, gamma, alpha) to the mammalian loci with 8-10 H-chain genes.

  11. Genome-wide diversity and selective pressure in the human rhinovirus

    PubMed Central

    Kistler, Amy L; Webster, Dale R; Rouskin, Silvi; Magrini, Vince; Credle, Joel J; Schnurr, David P; Boushey, Homer A; Mardis, Elaine R; Li, Hao; DeRisi, Joseph L

    2007-01-01

    Background The human rhinoviruses (HRV) are among the most common and diverse respiratory pathogens of humans. Over 100 distinct HRV serotypes are known, yet only 6 genomes are available. Due to the paucity of HRV genome sequence, little is known about the genetic diversity within HRV or the forces driving this diversity. Previous comparative genome sequence analyses indicate that recombination drives diversification in multiple genera of the picornavirus family, yet it remains unclear if this holds for HRV. Results To resolve this and gain insight into the forces driving diversification in HRV, we generated a representative set of 34 fully sequenced HRVs. Analysis of these genomes shows consistent phylogenies across the genome, conserved non-coding elements, and only limited recombination. However, spikes of genetic diversity at both the nucleotide and amino acid level are detectable within every locus of the genome. Despite this, the HRV genome as a whole is under purifying selective pressure, with islands of diversifying pressure in the VP1, VP2, and VP3 structural genes and two non-structural genes, the 3C protease and 3D polymerase. Mapping diversifying residues in these factors onto available 3-dimensional structures revealed that the diversifying capsid residues partition to the external surface of the viral particle in statistically significant proximity to antigenic sites. Diversifying pressure in the pleconaril binding site is confined to a single residue known to confer drug resistance (VP1 191). In contrast, diversifying pressure in the non-structural genes is less clear, mapping both nearby and beyond characterized functional domains of these factors. Conclusion This work provides a foundation for understanding HRV genetic diversity and insight into the underlying biology driving evolution in HRV.
It expands our knowledge of the genome sequence space that HRV reference serotypes occupy and how the pattern of genetic diversity across HRV genomes differs from other picornaviruses. It also reveals evidence of diversifying selective pressure in both structural genes known to interact with the host immune system and in domains of unassigned function in the non-structural 3C and 3D genes, raising the possibility that diversification of undiscovered functions in these essential factors may influence HRV fitness and evolution. PMID:17477878

  12. The ARTT motif and a unified structural understanding of substrate recognition in ADP-ribosylating bacterial toxins and eukaryotic ADP-ribosyltransferases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Han, S.; Tainer, J.A.

    2001-08-01

    ADP-ribosylation is a widely occurring and biologically critical covalent chemical modification process in pathogenic mechanisms, intracellular signaling systems, DNA repair, and cell division. The reaction is catalyzed by ADP-ribosyltransferases, which transfer the ADP-ribose moiety of NAD to a target protein with release of nicotinamide. A family of bacterial toxins and eukaryotic enzymes has been termed the mono-ADP-ribosyltransferases, in distinction to the poly-ADP-ribosyltransferases, which catalyze the addition of multiple ADP-ribose groups to the carboxyl terminus of eukaryotic nucleoproteins. Despite the limited primary sequence homology among the different ADP-ribosyltransferases, a central cleft bearing the NAD-binding pocket, formed by the two perpendicular β-sheets of the core, has been remarkably conserved between bacterial toxins and eukaryotic mono- and poly-ADP-ribosyltransferases. The majority of bacterial toxins and eukaryotic mono-ADP-ribosyltransferases are characterized by conserved His and catalytic Glu residues. In contrast, diphtheria toxin, Pseudomonas exotoxin A, and eukaryotic poly-ADP-ribosyltransferases are characterized by conserved Arg and catalytic Glu residues. Structural and mutagenic studies of the NAD-binding core of a binary toxin and of the C3-like toxin family identified an ARTT motif (ADP-ribosylating turn-turn motif) that is implicated in substrate specificity and recognition. Here we apply structure-based sequence alignment and comparative structural analyses of all known structures of ADP-ribosyltransferases to suggest that this ARTT motif is functionally important in many ADP-ribosylating enzymes that bear an NAD-binding cleft characterized by conserved Arg and catalytic Glu residues.
    Overall, structure-based sequence analysis reveals common core structures and conserved active sites of ADP-ribosyltransferases to support similar NAD binding mechanisms but differing mechanisms of target protein binding via sequence variations within the ARTT motif structural framework. Thus, we propose here that the ARTT motif represents an experimentally testable general recognition motif region for many ADP-ribosyltransferases and thereby potentially provides a unified structural understanding of substrate recognition in ADP-ribosylation processes.

  13. The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

    PubMed Central

    Husain, S. S.; Lowe, G.

    1970-01-01

    Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046

  14. Turn stability in beta-hairpin peptides: Investigation of peptides containing 3:5 type I G1 bulge turns.

    PubMed

    Blandl, Tamas; Cochran, Andrea G; Skelton, Nicholas J

    2003-02-01

    The turn-forming ability of a series of three-residue sequences was investigated by substituting them into a well-characterized beta-hairpin peptide. The starting scaffold, bhpW, is a disulfide-cyclized 10-residue peptide that folds into a stable beta-hairpin with two antiparallel strands connected by a two-residue reverse turn. Substitution of the central two residues with the three-residue test sequences leads to less stable hairpins, as judged by thiol-disulfide equilibrium measurements. However, analysis of NMR parameters indicated that each molecule retains a significant folded population, and that the type of turn adopted by the three-residue sequence is the same in all cases. The solution structure of a selected peptide with a PDG turn contained an antiparallel beta-hairpin with a 3:5 type I + G1 bulge turn. Analysis of the energetic contributions of individual turn residues in the series of peptides indicates that substitution effects have significant context dependence, limiting the predictive power of individual amino acid propensities for turn formation. The most stable and least stable sequences were also substituted into a more stable disulfide-cyclized scaffold and a linear beta-hairpin scaffold. The relative stabilities remained the same, suggesting that experimental measurements in the bhpW context are a useful way to evaluate turn stability for use in protein design projects. Moreover, these scaffolds are capable of displaying a diverse set of turns, which can be exploited for the mimicry of protein loops or for generating libraries of reverse turns.

  15. Modeling coding-sequence evolution within the context of residue solvent accessibility.

    PubMed

    Scherrer, Michael P; Meyer, Austin G; Wilke, Claus O

    2012-09-12

    Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprising nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates are each linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by their mean ω alone.
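
    The central quantitative idea above is that ω is modeled as a linear function of RSA, ω(RSA) = a + b·RSA, so a gene is summarized by an intercept a and a slope b. In the study these parameters are estimated inside the GY94 codon-model likelihood; the Python sketch below only illustrates what the two numbers describe, using an ordinary least-squares line through hypothetical per-bin ω values. `fit_omega_line` and the data are assumptions for illustration.

```python
def fit_omega_line(rsa, omega):
    """Ordinary least-squares fit of omega = a + b * RSA; returns (a, b)."""
    n = len(rsa)
    if n < 2 or n != len(omega):
        raise ValueError("need at least two paired (RSA, omega) values")
    mean_x = sum(rsa) / n
    mean_y = sum(omega) / n
    sxx = sum((x - mean_x) ** 2 for x in rsa)
    if sxx == 0:
        raise ValueError("RSA values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rsa, omega))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical omega estimates for five RSA bins (buried = 0, fully exposed = 1).
rsa_bins = [0.0, 0.25, 0.5, 0.75, 1.0]
omega_bins = [0.05, 0.10, 0.15, 0.20, 0.25]
intercept, slope = fit_omega_line(rsa_bins, omega_bins)
print(f"omega(RSA) = {intercept:.2f} + {slope:.2f} * RSA")  # omega(RSA) = 0.05 + 0.20 * RSA
```

    Two genes with the same mean ω can differ in slope, which is the sense in which the abstract argues that slope and intercept describe a gene better than mean ω alone.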

  16. Versatility and Invariance in the Evolution of Homologous Heteromeric Interfaces

    PubMed Central

    Andreani, Jessica; Faure, Guilhem; Guerois, Raphaël

    2012-01-01

    Evolutionary pressures act on protein complex interfaces so that they preserve their complementarity. Nonetheless, the elementary interactions that compose the interface are highly versatile throughout evolution. Understanding and characterizing interface plasticity across evolution is a fundamental issue that could provide new insights into protein-protein interaction prediction. Using a database of 1,024 pairs of close and remote heteromeric structural interologs, we studied protein-protein interactions from a structural and evolutionary point of view. We systematically and quantitatively analyzed the conservation of different types of interface contacts. Our study highlights a striking plasticity of polar contacts at complex interfaces. It also reveals that up to a quarter of the residues switch out of the interface when two homologous complexes are compared. Despite such versatility, we identify two important interface descriptors that correlate with increased conservation in the evolution of interfaces: apolar patches and contacts surrounding anchor residues. These observations hold true even when the dataset is restricted to transiently formed complexes. We show that a combination of six features, related either to sequence or to geometric properties of interfaces, can be used to rank positions likely to share similar contacts between two interologs. Altogether, our analysis provides useful leads for extracting meaningful information from multiple sequence alignments of conserved binding partners and for discriminating near-native interfaces using evolutionary information. PMID:22952442

  17. A comparative study of cold- and warm-adapted Endonucleases A using sequence analyses and molecular dynamics simulations.

    PubMed

    Michetti, Davide; Brandsdal, Bjørn Olav; Bon, Davide; Isaksen, Geir Villy; Tiberti, Matteo; Papaleo, Elena

    2017-01-01

    The psychrophilic and mesophilic endonucleases A (EndA) from Aliivibrio salmonicida (VsEndA) and Vibrio cholerae (VcEndA) have been studied experimentally in terms of the biophysical properties related to thermal adaptation. Analysis of their static X-ray structures was not sufficient to rationalize the determinants of their adaptive traits at the molecular level. Thus, we used molecular dynamics (MD) simulations to compare the two proteins and unveil their structural and dynamical differences. Our simulations did not show a substantial increase in flexibility in the cold-adapted variant on the nanosecond time scale. The only exception is a more rigid C-terminal region in VcEndA, which is ascribable to a cluster of electrostatic interactions and hydrogen bonds, as also supported by MD simulations of a VsEndA mutant variant into which the cluster of interactions was introduced. Moreover, we identified three additional amino acid substitutions through multiple sequence alignment and the analysis of MD-based protein structure networks. In particular, T120V occurs in the proximity of the catalytic residue H80 and alters the interaction with residue Y43, which belongs to the second coordination sphere of the Mg2+ ion. This makes T120V a suitable candidate for future experimental mutagenesis.

  18. Active site of tripeptidyl peptidase II from human erythrocytes is of the subtilisin type.

    PubMed Central

    Tomkinson, B; Wernstedt, C; Hellman, U; Zetterqvist, O

    1987-01-01

    This report presents evidence that the amino acid sequence around the active-site serine of human tripeptidyl peptidase II is of the subtilisin type. The enzyme from human erythrocytes was covalently labeled at its active site with [3H]diisopropyl fluorophosphate, and the protein was subsequently reduced, alkylated, and digested with trypsin. The labeled tryptic peptides were purified by gel filtration and repeated reversed-phase HPLC, and their amino-terminal sequences were determined. Residue 9 contained the radioactive label and was therefore considered to be the active serine residue. The primary structure of the part of the active site (residues 1-10) containing this residue was concluded to be Xaa-Thr-Gln-Leu-Met-Asx-Gly-Thr-Ser-Met. This amino acid sequence is homologous to the sequence surrounding the active serine of the microbial peptidases subtilisin and thermitase. These data indicate that human tripeptidyl peptidase II represents a potentially distinct class of human peptidases and raise the question of an evolutionary relationship between the active site of a mammalian peptidase and that of the subtilisin family of serine peptidases. PMID:3313395

  19. Feedback power control strategies in wireless sensor networks with joint channel decoding.

    PubMed

    Abrardo, Andrea; Ferrari, Gianluigi; Martalò, Marco; Perna, Fabio

    2009-01-01

    In this paper, we derive feedback power control strategies for block-faded multiple access schemes with correlated sources and joint channel decoding (JCD). In particular, after deriving the feasible signal-to-noise ratio (SNR) region for the considered multiple access schemes, i.e., the multidimensional SNR region where error-free communication is, in principle, possible, we propose two feedback power control strategies: (i) a classical feedback power control strategy, which aims at equalizing all link SNRs at the access point (AP), and (ii) an innovative optimized feedback power control strategy, which tries to make the network operating point fall in the feasible SNR region at the lowest overall transmit energy consumption. These strategies will be referred to as "balanced SNR" and "unbalanced SNR," respectively. While they require, in principle, an unlimited power control range at the sources, we also propose practical versions with a limited power control range. We first consider a scenario with orthogonal links and ideal feedback. Then, we analyze the robustness of the proposed power control strategies to possible non-idealities, in terms of residual multiple access interference and noisy feedback channels. Finally, we successfully apply the proposed feedback power control strategies to a limiting case of the class of considered multiple access schemes, namely a central estimating officer (CEO) scenario, where the sensors observe noisy versions of a common binary information sequence and the AP's goal is to estimate this sequence by properly fusing the soft-output information produced by the JCD algorithm.
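
    The classical "balanced SNR" strategy has a simple closed form when links are orthogonal and feedback is ideal: if source i sees a channel power gain g_i over a fading block and the AP noise power is N0, transmitting with P_i = γ·N0/g_i makes every received SNR equal to the common target γ. The Python sketch below implements that rule plus the clipping a limited power-control range would impose. The function names and numbers are hypothetical, and the paper's optimized "unbalanced SNR" strategy, which depends on the feasible SNR region it derives, is not reproduced.

```python
def balanced_snr_powers(gains, target_snr, noise_power, p_min=0.0, p_max=float("inf")):
    """Per-source transmit powers that equalize received SNR, clipped to [p_min, p_max]."""
    powers = []
    for g in gains:
        ideal = target_snr * noise_power / g  # solves g * P / noise_power == target_snr
        powers.append(min(max(ideal, p_min), p_max))
    return powers


def received_snrs(gains, powers, noise_power):
    """Received SNR of each orthogonal link at the access point."""
    return [g * p / noise_power for g, p in zip(gains, powers)]


gains = [1.0, 0.5, 0.25]  # hypothetical block-fading power gains
unlimited = balanced_snr_powers(gains, target_snr=4.0, noise_power=1.0)
limited = balanced_snr_powers(gains, target_snr=4.0, noise_power=1.0, p_max=10.0)
print(unlimited, received_snrs(gains, unlimited, 1.0))  # [4.0, 8.0, 16.0] [4.0, 4.0, 4.0]
print(limited, received_snrs(gains, limited, 1.0))      # [4.0, 8.0, 10.0] [4.0, 4.0, 2.5]
```

    The weakest link is the first to hit the power ceiling, which is why a limited control range can break the equal-SNR condition.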

  20. The NH2-terminal php domain of the alpha subunit of the Escherichia coli replicase binds the epsilon proofreading subunit.

    PubMed

    Wieczorek, Anna; McHenry, Charles S

    2006-05-05

    The alpha subunit of the replicase of all bacteria contains a php domain, initially identified by its similarity to histidinol phosphatase but of otherwise unknown function (Aravind, L., and Koonin, E. V. (1998) Nucleic Acids Res. 26, 3746-3752). Deletion of 60 residues from the NH2 terminus of the alpha php domain destroys epsilon binding. The minimal 255-residue php domain, estimated by sequence alignment with the homolog YcdX, is insufficient for epsilon binding. However, a 320-residue segment including sequences that immediately precede the polymerase domain binds epsilon with the same affinity as the 1160-residue full-length alpha subunit. A subset of mutations of a conserved acidic residue (Asp43 in Escherichia coli alpha) present in the php domain of all bacterial replicases resulted in defects in epsilon binding. Using sequence alignments, we show that the prototypical Gram-positive Pol C, which contains the polymerase and proofreading activities within the same polypeptide chain, has an epsilon-like sequence inserted in a surface loop near the center of the homologous YcdX protein. These findings suggest that the php domain serves as a platform to enable coordination of proofreading and polymerase activities during chromosomal replication.

  1. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.

    PubMed

    Neuwald, Andrew F; Altschul, Stephen F

    2016-12-01

    Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

  2. Two Theileria parva CD8 T Cell Antigen Genes Are More Variable in Buffalo than Cattle Parasites, but Differ in Pattern of Sequence Diversity

    PubMed Central

    Pelle, Roger; Graham, Simon P.; Njahira, Moses N.; Osaso, Julius; Saya, Rosemary M.; Odongo, David O.; Toye, Philip G.; Spooner, Paul R.; Musoke, Anthony J.; Mwangi, Duncan M.; Taracha, Evans L. N.; Morrison, W. Ivan; Weir, William; Silva, Joana C.; Bishop, Richard P.

    2011-01-01

    Background Theileria parva causes an acute fatal disease in cattle, but infections are asymptomatic in the African buffalo (Syncerus caffer). Cattle can be immunized against the parasite by infection and treatment, but immunity is partially strain specific. Available data indicate that CD8+ T lymphocyte responses mediate protection and, recently, several parasite antigens recognised by CD8+ T cells have been identified. This study set out to determine the nature and extent of polymorphism in two of these antigens, Tp1 and Tp2, which contain defined CD8+ T-cell epitopes, and to analyse the sequences for evidence of selection. Methodology/Principal Findings Partial sequencing of the Tp1 gene and the full-length Tp2 gene from 82 T. parva isolates revealed extensive polymorphism in both antigens, including the epitope-containing regions. Single nucleotide polymorphisms were detected at 51 positions (∼12%) in Tp1 and in 320 positions (∼61%) in Tp2. Together with two short indels in Tp1, these resulted in 30 and 42 protein variants of Tp1 and Tp2, respectively. Although evidence of positive selection was found for multiple amino acid residues, there was no preferential involvement of T cell epitope residues. Overall, the extent of diversity was much greater in T. parva isolates originating from buffalo than in isolates known to be transmissible among cattle. Conclusions/Significance The results indicate that T. parva parasites maintained in cattle represent a subset of the overall T. parva population, which has become adapted for tick transmission between cattle. The absence of obvious enrichment for positively selected amino acid residues within defined epitopes indicates either that diversity is not predominantly driven by selection exerted by host T cells, or that such selection is not detectable by the methods employed due to unidentified epitopes elsewhere in the antigens. Further functional studies are required to address this latter point. PMID:21559495
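
    The polymorphism counts above (SNP positions and their share of the sequenced region) rest on a column-by-column scan of aligned sequences. The Python sketch below illustrates that scan under stated assumptions: the isolates are already aligned to equal length, `-` marks a gap, and the function names are hypothetical. It counts distinct nucleotide haplotypes, whereas the study counted translated protein variants, so it is not the authors' pipeline.

```python
from typing import Sequence


def polymorphic_positions(aligned: Sequence[str], gap: str = "-") -> list[int]:
    """Return 0-based alignment columns where at least two different non-gap bases occur."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for col in range(length):
        # Gaps are ignored, so an indel alone does not make a column polymorphic.
        bases = {s[col].upper() for s in aligned if s[col] != gap}
        if len(bases) > 1:
            positions.append(col)
    return positions


def summarize(aligned: Sequence[str]) -> tuple[int, float, int]:
    """SNP count, SNP share of alignment columns, and number of distinct haplotypes."""
    snps = polymorphic_positions(aligned)
    fraction = len(snps) / len(aligned[0])
    haplotypes = len(set(aligned))  # nucleotide-level, not translated protein variants
    return len(snps), fraction, haplotypes
```

    The share is computed over alignment columns; dividing by the ungapped gene length instead would be an equally reasonable convention.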

  3. Two Theileria parva CD8 T cell antigen genes are more variable in buffalo than cattle parasites, but differ in pattern of sequence diversity.

    PubMed

    Pelle, Roger; Graham, Simon P; Njahira, Moses N; Osaso, Julius; Saya, Rosemary M; Odongo, David O; Toye, Philip G; Spooner, Paul R; Musoke, Anthony J; Mwangi, Duncan M; Taracha, Evans L N; Morrison, W Ivan; Weir, William; Silva, Joana C; Bishop, Richard P

    2011-04-29

    Theileria parva causes an acute fatal disease in cattle, but infections are asymptomatic in the African buffalo (Syncerus caffer). Cattle can be immunized against the parasite by infection and treatment, but immunity is partially strain specific. Available data indicate that CD8(+) T lymphocyte responses mediate protection and, recently, several parasite antigens recognised by CD8(+) T cells have been identified. This study set out to determine the nature and extent of polymorphism in two of these antigens, Tp1 and Tp2, which contain defined CD8(+) T-cell epitopes, and to analyse the sequences for evidence of selection. Partial sequencing of the Tp1 gene and the full-length Tp2 gene from 82 T. parva isolates revealed extensive polymorphism in both antigens, including the epitope-containing regions. Single nucleotide polymorphisms were detected at 51 positions (∼12%) in Tp1 and in 320 positions (∼61%) in Tp2. Together with two short indels in Tp1, these resulted in 30 and 42 protein variants of Tp1 and Tp2, respectively. Although evidence of positive selection was found for multiple amino acid residues, there was no preferential involvement of T cell epitope residues. Overall, the extent of diversity was much greater in T. parva isolates originating from buffalo than in isolates known to be transmissible among cattle. The results indicate that T. parva parasites maintained in cattle represent a subset of the overall T. parva population, which has become adapted for tick transmission between cattle. The absence of obvious enrichment for positively selected amino acid residues within defined epitopes indicates either that diversity is not predominantly driven by selection exerted by host T cells, or that such selection is not detectable by the methods employed due to unidentified epitopes elsewhere in the antigens. Further functional studies are required to address this latter point.

  4. Conservation of tubulin-binding sequences in TRPV1 throughout evolution.

    PubMed

    Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

    2012-01-01

    Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, ranging from noxious compounds, low pH and temperature to electromagnetic waves of different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. TRPV1 function therefore directly influences adaptation and, in turn, further evolution. The availability of various eukaryotic genome sequences in the public domain allows us to study the molecular evolution of the TRPV1 protein and the conservation of functionally important domains, motifs and interacting regions. Using statistical and bioinformatics tools, our analysis indicates that TRPV1 evolved about 420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have undergone different selection pressures and thus show different levels of conservation. We found that the TRP box is the most conserved of these and thus has functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance, as these stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Our analysis identifies the regions of TRPV1 that are important for its structure-function relationship. It indicates that tubulin binding sequence-1 (TBS-1), near the TRP box, forms a potential helix and that tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also be significant in the context of Taxol®-induced neuropathy.

  5. Protein structure based prediction of catalytic residues

    PubMed Central

    2013-01-01

    Background Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that the distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. These 411 residues contain 70 of the 111 annotated catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions We found that several of the graph-based measures rely on the same underlying feature of protein structures, which can be captured more simply and effectively with the distance-to-GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues.
We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045
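
    The enrichment figure above follows directly from the reported counts, and relative entropy is a common way to score the conservation of an alignment column. The sketch below shows both calculations; the function names are hypothetical, and the abstract does not give the background distribution or pseudocounts the authors used, so the uniform background in the test data is only an illustrative assumption.

```python
import math
from collections import Counter


def selection_metrics(selected: int, true_in_selected: int, total: int, true_total: int):
    """Sensitivity, precision and fold enrichment of a residue selection over the input set."""
    sensitivity = true_in_selected / true_total
    precision = true_in_selected / selected
    base_rate = true_total / total           # fraction of catalytic residues in the whole input
    enrichment = precision / base_rate       # how much the selection concentrates them
    return sensitivity, precision, enrichment


def relative_entropy(column: str, background: dict[str, float], pseudocount: float = 1e-6) -> float:
    """Kullback-Leibler divergence (bits) of a column's residue frequencies from a background.

    Gaps ('-') are skipped; residues missing from `background` fall back to `pseudocount`.
    """
    counts = Counter(c for c in column if c != "-")
    n = sum(counts.values())
    score = 0.0
    for aa, count in counts.items():
        p = count / n
        q = background.get(aa, pseudocount)
        score += p * math.log2(p / q)
    return score
```

    With the counts reported above, `selection_metrics(411, 70, 9262, 111)` gives a sensitivity of about 0.63, a precision of about 0.17 and an enrichment of about 14.2, consistent with the stated 63%, 17% and approximately 14-fold values.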

  6. CCR2 and CCR5 receptor-binding properties of herpesvirus-8 vMIP-II based on sequence analysis and its solution structure.

    PubMed

    Shao, W; Fernandez, E; Sachpatzidis, A; Wilken, J; Thompson, D A; Schweitzer, B I; Lolis, E

    2001-05-01

    Human herpesvirus-8 (HHV-8) is the infectious agent responsible for Kaposi's sarcoma and encodes a protein, macrophage inflammatory protein-II (vMIP-II), which shows sequence similarity to the human CC chemokines. vMIP-II has broad receptor specificity that crosses chemokine receptor subfamilies, and inhibits HIV-1 viral entry mediated by numerous chemokine receptors. In this study, the solution structure of chemically synthesized vMIP-II was determined by nuclear magnetic resonance. The protein is a monomer and possesses the chemokine fold consisting of a flexible N-terminus, three antiparallel beta strands, and a C-terminal alpha helix. Except for the N-terminal residues (residues 1-13) and the last two C-terminal residues (residues 73-74), the structure of vMIP-II is well defined, exhibiting an average rmsd of 0.35 Å for the backbone heavy atoms and 0.90 Å for all heavy atoms of residues 14-72. Taking into account the sequence differences between the various CC chemokines and comparing their three-dimensional structures allows us to implicate residues that influence the quaternary structure and receptor binding and activation of these proteins in solution. The analysis of the sequence and three-dimensional structure of vMIP-II indicates the presence of epitopes involved in binding two receptors, CCR2 and CCR5. We propose that vMIP-II was initially specific for CCR5 and acquired receptor-binding properties to CCR2 and other chemokine receptors.
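
    The ensemble precision quoted above (average rmsd over residues 14-72) is typically computed after superimposing the NMR models. The sketch below assumes the models are already superimposed and restricted to the atoms of interest, and takes the RMSD of each model to the mean coordinates, which is one common convention; the abstract does not say which convention was used, and the function names are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long, pre-superimposed atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


def mean_rmsd_to_average(models: Sequence[Sequence[Coord]]) -> float:
    """Average RMSD of each ensemble model to the ensemble's mean coordinates (no fitting)."""
    n_atoms = len(models[0])
    mean = [tuple(sum(m[i][k] for m in models) / len(models) for k in range(3))
            for i in range(n_atoms)]
    return sum(rmsd(m, mean) for m in models) / len(models)
```

    Selecting the backbone heavy atoms, or all heavy atoms, of residues 14-72 before calling these functions would yield the two kinds of values reported above.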

  7. Identification of short single disulfide-containing contryphans from the venom of cone snails using de novo mass spectrometry-based sequencing methods.

    PubMed

    Franklin, Jayaseelan Benjamin; Rajesh, Rajaian Pushpabai; Vinithkumar, Nambali Valsalan; Kirubagaran, Ramalingam

    2017-06-15

    We identified 12 short single disulfide-containing conopeptides from the venom of Conus coronatus, C. leopardus, C. lividus and C. zonatus. Interestingly, we detected the shortest contryphan sequence thus far characterized which contains only six amino acid residues. We also identified three distinct contryphan sequences of C. lividus without any proline residues and one sequence with an unusual post-translational modification (bromination of tryptophan). Furthermore, we characterized venom peptides of C. zonatus for the first time. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

    PubMed

    Pietrowski, D; Förster, M

    2000-01-01

    The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein of 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges that could reflect strain specificity or polymorphism within the bovine genome. Based on sequence similarity, we have termed the identified gene bovine ribonuclease k6 b (brk6b).

  9. Iterative refinement of structure-based sequence alignments by Seed Extension

    PubMed Central

    Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook

    2009-01-01

    Background Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still misalign substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results RSE uses at its core SE (Seed Extension), an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis, and NCBI's CDD root node set was used for the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment.
It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133
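
    The evaluation described above compares the fraction of correctly aligned residues against a reference alignment, optionally tolerating a shift error. The sketch below implements only that scoring step, not SE or RSE themselves; representing an alignment as a map from residue indices of one sequence to those of the other, and the function names, are assumptions for illustration.

```python
def pairs_from_gapped(a: str, b: str, gap: str = "-") -> dict[int, int]:
    """Map residue indices of `a` to residue indices of `b` for columns where neither has a gap."""
    pairs, i, j = {}, 0, 0
    for ca, cb in zip(a, b):
        if ca != gap and cb != gap:
            pairs[i] = j
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def aligned_fraction(test: dict[int, int], reference: dict[int, int], max_shift: int = 0) -> float:
    """Fraction of reference-aligned residues that the test alignment reproduces.

    A residue counts as correct if the test alignment pairs it with a partner at most
    `max_shift` positions away from its reference partner (0 = exact match only).
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    correct = 0
    for i, j_ref in reference.items():
        j_test = test.get(i)
        if j_test is not None and abs(j_test - j_ref) <= max_shift:
            correct += 1
    return correct / len(reference)
```

    Scoring with `max_shift=0` corresponds to the "no shift error allowed" setting mentioned above; larger values give a more lenient measure.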

  10. A Method for WD40 Repeat Detection and Secondary Structure Prediction

    PubMed Central

    Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong

    2013-01-01

    WD40-repeat proteins (WD40s), one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diverse sequences. A program, WDSP (WD40 repeat protein Structure Predictor), has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves better prediction in identifying multiple-WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in a jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
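
    Q3 accuracy, quoted above, is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. A minimal sketch, assuming the observed structure has already been reduced to these three states; how WDSP maps structure assignments to three states and averages over the jack-knife test is not stated here, and the function name is hypothetical.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state secondary structure (H, E, C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    states = set("HEC")
    if not set(predicted) <= states or not set(observed) <= states:
        raise ValueError("only H, E and C states are allowed")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

    Averaging per-protein Q3 values and pooling all residues before dividing can give slightly different numbers; the sketch returns the per-sequence value only.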

  11. The interaction between the iron-responsive element binding protein and its cognate RNA is highly dependent upon both RNA sequence and structure.

    PubMed

    Jaffrey, S R; Haile, D J; Klausner, R D; Harford, J B

    1993-09-25

    To assess the influence of RNA sequence/structure on the interaction of RNAs with the iron-responsive element binding protein (IRE-BP), twenty-eight altered RNAs were tested as competitors for an RNA corresponding to the ferritin H chain IRE. All changes in the loop of the predicted IRE hairpin and in the unpaired cytosine residue characteristically found in IRE stems significantly decreased the apparent affinity of the RNA for the IRE-BP. Similarly, altering the spacing and/or orientation of the loop and the unpaired cytosine of the stem, by either increasing or decreasing the number of base pairs separating them, significantly reduced efficacy as a competitor. It is inferred that the IRE-BP forms multiple contacts with its cognate RNA, and that these contacts, acting in concert, provide the basis for the high affinity of this interaction.

  12. Potential ligand-binding residues in rat olfactory receptors identified by correlated mutation analysis

    NASA Technical Reports Server (NTRS)

    Singer, M. S.; Oliveira, L.; Vriend, G.; Shepherd, G. M.

    1995-01-01

    A family of G-protein-coupled receptors is believed to mediate the recognition of odor molecules. In order to identify potential ligand-binding residues, we have applied correlated mutation analysis to receptor sequences from the rat. This method identifies pairs of sequence positions where residues remain conserved or mutate in tandem, thereby suggesting structural or functional importance. The analysis supported molecular modeling studies in suggesting several residues in positions that were consistent with ligand-binding function. Two of these positions, dominated by histidine residues, may play important roles in ligand binding and could confer broad specificity to mammalian odor receptors. The presence of positive (overdominant) selection at some of the identified positions provides additional evidence for roles in ligand binding. Higher-order groups of correlated residues were also observed. Each group may interact with an individual ligand determinant, and combinations of these groups may provide a multi-dimensional mechanism for receptor diversity.
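
    Correlated mutation analysis scores pairs of alignment columns for co-variation. As a stand-in, the sketch below uses mutual information (MI), a different but widely used co-variation score, not the correlation-based method of this study; unlike correlation scores, MI gives perfectly conserved columns a score of zero. The alignment is assumed to be a list of equal-length strings with `-` for gaps, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (bits) between two alignment columns, skipping rows with a gap."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def ranked_pairs(alignment: list[str]) -> list[tuple[float, int, int]]:
    """Score every pair of columns and sort from most to least co-varying."""
    columns = ["".join(row[k] for row in alignment) for k in range(len(alignment[0]))]
    scores = [(mutual_information(columns[i], columns[j]), i, j)
              for i, j in combinations(range(len(columns)), 2)]
    return sorted(scores, reverse=True)
```

    In practice MI is usually corrected for phylogenetic bias and small sample size (for example with the average product correction) before ranking; the sketch omits such corrections.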

  13. Negative Electron Transfer Dissociation Sequencing of 3-O-Sulfation-Containing Heparan Sulfate Oligosaccharides

    NASA Astrophysics Data System (ADS)

    Wu, Jiandong; Wei, Juan; Hogan, John D.; Chopra, Pradeep; Joshi, Apoorva; Lu, Weigang; Klein, Joshua; Boons, Geert-Jan; Lin, Cheng; Zaia, Joseph

    2018-03-01

    Among dissociation methods, negative electron transfer dissociation (NETD) has proven the most useful for glycosaminoglycan (GAG) sequencing because it produces informative fragmentation, a low degree of sulfate losses, high sensitivity, and translatability to multiple instrument types. The challenge, however, is to distinguish positional sulfation. In particular, NETD has been reported to fail to differentiate 4-O- versus 6-O-sulfation in chondroitin sulfate decasaccharide. This raised the concern of whether NETD is able to differentiate the rare 3-O-sulfation from the predominant 6-O-sulfation in heparan sulfate (HS) oligosaccharides. Here, we report that NETD generates highly informative spectra that differentiate sites of O-sulfation on glucosamine residues, enabling structural characterization of synthetic HS isomers containing 3-O-sulfation. Further, lyase-resistant 3-O-sulfated tetrasaccharides from natural sources were successfully sequenced. Notably, for all of the oligosaccharides in this study, the successful sequencing is based on NETD tandem mass spectra of commonly observed deprotonated precursor ions without derivatization or metal cation adduction, simplifying the experimental workflow and data interpretation. These results demonstrate the potential of NETD as a sensitive analytical tool for detailed, high-throughput structural analysis of highly sulfated GAGs.

  14. A Case-by-Case Evolutionary Analysis of Four Imprinted Retrogenes

    PubMed Central

    McCole, Ruth B; Loughran, Noeleen B; Chahal, Mandeep; Fernandes, Luis P; Roberts, Roland G; Fraternali, Franca; O'Connell, Mary J; Oakey, Rebecca J

    2011-01-01

    Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event: mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show that these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of the retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follows very varied evolutionary paths. PMID:21166792

  15. Application of the MIDAS approach for analysis of lysine acetylation sites.

    PubMed

    Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

    2013-01-01

    Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for the discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-to-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision-induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained, which enables acetylation site assignment. The MIDAS technique was later trademarked by ABSciex for targeted protein analysis in which an MRM scan is combined with a full MS/MS product ion scan to enable sequence confirmation.
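
    The targeted search described above pairs the predicted precursor m/z of an acetylated peptide with the diagnostic product ion at m/z 126.1. The sketch below computes such a transition from standard monoisotopic residue masses, assuming no modifications other than lysine acetylation (for example, no cysteine alkylation); the function names are hypothetical and this is not the vendor's software.

```python
# Monoisotopic residue masses in daltons (standard amino acids, unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565       # added once per peptide for the free termini
PROTON = 1.007276       # added once per positive charge
ACETYL = 42.010565      # monoisotopic mass shift of acetylation (+C2H2O)
DIAGNOSTIC_ION = 126.1  # acetyl-lysine diagnostic product ion given in the text


def peptide_mass(sequence: str, acetyl_sites: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying `acetyl_sites` acetyl groups."""
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + acetyl_sites * ACETYL


def mrm_transition(sequence: str, acetyl_sites: int = 1, charge: int = 2) -> tuple[float, float]:
    """(precursor m/z of the acetylated peptide, diagnostic product ion m/z)."""
    if acetyl_sites > sequence.count("K"):
        raise ValueError("more acetyl groups than lysine residues")
    precursor = (peptide_mass(sequence, acetyl_sites) + charge * PROTON) / charge
    return round(precursor, 4), DIAGNOSTIC_ION
```

    Listing one such transition per candidate acetylated peptide of the digest gives the kind of targeted scan list the approach relies on; the charge states to monitor are a user choice.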

  16. Tracing cell lineages in videos of lens-free microscopy.

    PubMed

    Rempfler, Markus; Stierle, Valentin; Ditzel, Konstantin; Kumar, Sanjeev; Paulitschke, Philipp; Andres, Bjoern; Menze, Bjoern H

    2018-06-05

    In vitro experiments with cultured cells are essential for studying their growth and migration patterns and thus for gaining a better understanding of cancer progression and its treatment. Recent progress in lens-free microscopy (LFM) has rendered it an inexpensive tool for label-free, continuous live cell imaging, yet there has been little work on analysing such time-lapse image sequences. We propose (1) a cell detector for LFM images based on fully convolutional networks and residual learning, and (2) a probabilistic model based on moral lineage tracing that explicitly handles multiple detections and temporal successor hypotheses by clustering and tracking simultaneously. (3) We benchmark our method in terms of detection and tracking scores on a dataset of three annotated sequences of several hours of LFM, where we demonstrate that our method produces high-quality lineages. (4) We evaluate its performance on a somewhat more challenging problem: estimating cell lineages from the LFM sequence as would be possible from a corresponding fluorescence microscopy sequence. We present experiments on 16 LFM sequences for which we acquired fluorescence microscopy in parallel and generated annotations from them. Finally, (5) we showcase our method's effectiveness for quantifying cell dynamics in an experiment with skin cancer cells. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Evolution and Diversity in Human Herpes Simplex Virus Genomes

    PubMed Central

    Gatherer, Derek; Ochoa, Alejandro; Greenbaum, Benjamin; Dolan, Aidan; Bowden, Rory J.; Enquist, Lynn W.; Legendre, Matthieu; Davison, Andrew J.

    2014-01-01

    Herpes simplex virus 1 (HSV-1) causes a chronic, lifelong infection in >60% of adults. Multiple recent vaccine trials have failed, with viral diversity likely contributing to these failures. To understand HSV-1 diversity better, we comprehensively compared 20 newly sequenced viral genomes from China, Japan, Kenya, and South Korea with six previously sequenced genomes from the United States, Europe, and Japan. In this diverse collection of passaged strains, we found that one-fifth of the newly sequenced members share a gene deletion and one-third exhibit homopolymeric frameshift mutations (HFMs). Individual strains exhibit genotypic and potential phenotypic variation via HFMs, deletions, short sequence repeats, and single-nucleotide polymorphisms, although the protein sequence identity between strains exceeds 90% on average. In the first genome-scale analysis of positive selection in HSV-1, we found signs of selection in specific proteins and residues, including the fusion protein glycoprotein H. We also confirmed previous results suggesting that recombination has occurred with high frequency throughout the HSV-1 genome. Despite this, the HSV-1 strains analyzed clustered by geographic origin during whole-genome distance analysis. These data shed light on likely routes of HSV-1 adaptation to changing environments and will aid in the selection of vaccine antigens that are invariant worldwide. PMID:24227835

  18. NMR investigations of molecular dynamics

    NASA Astrophysics Data System (ADS)

    Palmer, Arthur

    2011-03-01

    NMR spectroscopy is a powerful experimental approach for characterizing protein conformational dynamics on multiple time scales. The insights obtained from NMR studies are complemented by molecular dynamics (MD) simulations, which provide full atomistic details of protein dynamics. Homologous mesophilic (E. coli) and thermophilic (T. thermophilus) ribonuclease H (RNase H) enzymes serve to illustrate how changes in protein sequence and structure that affect conformational dynamic processes can be monitored and characterized by joint analysis of NMR spectroscopy and MD simulations. A Gly residue inserted within a putative hinge between helices B and C is conserved among thermophilic RNases H, but absent in mesophilic RNases H. Experimental spin relaxation measurements show that the dynamic properties of T. thermophilus RNase H are recapitulated in E. coli RNase H by insertion of a Gly residue between helices B and C. Additional specific intramolecular interactions that modulate backbone and side-chain dynamical properties of the Gly-rich loop and of the conserved Trp residue flanking the Gly insertion site have been identified using MD simulations and subsequently confirmed by NMR spin relaxation measurements. These results emphasize the importance of hydrogen bonds and local steric interactions in restricting conformational fluctuations, and the absence of such interactions in allowing conformational adaptation to substrate binding.

  19. A critical analysis of computational protein design with sparse residue interaction graphs

    PubMed Central

    Georgiev, Ivelin S.

    2017-01-01

    Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can be represented using a sparse residue interaction graph, in which the number of interacting residue pairs is smaller than the number of all pairs of mutable residues; the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original, or full, GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above-mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show that it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000.
This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
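
    The sparse-versus-full comparison above can be illustrated on a toy problem. The sketch below builds a sparse residue interaction graph from a distance cutoff only (no energy cutoff) and finds the GMEC by exhaustive enumeration, which is feasible only for tiny systems and is not the provable, ensemble-based algorithm the study uses. Coordinates, rotamer labels and energies are hypothetical inputs.

```python
import math
from itertools import combinations, product

Coord = tuple[float, float, float]


def sparse_edges(coords: dict[str, Coord], cutoff: float) -> set[tuple[str, str]]:
    """Residue pairs (names in sorted order) whose representative atoms lie within `cutoff` Å."""
    return {
        (a, b)
        for a, b in combinations(sorted(coords), 2)
        if math.dist(coords[a], coords[b]) <= cutoff
    }


def gmec(rotamers, single, pair, edges=None):
    """Exhaustively search rotamer assignments for the minimum total energy.

    rotamers: residue -> list of rotamer labels
    single:   (residue, rotamer) -> one-body energy
    pair:     (res_a, rot_a, res_b, rot_b) -> pairwise energy, res_a sorted before res_b;
              missing entries count as 0
    edges:    residue pairs to keep; None keeps every pair (the full GMEC)
    """
    residues = sorted(rotamers)
    best_energy, best_conf = math.inf, None
    for choice in product(*(rotamers[r] for r in residues)):
        conf = dict(zip(residues, choice))
        energy = sum(single[(r, conf[r])] for r in residues)
        for a, b in combinations(residues, 2):
            if edges is None or (a, b) in edges:
                energy += pair.get((a, conf[a], b, conf[b]), 0.0)
        if energy < best_energy:
            best_energy, best_conf = energy, conf
    return best_energy, best_conf
```

    In a toy case where residues A and C lie far apart but have a large pairwise energy in one rotamer combination, dropping the A-C edge lets the sparse GMEC select a rotamer that the full energy function penalizes, which is the kind of discrepancy the study quantifies.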

  20. Hierarchical damage mechanisms in composite materials subjected to fatigue loadings

    NASA Astrophysics Data System (ADS)

    D'Amore, Alberto; Grassia, Luigi

    2018-02-01

    The strength degradation of fiber-reinforced composites subjected to constant amplitude (CA) fatigue loadings can be described by a two-parameter residual strength model. The analytical approach shows that, under moderate loadings, the multiple damage mechanisms develop with different kinetics and become effective at different time scales, highlighting the three-stage hierarchical nature of damage accumulation in composites. The model captures the sequence of damage accumulation mechanisms from diffuse matrix cracking (I), to fiber/matrix interface failure (II), to fiber and ply rupture and delamination (III). Further, as the loading severity increases, the different mechanisms appear to superpose, indicating their simultaneous coexistence.

  1. Identification of the sequence motif of glycoside hydrolase 13 family members

    PubMed Central

    Kumar, Vikash

    2011-01-01

    A bioinformatics analysis of the sequences of glycoside hydrolase (GH) 13 family enzymes, such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase, has been carried out using hidden Markov model (HMM) profiles in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs, with residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, the α-amylase sequence motif has been observed to have lower sequence conservation than the rest of the motifs of the GH13 family members. PMID:21544166

  2. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

    PubMed

    Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo

    2017-01-01

    Protein contacts contain key information for understanding protein structure and function; thus, contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the contacts predicted for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented at that time. http://raptorx.uchicago.edu/ContactMap/.
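    The "top L" and "top L/10" long-range accuracies quoted above are precisions over the highest-scoring predicted residue pairs. A minimal sketch of such a metric follows. It assumes a predicted L x L score matrix, a boolean matrix of native contacts (how native contacts are defined, for example by a Cβ-Cβ distance cutoff, is left to the caller), and the common convention that "long-range" means a sequence separation of at least 24 residues; the abstract does not state the cutoff. The function name is hypothetical.

```python
import numpy as np


def top_long_range_precision(pred, native, k=1, min_sep=24):
    """Fraction of the top L/k predicted long-range pairs that are native contacts.

    pred    -- (L, L) array of predicted contact scores (only pairs i < j are read)
    native  -- (L, L) boolean array of native contacts
    k       -- 1 for "top L", 5 for "top L/5", 10 for "top L/10"
    min_sep -- assumed long-range cutoff on the sequence separation j - i
    """
    pred = np.asarray(pred, dtype=float)
    native = np.asarray(native, dtype=bool)
    length = pred.shape[0]
    # Upper-triangle index pairs with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return 0.0
    n_top = max(1, length // k)
    # Candidate pairs sorted by descending predicted score; keep the best n_top.
    order = np.argsort(-pred[i, j], kind="stable")[:n_top]
    return float(native[i[order], j[order]].mean())
```

    Ranking only pairs above the separation cutoff, rather than filtering after ranking, is what makes the denominator L/k long-range predictions.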

  3. Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    PubMed Central

    Li, Zhen; Zhang, Renyu

    2017-01-01

    Motivation: Protein contacts contain key information for understanding protein structure and function; thus, contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the contacts predicted for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented at that time. Availability: http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
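    The fold-quality thresholds above (TMscore>0.6 for a correct fold, TMscore>0.5 for the membrane-protein models) refer to the TM-score. The sketch below scores one fixed superposition using the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. It assumes the model has already been superposed on the native structure and that distances for aligned residue pairs are supplied. The full TM-score program also searches for the superposition that maximizes the score, so this function only evaluates the superposition it is given. The function name and the 0.5 Å floor on d0 (which keeps short chains well defined) are illustrative choices.

```python
def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances     -- distances (angstroms) between aligned model/native residue pairs
    target_length -- number of residues in the native (target) structure
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent distance scale, floored at 0.5 so short chains stay well defined.
    d0 = max(0.5, 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8)
    # Unaligned residues contribute nothing; dividing by the target length penalizes them.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

    Because the sum is normalized by the native length, a model that aligns perfectly but covers only half the chain scores at most 0.5.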

  4. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design

    PubMed Central

    Adolf-Bryfogle, Jared; Kalyuzhniy, Oleks; Kubitz, Michael; Hu, Xiaozhen; Adachi, Yumiko; Schief, William R.

    2018-01-01

    A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228–256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody–antigen complexes, using two design strategies: optimizing total Rosetta energy and optimizing interface energy alone. We used two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process picks out the native features more frequently than expected from their sampling rate. We achieved DRRs of between 2.4 and 4.0 for the non-H3 CDRs. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2. 
For sequence design simulations without CDR grafting, the overall recovery of the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and a kappa antibody–antigen complex, successfully improving their affinities 10- to 50-fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters. PMID:29702641
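    Both RAbD metrics defined above are ratios of frequencies. A minimal sketch follows, assuming the frequencies are tallied as counts of designs (or sampled Monte Carlo moves) that carry the native feature; the function names are hypothetical and are not part of the Rosetta API. The last line reproduces the ARR of 1.5 from the 72% and 48% recovery figures quoted above.

```python
def design_risk_ratio(n_designs_native, n_designs, n_sampled_native, n_sampled):
    """DRR: recovery frequency of a native feature divided by its sampling frequency."""
    recovery = n_designs_native / n_designs
    sampling = n_sampled_native / n_sampled
    if sampling == 0:
        raise ValueError("native feature was never sampled")
    return recovery / sampling


def antigen_risk_ratio(freq_with_antigen, freq_without_antigen):
    """ARR: native-feature frequency with the antigen present over the frequency without it."""
    if freq_without_antigen == 0:
        raise ValueError("frequency without antigen must be non-zero")
    return freq_with_antigen / freq_without_antigen


# Native amino acid recovery at antigen-contacting positions, as reported in the abstract.
print(round(antigen_risk_ratio(0.72, 0.48), 2))  # 1.5
```

    A DRR above 1.0 means the final designs are enriched for the native feature relative to how often the sampler proposed it.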

  5. RosettaAntibodyDesign (RAbD): A general framework for computational antibody design.

    PubMed

    Adolf-Bryfogle, Jared; Kalyuzhniy, Oleks; Kubitz, Michael; Weitzner, Brian D; Hu, Xiaozhen; Adachi, Yumiko; Schief, William R; Dunbrack, Roland L

    2018-04-01

    A structural-bioinformatics-based computational methodology and framework have been developed for the design of antibodies to targets of interest. RosettaAntibodyDesign (RAbD) samples the diverse sequence, structure, and binding space of an antibody to an antigen in highly customizable protocols for the design of antibodies in a broad range of applications. The program samples antibody sequences and structures by grafting structures from a widely accepted set of the canonical clusters of CDRs (North et al., J. Mol. Biol., 406:228-256, 2011). It then performs sequence design according to amino acid sequence profiles of each cluster, and samples CDR backbones using a flexible-backbone design protocol incorporating cluster-based CDR constraints. Starting from an existing experimental or computationally modeled antigen-antibody structure, RAbD can be used to redesign a single CDR or multiple CDRs with loops of different length, conformation, and sequence. We rigorously benchmarked RAbD on a set of 60 diverse antibody-antigen complexes, using two design strategies: optimizing total Rosetta energy and optimizing interface energy alone. We used two novel metrics for measuring success in computational protein design. The design risk ratio (DRR) is equal to the frequency of recovery of native CDR lengths and clusters divided by the frequency of sampling of those features during the Monte Carlo design procedure. Ratios greater than 1.0 indicate that the design process picks out the native features more frequently than expected from their sampling rate. We achieved DRRs of between 2.4 and 4.0 for the non-H3 CDRs. The antigen risk ratio (ARR) is the ratio of frequencies of the native amino acid types, CDR lengths, and clusters in the output decoys for simulations performed in the presence and absence of the antigen. For CDRs, we achieved cluster ARRs as high as 2.5 for L1 and 1.5 for H2. 
For sequence design simulations without CDR grafting, the overall recovery of the native amino acid types for residues that contact the antigen in the native structures was 72% in simulations performed in the presence of the antigen and 48% in simulations performed without the antigen, for an ARR of 1.5. For the non-contacting residues, the ARR was 1.08. This shows that the sequence profiles are able to maintain the amino acid types of these conserved, buried sites, while recovery of the exposed, contacting residues requires the presence of the antigen-antibody interface. We tested RAbD experimentally on both a lambda and a kappa antibody-antigen complex, successfully improving their affinities 10- to 50-fold by replacing individual CDRs of the native antibody with new CDR lengths and clusters.

  6. A deep learning framework for improving long-range residue-residue contact prediction using a hierarchical strategy.

    PubMed

    Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng

    2017-09-01

    Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We present a computational program, DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. Compared with previous prediction approaches, our framework employs an effective scheme to identify optimal and important features for contact prediction, and was trained only with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. All source data and code are available at http://166.111.152.91/Downloads.html . Contact: hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. Proline: The Distribution, Frequency, Positioning, and Common Functional Roles of Proline and Polyproline Sequences in the Human Proteome

    PubMed Central

    Morgan, Alexander A.; Rubenstein, Edward

    2013-01-01

    Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring; thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta-pleated sheets as architectural motifs in protein configuration. Triproline helices participate in protein-protein signaling interactions. Longer spans of repeated prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. We have therefore undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend to be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex. PMID:23372670
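    The census above sorts prolines into singlets, dimers, trimers and longer runs, and reports the share of proline residues that sit in runs of three or more. A minimal sketch of that bookkeeping for a single sequence follows; the function names and the toy sequence are hypothetical, and the sketch does not reproduce the authors' proline-rich/proline-poor classification, whose thresholds the abstract does not give.

```python
import re
from collections import Counter


def proline_run_lengths(seq):
    """Counter mapping run length -> number of runs (1 = singlet, 2 = dimer, 3 = trimer, ...)."""
    return Counter(len(m.group()) for m in re.finditer(r"P+", seq.upper()))


def fraction_in_long_runs(seq, min_run=3):
    """Fraction of all proline residues that sit in runs of at least min_run prolines."""
    runs = proline_run_lengths(seq)
    total = sum(length * n for length, n in runs.items())
    in_long = sum(length * n for length, n in runs.items() if length >= min_run)
    return in_long / total if total else 0.0


seq = "APPPGPPKPPPPPL"  # toy sequence, not a human protein
print(fraction_in_long_runs(seq))  # 0.8
```

    Summed over a whole proteome, the same residue-weighted count gives a figure comparable to the 4.4% quoted above.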

  8. Cloning and characterization of the novel D-aspartyl endopeptidase, paenidase, from Paenibacillus sp. B38.

    PubMed

    Nirasawa, Satoru; Nakahara, Kazuhiko; Takahashi, Saori

    2018-02-27

    Paenidase is the first microorganism-derived D-aspartyl endopeptidase that specifically recognizes an internal D-Asp residue to cleave [D-Asp]-X peptide bonds. Using peptide sequences obtained from the protein, we performed PCR with degenerate primers to amplify the paenidase I-encoding gene. Nucleotide sequencing revealed that mature paenidase I consists of 322 amino acid residues and that the protein is encoded as a pro-protein with a 197-amino-acid N-terminal extension compared to the mature protein. Paenidase I exhibits amino acid sequence similarity to several penicillin-binding proteins. In addition, paenidase I was classified into peptidase family S12 based on a MEROPS database search. Family S12 contains serine-type D-Ala-D-Ala carboxypeptidases that have three active site residues (Ser, Lys, and Tyr) in the conserved motifs Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn. These motifs were conserved in the primary structure of paenidase I, and the role of these residues was confirmed by site-directed mutagenesis.
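    Checking a sequence for the two S12-family motifs named above is a simple pattern search. The sketch below reports 1-based start positions for Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn, using a regular-expression lookahead so that overlapping hits are not missed. The function name and test sequence are hypothetical, and a motif match only flags candidate catalytic residues; the abstract notes that their role in paenidase I was confirmed by site-directed mutagenesis.

```python
import re

# Motifs named in the abstract: Ser-Xaa-Thr-Lys and Tyr-Xaa-Asn (Xaa = any residue).
S12_MOTIFS = {
    "S-x-T-K": re.compile(r"(?=S.TK)"),
    "Y-x-N": re.compile(r"(?=Y.N)"),
}


def find_s12_motifs(seq):
    """Return 1-based start positions of each motif (lookahead allows overlapping hits)."""
    seq = seq.upper()
    return {name: [m.start() + 1 for m in pattern.finditer(seq)]
            for name, pattern in S12_MOTIFS.items()}


print(find_s12_motifs("MASATKLLYGN"))  # {'S-x-T-K': [3], 'Y-x-N': [9]}
```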

  9. Molecular cloning and sequence analysis of the gene coding for the 57kDa soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum

    USGS Publications Warehouse

    Chien, Maw-Sheng; Gilbert, Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.

    1992-01-01

    The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr of 57190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting that the protein is synthesized as a 557-amino-acid precursor and processed to produce a mature protein of Mr 54505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat; the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene.
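    Two of the sequence features described above, a perfect inverted repeat and in-frame UAA stop codons, can be checked directly on the DNA sequence. A minimal sketch follows, assuming plain A/C/G/T input (TAA in the coding strand corresponds to UAA in the mRNA). The function names are hypothetical, and the sequence of the actual repeat is not given in the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G and T (any case)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(left_arm, right_arm):
    """True if right_arm is exactly the reverse complement of left_arm."""
    return right_arm.upper() == reverse_complement(left_arm)


def count_in_frame(seq, codon, frame=0):
    """Count occurrences of a codon in one reading frame, stepping three bases at a time."""
    seq, codon = seq.upper(), codon.upper()
    return sum(seq[i:i + 3] == codon for i in range(frame, len(seq) - 2, 3))


print(is_perfect_inverted_repeat("AAGCT", "AGCTT"))  # True
print(count_in_frame("ATGTAAGGGTAA", "TAA"))          # 2
```

    Counting within a fixed frame matters here: the same TAA triplet read in a shifted frame would not act as a stop codon for this ORF.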

  10. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-01-01

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA-binding proteins with unprecedented target specificity. Comparative studies of TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with TALE repeats around the DNA-binding, base-specifying residue (BSR). We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled their repeat structures. We found that repeats from these proteins mediate sequence-specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats and novel residues around the BSR. However, MOrTL1 repeats show greater sequence-discriminating power than MOrTL2 repeats. Sequence alignments show that only three residues are conserved between repeats of all TALE-like proteins, including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
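    The "TALE code" referred to above lets each repeat's base-specifying residue predict one base of the DNA target. The sketch below applies a simplified version of the commonly cited Xanthomonas code (repeat-variable diresidues HD, NI, NG and NN binding C, A, T and G/A, keyed here on their second residue, the BSR). The table is an assumption drawn from the TALE literature rather than from this abstract, and it does not capture the differing discriminating power of MOrTL1 and MOrTL2 repeats. The function name is hypothetical.

```python
# Simplified TALE code keyed on the base-specifying residue (BSR), i.e. the second
# residue of the common repeat-variable diresidues HD, NI, NG and NN.
# "N" (Asn, as in NN) binds G and also A, written as the IUPAC purine code "R".
# Any BSR missing from the table yields the IUPAC "any base" code "N".
BSR_TO_BASE = {"D": "C", "I": "A", "G": "T", "N": "R"}


def predict_target(bsrs):
    """Predict the bound DNA base for each repeat from its BSR, in repeat order (5' to 3')."""
    return "".join(BSR_TO_BASE.get(r.upper(), "N") for r in bsrs)


print(predict_target("DIGN"))  # CATR
```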

  11. Isolation and characterization of the chicken trypsinogen gene family.

    PubMed Central

    Wang, K; Gan, L; Lee, I; Hood, L

    1995-01-01

    Based on genomic Southern hybridizations and cDNA sequence analyses, the chicken trypsinogen gene family can be divided into two multi-member subfamilies: a six-member trypsinogen I subfamily, which encodes the cationic trypsin isoenzymes, and a three-member trypsinogen II subfamily, which encodes the anionic trypsin isoenzymes. The chicken cDNA and genomic clones containing these two subfamilies were isolated and characterized by DNA sequence analysis. The results indicated that the chicken trypsinogen genes encode a signal peptide of 15 to 16 amino acid residues, an activation peptide of 9 to 10 residues and a trypsin of 223 amino acid residues. The chicken trypsinogens contain all the common catalytic and structural features of trypsins, including the catalytic triad His, Asp and Ser and the six disulphide bonds. The trypsinogen I and II subfamilies share approximately 70% sequence identity at the nucleotide and amino acid levels. Sequence comparison among chicken trypsinogen subfamily members and trypsin sequences from other species suggested that the chicken trypsinogen genes may have evolved in a coincidental or concerted fashion. PMID:7733885
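    A figure such as "approximately 70% sequence identity" depends on an alignment and on which columns form the denominator, neither of which the abstract specifies. The sketch below assumes the two sequences are already aligned (gaps as "-") and divides by the number of columns in which neither sequence has a gap; other conventions divide by alignment length or by the shorter sequence. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


print(percent_identity("ACGT-A", "ACGAGA"))  # 80.0
```

    The same function works for nucleotide or amino acid alignments, which is how identities "at the nucleotide and amino acid levels" can both be reported.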

  12. Mechanism of degradation of 2'-deoxycytidine by formamide: implications for chemical DNA sequencing procedures.

    PubMed

    Saladino, R; Crestini, C; Mincione, E; Costanzo, G; Di Mauro, E; Negri, R

    1997-11-01

    We describe the reaction of formamide with 2'-deoxycytidine, which leads to pyrimidine ring opening by nucleophilic addition at the electrophilic C(6) and C(4) positions. This is confirmed by analysis of the products of formamide attack on 2'-deoxycytidine, 5-methyl-2'-deoxycytidine, and 5-bromo-2'-deoxycytidine residues when the latter are incorporated into oligonucleotides by DNA polymerase-driven polymerization and by the solid-phase phosphoramidite procedure. The increased sensitivity of 5-bromo-2'-deoxycytidine relative to 2'-deoxycytidine is pivotal for improving the one-lane chemical DNA sequencing procedure based on the base-selective reaction of formamide with DNA. In many DNA sequencing cases it will in fact be possible to incorporate this base analogue into the DNA to be sequenced, thus providing complete discrimination between its UV absorption signal and that of the thymidine residues. The wide spectrum of sensitivities to formamide displayed by the 2'-deoxycytidine analogues eliminates, in the single-lane chemical DNA sequencing procedure, a possible source of errors due to low discrimination between C and T residues.

  13. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  14. Evolution and the Distribution of Glutaminyl and Asparaginyl Residues in Proteins

    PubMed Central

    Robinson, Arthur B.

    1974-01-01

    Recent experiments on the deamidation of glutaminyl and asparaginyl residues in peptides and proteins support the hypothesis that these residues may serve as molecular clocks that control biological processes. A hypothesis is now offered that suggests that these molecular clocks are set by rejection or accumulation of appropriate sequences of residues including a glutaminyl or asparaginyl residue during evolution. PMID:4522799

  15. Non-active site mutation (Q123A) in New Delhi metallo-β-lactamase (NDM-1) enhanced its enzyme activity.

    PubMed

    Ali, Abid; Azam, Mohd W; Khan, Asad U

    2018-06-01

    New Delhi metallo-β-lactamase-1 (NDM-1) is a carbapenemase that hydrolyzes almost all β-lactam antibiotics. Seventeen different NDM variants have been reported so far, differing in sequence by single or multiple amino acid substitutions, so it is important to understand the relationship between NDM structure and function. Earlier studies examined the role of active-site residues, but non-active-site residues have not been studied in detail. We therefore set out to further understand its structure-function relationship by mutating some of its non-active-site residues. A laboratory mutant of NDM-1 was generated by PCR-based site-directed mutagenesis, replacing Gln (Q) with Ala (A) at position 123. The MICs of imipenem and meropenem for NDM-1 Q123A were 2-fold higher than for the wild type, and hydrolytic activity (kcat/Km) was likewise enhanced relative to wild-type NDM-1. GOLD docking fitness scores were also consistent with the kinetic data. Secondary structure (α-helical content), determined by far-UV circular dichroism (CD), showed significant conformational changes. We conclude that non-active-site amino acid residues play a noteworthy role in the catalytic activity of NDM-1. This study also provides insight into the emergence of new variants through natural evolution. Copyright © 2018 Elsevier B.V. All rights reserved.
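    The enhanced hydrolytic activity above is expressed as catalytic efficiency, kcat/Km, compared between mutant and wild type. A minimal sketch of that arithmetic follows. The kinetic constants below are hypothetical placeholders, not the values measured for NDM-1 or Q123A, and the function names are illustrative.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """kcat/Km in M^-1 s^-1, from kcat (s^-1) and Km (M)."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


def fold_change(mutant_value, wild_type_value):
    """Mutant value relative to wild type (2.0 means twice the wild-type value)."""
    return mutant_value / wild_type_value


# Hypothetical kinetic constants, NOT the values measured in this study.
wt = catalytic_efficiency(kcat_per_s=50.0, km_molar=100e-6)
mutant = catalytic_efficiency(kcat_per_s=60.0, km_molar=60e-6)
print(fold_change(mutant, wt))
```

    Keeping Km in molar units makes kcat/Km come out in the conventional M^-1 s^-1; values reported in µM must be multiplied by 1e-6 first.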

  16. Arg-Pro-X-Ser/Thr is a Consensus Phosphoacceptor Sequence for the Meiosis-Specific Ime2 Protein Kinase in Saccharomyces cerevisiae†

    PubMed Central

    Moore, Michael; Shin, Marcus; Bruning, Adrian; Schindler, Karen; Vershon, Andrew; Winter, Edward

    2008-01-01

    Ime2 is a meiosis-specific protein kinase in Saccharomyces cerevisiae that is functionally related to cyclin-dependent kinase. Although Ime2 regulates multiple steps in meiosis, only a few of its substrates have been identified. Here we show that Ime2 phosphorylates Sum1, a repressor of meiotic gene transcription, on Thr-306. Ime2 protein kinase assays on Sum1 mutants and synthetic peptides define a consensus motif, Arg-Pro-X-Ser/Thr, that is required for efficient phosphorylation by Ime2. The residue immediately C-terminal to the phosphoacceptor (+1 position) also influences the efficiency of Ime2 phosphorylation, with alanine being a preferred residue. This information has predictive value in identifying new potential Ime2 targets, as shown by the ability of Ime2 to phosphorylate Sgs1 and Gip1 in vitro, and could be important in differentiating mitotic and meiotic regulatory pathways. PMID:17198398
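    The predictive use of the consensus described above can be sketched as a scan for Arg-Pro-X-Ser/Thr that also records the +1 residue, so that sites with the preferred alanine can be ranked first. The function name and the test sequence are hypothetical (not Sum1, Sgs1 or Gip1), and a motif match is only a candidate site, not evidence of phosphorylation.

```python
import re

# Arg-Pro-X-Ser/Thr consensus; lookahead so overlapping matches are all reported.
IME2_SITE = re.compile(r"(?=RP.[ST])")


def ime2_candidate_sites(seq):
    """Return (1-based acceptor position, acceptor residue, +1 residue or None) per match."""
    seq = seq.upper()
    sites = []
    for m in IME2_SITE.finditer(seq):
        acceptor = m.start() + 3  # 0-based index of the Ser/Thr
        plus_one = seq[acceptor + 1] if acceptor + 1 < len(seq) else None
        sites.append((acceptor + 1, seq[acceptor], plus_one))
    return sites


sites = ime2_candidate_sites("RPESGRPQTA")
# Rank sites with the preferred alanine at +1 ahead of the rest.
ranked = sorted(sites, key=lambda site: site[2] != "A")
print(ranked)  # [(9, 'T', 'A'), (4, 'S', 'G')]
```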

  17. Two opsins from the compound eye of the crab Hemigrapsus sanguineus

    PubMed

    Sakamoto; Hisatomi; Tokunaga; Eguchi

    1996-01-01

    The primary structures of two opsins from the brachyuran crab Hemigrapsus sanguineus were deduced from the cDNA nucleotide sequences. Both deduced proteins were composed of 377 amino acid residues and included residues highly conserved in visual pigments of other species, and the proteins were 75% identical to each other. The distribution of opsin transcripts in the compound eye, determined by in situ hybridization, suggested that the mRNAs of the two opsins were expressed simultaneously in all of the seven retinular cells (R1-R7) forming the main rhabdom in each ommatidium. Two different visual pigments may be present in one photoreceptor cell in this brachyuran crab. The spectral sensitivity of the compound eye was also determined by recording the electroretinogram. The compound eye was maximally sensitive at about 480 nm. These and previous findings suggest that both opsins of this brachyuran crab produce visual pigments with maximal absorption in the blue-green region of the spectrum. Evidence is presented that crustaceans possess multiple pigment systems for vision.

  18. Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

    PubMed Central

    2011-01-01

    Background: Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments, including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included in, or should be excluded from, the starting sequence segments of similarity searches aimed at finding distant homologues. Results: We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They serve merely as membrane-anchoring devices. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring, such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur in both single- and multi-membrane-spanning proteins, in essentially any type of topology. Whereas simple TMs can confuse searches for sequence homologues and generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion: To extend the homology concept to membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their use in homology searches, based on an assessment of the hydrophobicity and sequence complexity of the TM sequence segments. Reviewers: This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
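    The criterion above combines hydrophobicity and sequence complexity of a TM segment, but the abstract gives neither the exact measures nor the thresholds. The sketch below computes two common stand-ins, mean Kyte-Doolittle hydropathy and compositional Shannon entropy in bits. These are illustrative substitutes, not the authors' measures, and the sketch deliberately applies no simple/complex threshold. The function names are hypothetical.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle value over the segment (higher = more hydrophobic)."""
    segment = segment.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def shannon_entropy(segment):
    """Compositional entropy in bits; low values indicate low sequence complexity."""
    counts = Counter(segment.upper())
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


for tm in ("LLLLVLLLIALLLLVLLLIA", "LGSTAWNLFYGCVMTSAILF"):  # toy segments
    print(tm, round(mean_hydropathy(tm), 2), round(shannon_entropy(tm), 2))
```

    A poly-aliphatic segment like the first toy helix scores high on hydropathy and low on entropy, the profile the abstract attributes to simple, anchor-only TMs.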

  19. Molecular cloning of pepsinogens A and C from adult newt (Cynops pyrrhogaster) stomach.

    PubMed

    Inokuchi, Tomofumi; Ikuzawa, Masayuki; Yamazaki, Shin; Watanabe, Yukari; Shiota, Koushiro; Katoh, Takuma; Kobayashi, Ken-Ichiro

    2013-08-01

    The full-length cDNAs of three pepsinogens (Pgs) were cloned from the stomach of the newt Cynops pyrrhogaster, and their nucleotide sequences were determined. Molecular phylogenetic analysis showed that two Pgs, named PgC1 and PgC2, belong to the pepsinogen C group, and one Pg, named PgA, belongs to the pepsinogen A group. The sequences contain an open reading frame (ORF) encoding 385 amino acid residues for PgC1, 383 for PgC2 and 377 for PgA. In addition, all three amino acid sequences conserve characteristic features such as six cysteine residues and two putative active-site aspartic acid residues. All of the pepsinogen mRNAs were detected in the stomach by RT-PCR but not in other organs. Although the onset of expression differed slightly among the three pepsinogen genes, all of them were expressed in the larval stage after hatching. This is the first report on cloning of pepsinogens from urodele stomach. Copyright © 2013 Elsevier Inc. All rights reserved.

  20. Mapping PDB chains to UniProtKB entries.

    PubMed

    Martin, Andrew C R

    2005-12-01

    UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
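    The final step described above, turning a PDB-to-UniProtKB sequence alignment into a residue-level mapping, amounts to walking both alignment rows in parallel. A minimal sketch follows, assuming the pairwise alignment has already been computed and that PDB residue identifiers (which may carry insertion codes, e.g. "27A") are supplied in chain order. The function name is hypothetical and this is not the code behind the pdbsws resource.

```python
def map_pdb_to_uniprot(aligned_pdb, aligned_uniprot, pdb_residue_ids, gap="-"):
    """Map PDB residue identifiers to 1-based UniProtKB positions via a pairwise alignment.

    aligned_pdb / aligned_uniprot -- the two rows of an existing alignment (equal length)
    pdb_residue_ids -- PDB residue identifiers in chain order, one per non-gap
                       character of aligned_pdb
    """
    if len(aligned_pdb) != len(aligned_uniprot):
        raise ValueError("alignment rows must have equal length")
    if sum(c != gap for c in aligned_pdb) != len(pdb_residue_ids):
        raise ValueError("one PDB residue id is needed per aligned PDB residue")
    ids = iter(pdb_residue_ids)
    mapping, uniprot_pos = {}, 0
    for p, u in zip(aligned_pdb, aligned_uniprot):
        pdb_id = next(ids) if p != gap else None
        if u != gap:
            uniprot_pos += 1
            # PDB residues aligned to a UniProtKB gap are left unmapped.
            if pdb_id is not None:
                mapping[pdb_id] = uniprot_pos
    return mapping


print(map_pdb_to_uniprot("-MKV", "AMKV", ["1", "2", "3"]))  # {'1': 2, '2': 3, '3': 4}
```

    Keying the result on PDB residue identifiers rather than on sequence indices keeps insertion codes and numbering offsets in the PDB file intact.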

  1. Ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry and tandem mass spectrometry for peptide de novo amino acid sequencing for a seven-protein mixture by paired single-residue transposed Lys-N and Lys-C digestion.

    PubMed

    Guan, Xiaoyan; Brownstein, Naomi C; Young, Nicolas L; Marshall, Alan G

    2017-01-30

    Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification. Copyright © 2016 John Wiley & Sons, Ltd.
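    Why paired Lys-C and Lys-N digests give peptides of identical mass can be seen from an idealized in-silico digest: Lys-C cuts after each Lys and Lys-N cuts before it, so an internal segment yields X...K from one enzyme and K...X from the other, with the same composition and the Lys transposed. The sketch below assumes complete cleavage with no missed cuts or sequence-context rules, and uses zero-width `re.split` patterns (supported since Python 3.7). The function names and the toy sequence are hypothetical.

```python
import re


def lys_c_digest(seq):
    """Idealized Lys-C digest: cleave after every Lys, no missed cleavages."""
    return [p for p in re.split(r"(?<=K)", seq) if p]


def lys_n_digest(seq):
    """Idealized Lys-N digest: cleave before every Lys, no missed cleavages."""
    return [p for p in re.split(r"(?=K)", seq) if p]


def transposed_pairs(seq):
    """Pair each Lys-C peptide X...K with the Lys-N peptide K...X from the same segment.

    Both peptides have identical composition (hence identical mass) and differ
    only in the position of the single Lys residue.
    """
    n_peptides = set(lys_n_digest(seq))
    return [(c, "K" + c[:-1]) for c in lys_c_digest(seq)
            if c.endswith("K") and "K" + c[:-1] in n_peptides]


print(lys_c_digest("GAKLVEKDF"))      # ['GAK', 'LVEK', 'DF']
print(lys_n_digest("GAKLVEKDF"))      # ['GA', 'KLVE', 'KDF']
print(transposed_pairs("GAKLVEKDF"))  # [('LVEK', 'KLVE')]
```

    Only segments flanked by Lys on both sides form a transposed pair; the protein's terminal peptides differ by one Lys between the two digests.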

  2. Feedback Power Control Strategies in Wireless Sensor Networks with Joint Channel Decoding

    PubMed Central

    Abrardo, Andrea; Ferrari, Gianluigi; Martalò, Marco; Perna, Fabio

    2009-01-01

    In this paper, we derive feedback power control strategies for block-faded multiple access schemes with correlated sources and joint channel decoding (JCD). In particular, upon the derivation of the feasible signal-to-noise ratio (SNR) region for the considered multiple access schemes, i.e., the multidimensional SNR region where error-free communications are, in principle, possible, two feedback power control strategies are proposed: (i) a classical feedback power control strategy, which aims at equalizing all link SNRs at the access point (AP), and (ii) an innovative optimized feedback power control strategy, which tries to make the network operational point fall in the feasible SNR region at the lowest overall transmit energy consumption. These strategies will be referred to as “balanced SNR” and “unbalanced SNR,” respectively. While they require, in principle, an unlimited power control range at the sources, we also propose practical versions with a limited power control range. We first consider a scenario with orthogonal links and ideal feedback. Then, we analyze the robustness of the proposed power control strategies to possible non-idealities, in terms of residual multiple access interference and noisy feedback channels. Finally, we successfully apply the proposed feedback power control strategies to a limiting case of the class of considered multiple access schemes, namely a central estimating officer (CEO) scenario, where the sensors observe noisy versions of a common binary information sequence and the AP's goal is to estimate this sequence by properly fusing the soft-output information produced by the JCD algorithm. PMID:22291536
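The classical "balanced SNR" idea can be sketched for orthogonal links with ideal feedback: each source inverts its own channel gain so that every link reaches the same received SNR at the AP, and the practical variant clips each power to a limited control range. This is a minimal illustration under those assumptions, with hypothetical names and a simple received-SNR model `P * g / N0`; it does not reproduce the paper's optimized "unbalanced SNR" strategy or its feasible-region analysis.

```python
from typing import List

def balanced_snr_powers(gains: List[float], target_snr: float, noise_power: float,
                        p_min: float = 0.0, p_max: float = float("inf")) -> List[float]:
    """Per-source transmit powers that equalize received SNRs (orthogonal links assumed)."""
    powers = []
    for g in gains:
        p = target_snr * noise_power / g         # channel inversion: p * g / N0 == target
        powers.append(min(max(p, p_min), p_max))  # limited power-control range
    return powers

def received_snrs(powers: List[float], gains: List[float], noise_power: float) -> List[float]:
    """Received SNR of each link under the simple model P * g / N0."""
    return [p * g / noise_power for p, g in zip(powers, gains)]
```

With an unlimited range all links end up at the target SNR; once `p_max` binds, the weakest link falls below the target, which is the situation the limited-range versions have to handle.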

  3. BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

    PubMed

    Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R

    2016-06-01

    Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions compute only the global minimum energy conformation (GMEC). This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
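The optimal substructure that sparse energy functions create can be seen in the simplest possible case: residues arranged in a chain, where only sequence-adjacent residues interact, so the interaction graph is a path. A min-sum dynamic program then finds the GMEC in O(n·q²) time instead of enumerating all qⁿ conformations. This is a minimal sketch of that general principle only; it is not BWM*, which works on an arbitrary branch decomposition and also enumerates a gap-free ensemble. All energies and names are hypothetical.

```python
import itertools
from typing import List, Tuple

def chain_gmec(unary: List[List[float]],
               pair: List[List[List[float]]]) -> Tuple[float, List[int]]:
    """unary[i][r]: self-energy of rotamer r at residue i.
    pair[i][r][s]: interaction of residue i (rotamer r) with residue i+1 (rotamer s).
    Returns (minimum energy, rotamer index per residue)."""
    best = list(unary[0])   # best[r]: lowest energy of residues 0..i with residue i in rotamer r
    back = []               # back[i-1][s]: best rotamer of residue i-1 given residue i in s
    for i in range(1, len(unary)):
        new_best, ptr = [], []
        for s in range(len(unary[i])):
            cands = [best[r] + pair[i - 1][r][s] for r in range(len(best))]
            r_star = min(range(len(cands)), key=cands.__getitem__)
            new_best.append(cands[r_star] + unary[i][s])
            ptr.append(r_star)
        best = new_best
        back.append(ptr)
    last = min(range(len(best)), key=best.__getitem__)
    assignment = [last]
    for ptr in reversed(back):          # trace back from the last residue to the first
        assignment.append(ptr[assignment[-1]])
    assignment.reverse()
    return best[last], assignment

def brute_force(unary, pair):
    """Exhaustive enumeration, used only to check the dynamic program on tiny inputs."""
    best = None
    for conf in itertools.product(*(range(len(u)) for u in unary)):
        e = sum(unary[i][c] for i, c in enumerate(conf))
        e += sum(pair[i][conf[i]][conf[i + 1]] for i in range(len(conf) - 1))
        if best is None or e < best[0]:
            best = (e, list(conf))
    return best
```

Larger branch-width means larger separators between subproblems, which is why the cost of such dynamic programs grows with w rather than with the total number of residues.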

  4. Optimizing expression of the pregnancy malaria vaccine candidate, VAR2CSA in Pichia pastoris.

    PubMed

    Avril, Marion; Hathaway, Marianne J; Cartwright, Megan M; Gose, Severin O; Narum, David L; Smith, Joseph D

    2009-06-29

    VAR2CSA is the main candidate for a vaccine against pregnancy-associated malaria, but vaccine development is complicated by the large size and complex disulfide bonding pattern of the protein. Recent X-ray crystallographic information suggests that domain boundaries of VAR2CSA Duffy binding-like (DBL) domains may be larger than previously predicted and include two additional cysteine residues. This study investigated whether longer constructs would improve VAR2CSA recombinant protein secretion from Pichia pastoris and if domain boundaries were applicable across different VAR2CSA alleles. VAR2CSA sequences were bioinformatically analysed to identify the predicted C11 and C12 cysteine residues at the C-termini of DBL domains and revised N- and C-terminal domain boundaries were predicted in VAR2CSA. Multiple construct boundaries were systematically evaluated for protein secretion in P. pastoris and secreted proteins were tested as immunogens. From a total of 42 different VAR2CSA constructs, 15 proteins (36%) were secreted. Longer construct boundaries, including the predicted C11 and C12 cysteine residues, generally improved expression of poorly or non-secreted domains and permitted expression of all six VAR2CSA DBL domains. However, protein secretion was still highly empirical and affected by subtle differences in domain boundaries and allelic variation between VAR2CSA sequences. Eleven of the secreted proteins were used to immunize rabbits. Antibodies reacted with CSA-binding infected erythrocytes, indicating that P. pastoris recombinant proteins possessed native protein epitopes. These findings strengthen emerging data for a revision of DBL domain boundaries in var-encoded proteins and may facilitate pregnancy malaria vaccine development.

  5. Optimizing expression of the pregnancy malaria vaccine candidate, VAR2CSA in Pichia pastoris

    PubMed Central

    Avril, Marion; Hathaway, Marianne J; Cartwright, Megan M; Gose, Severin O; Narum, David L; Smith, Joseph D

    2009-01-01

    Background: VAR2CSA is the main candidate for a vaccine against pregnancy-associated malaria, but vaccine development is complicated by the large size and complex disulfide bonding pattern of the protein. Recent X-ray crystallographic information suggests that domain boundaries of VAR2CSA Duffy binding-like (DBL) domains may be larger than previously predicted and include two additional cysteine residues. This study investigated whether longer constructs would improve VAR2CSA recombinant protein secretion from Pichia pastoris and if domain boundaries were applicable across different VAR2CSA alleles. Methods: VAR2CSA sequences were bioinformatically analysed to identify the predicted C11 and C12 cysteine residues at the C-termini of DBL domains and revised N- and C-terminal domain boundaries were predicted in VAR2CSA. Multiple construct boundaries were systematically evaluated for protein secretion in P. pastoris and secreted proteins were tested as immunogens. Results: From a total of 42 different VAR2CSA constructs, 15 proteins (36%) were secreted. Longer construct boundaries, including the predicted C11 and C12 cysteine residues, generally improved expression of poorly or non-secreted domains and permitted expression of all six VAR2CSA DBL domains. However, protein secretion was still highly empirical and affected by subtle differences in domain boundaries and allelic variation between VAR2CSA sequences. Eleven of the secreted proteins were used to immunize rabbits. Antibodies reacted with CSA-binding infected erythrocytes, indicating that P. pastoris recombinant proteins possessed native protein epitopes. Conclusion: These findings strengthen emerging data for a revision of DBL domain boundaries in var-encoded proteins and may facilitate pregnancy malaria vaccine development. PMID:19563628

  6. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features.

    PubMed

    Jones, David T; Kandathil, Shaun M

    2018-04-26

    In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
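The kind of "simple alignment statistics" described here can be illustrated by the pair covariance between two alignment columns i and j: C[a][b] = f_ij(a,b) − f_i(a)·f_j(b) over a 21-letter alphabet (20 amino acids plus gap). The sketch below is a minimal illustration that assumes unweighted sequences, no pseudocounts, and maps unknown characters to the gap state; DeepCov's actual preprocessing may differ (for example in sequence weighting). Computing this 21×21 matrix for every column pair gives an L × L × 441 feature tensor of the kind a fully convolutional network can consume.

```python
from typing import List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"   # assumed 21-state encoding: 20 amino acids + gap
INDEX = {a: k for k, a in enumerate(ALPHABET)}

def pair_covariance(msa: List[str], i: int, j: int) -> List[List[float]]:
    """21x21 matrix C[a][b] = f_ij(a,b) - f_i(a) * f_j(b) for alignment columns i and j.
    Frequencies are unweighted; characters outside ALPHABET count as gaps."""
    n = len(msa)
    q = len(ALPHABET)
    fi, fj = [0.0] * q, [0.0] * q
    fij = [[0.0] * q for _ in range(q)]
    for seq in msa:
        a = INDEX.get(seq[i], INDEX["-"])
        b = INDEX.get(seq[j], INDEX["-"])
        fi[a] += 1 / n
        fj[b] += 1 / n
        fij[a][b] += 1 / n
    return [[fij[a][b] - fi[a] * fj[b] for b in range(q)] for a in range(q)]
```

Perfectly co-varying columns give large positive entries for the observed residue pairs, whereas independent columns give a matrix close to zero.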

  7. Sequence Effect on the Formation of DNA Minidumbbells.

    PubMed

    Liu, Yuan; Lam, Sik Lok

    2017-11-16

    The DNA minidumbbell (MDB) is a recently identified non-B structure. The reported MDBs contain two TTTA, CCTG, or CTTG type II loops. At present, the knowledge and understanding of the sequence criteria for MDB formation are still limited. In this study, we performed a systematic high-resolution nuclear magnetic resonance (NMR) and native gel study to investigate the effect of sequence variations in tandem repeats on the formation of MDBs. Our NMR results reveal the importance of hydrogen bonds, base-base stacking, and hydrophobic interactions from each of the participating residues. We conclude that in the MDBs formed by tandem repeats, C-G loop-closing base pairs are more stabilizing than T-A loop-closing base pairs, and thymine residues in both the second and third loop positions are more stabilizing than cytosine residues. The results from this study enrich our knowledge of the sequence criteria for the formation of MDBs, paving a path for better exploring their potential roles in biological systems and DNA nanotechnology.

  8. Compartmentalization of HIV-1 within the female genital tract is due to monotypic and low-diversity variants not distinct viral populations.

    PubMed

    Bull, Marta; Learn, Gerald; Genowati, Indira; McKernan, Jennifer; Hitti, Jane; Lockhart, David; Tapia, Kenneth; Holte, Sarah; Dragavon, Joan; Coombs, Robert; Mullins, James; Frenkel, Lisa

    2009-09-22

    Compartmentalization of HIV-1 between the genital tract and blood was noted in half of 57 women included in 12 studies primarily using cell-free virus. To further understand differences between genital tract and blood viruses of women with chronic HIV-1 infection, cell-free and cell-associated virus populations were sequenced from these tissues, reasoning that integrated viral DNA includes variants archived from earlier in infection and provides a greater array of genotypes for comparisons. Multiple sequences from single-genome-amplification of HIV-1 RNA and DNA from the genital tract and blood of each woman were compared in a cross-sectional study. Maximum likelihood phylogenies were evaluated for evidence of compartmentalization using four statistical tests. Genital tract and blood HIV-1 appeared compartmentalized in 7/13 women by ≥2 statistical analyses. These subjects' phylograms were characterized by low diversity genital-specific viral clades interspersed between clades containing both genital and blood sequences. Many of the genital-specific clades contained monotypic HIV-1 sequences. In 2/7 women, HIV-1 populations were significantly compartmentalized across all four statistical tests; both had low diversity genital tract-only clades. Collapsing monotypic variants into a single sequence diminished the prevalence and extent of compartmentalization. Viral sequences did not demonstrate tissue-specific signature amino acid residues, differential immune selection, or co-receptor usage. In women with chronic HIV-1 infection, multiple identical sequences suggest proliferation of HIV-1-infected cells, and low diversity tissue-specific phylogenetic clades are consistent with bursts of viral replication. These monotypic and tissue-specific viruses provide statistical support for compartmentalization of HIV-1 between the female genital tract and blood. 
However, the intermingling of these clades with clades composed of both genital and blood sequences and the absence of tissue-specific genetic features suggest that compartmentalization between blood and genital tract may be due to viral replication and proliferation of infected cells, and raise the question of whether HIV-1 in the female genital tract is distinct from blood.

  9. Orotidine Monophosphate Decarboxylase--A Fascinating Workhorse Enzyme with Therapeutic Potential.

    PubMed

    Fujihashi, Masahiro; Mnpotra, Jagjeet S; Mishra, Ram Kumar; Pai, Emil F; Kotra, Lakshmi P

    2015-05-20

    Orotidine 5'-monophosphate decarboxylase (ODCase) is known as one of the most proficient enzymes. The enzyme catalyzes the last reaction step of the de novo pyrimidine biosynthesis, the conversion from orotidine 5'-monophosphate (OMP) to uridine 5'-monophosphate. The enzyme is found in all three domains of life, Bacteria, Eukarya and Archaea. Multiple sequence alignment of 750 putative ODCase sequences resulted in five distinct groups. While the universally conserved DxKxxDx motif is present in all the groups, depending on the groups, several characteristic motifs and residues can be identified. Over 200 crystal structures of ODCases have been determined so far. The structures, together with biochemical assays and computational studies, elucidated that ODCase utilized both transition state stabilization and substrate distortion to accelerate the decarboxylation of its natural substrate. Stabilization of the vinyl anion intermediate by a conserved lysine residue at the catalytic site is considered the largest contributing factor to catalysis, while bending of the carboxyl group from the plane of the aromatic pyrimidine ring of OMP accounts for substrate distortion. A number of crystal structures of ODCases complexed with potential drug candidate molecules have also been determined, including with 6-iodo-uridine, a potential antimalarial agent. Copyright © 2015 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
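Locating a degenerate motif such as DxKxxDx in a sequence is a direct regular-expression search, with `x` read as "any residue". The sketch below uses Python's `re` module; the lookahead makes overlapping occurrences visible. The function name and the example sequence are hypothetical and are not taken from any real ODCase.

```python
import re

# "x" in DxKxxDx stands for any residue; the lookahead reports overlapping matches too.
MOTIF = re.compile(r"(?=(D.K..D.))")

def find_motif(seq: str):
    """Return (0-based start, matched segment) for every DxKxxDx occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]
```

For the made-up sequence `"MADLKAIDTGG"`, the search reports a single hit, `(2, "DLKAIDT")`.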

  10. Structure of the human gene encoding the protein repair L-isoaspartyl (D-aspartyl) O-methyltransferase.

    PubMed

    DeVry, C G; Tsai, W; Clarke, S

    1996-11-15

    The protein L-isoaspartyl/D-aspartyl O-methyltransferase (EC 2.1.1.77) catalyzes the first step in the repair of proteins damaged in the aging process by isomerization or racemization reactions at aspartyl and asparaginyl residues. A single gene has been localized to human chromosome 6 and multiple transcripts arising through alternative splicing have been identified. Restriction enzyme mapping, subcloning, and DNA sequence analysis of three overlapping clones from a human genomic library in bacteriophage P1 indicate that the gene spans approximately 60 kb and is composed of 8 exons interrupted by 7 introns. Analysis of intron/exon splice junctions reveals that all of the donor and acceptor splice sites are in agreement with the mammalian consensus splicing sequence. Determination of transcription initiation sites by primer extension analysis of poly(A)+ mRNA from human brain identifies multiple start sites, with a major site 159 nucleotides upstream from the ATG start codon. Sequence analysis of the 5'-untranslated region demonstrates several potential cis-acting DNA elements including SP1, ETF, AP1, AP2, ARE, XRE, CREB, MED-1, and half-palindromic ERE motifs. The promoter of this methyltransferase gene lacks an identifiable TATA box but is characterized by a CpG island which begins approximately 723 nucleotides upstream of the major transcriptional start site and extends through exon 1 and into the first intron. These features are characteristic of housekeeping genes and are consistent with the wide tissue distribution observed for this methyltransferase activity.
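Whether a promoter region "is characterized by a CpG island" is usually decided by simple sequence statistics. One widely used operational definition (the classic Gardiner-Garden and Frommer criteria) requires a stretch of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG × length) / (number of C × number of G). The abstract does not say which criteria were applied, so the thresholds below are parameters, and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")                      # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC-content and CpG observed/expected thresholds to one window."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and oe >= min_obs_exp
```

Scanning a promoter with a sliding window and merging the windows that pass is the usual way to turn this per-window test into island boundaries.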

  11. Sequence-To-Conformation Relationships of Disordered Regions Tethered to Folded Domains of Proteins.

    PubMed

    Mittal, Anuradha; Holehouse, Alex S; Cohan, Megan C; Pappu, Rohit V

    2018-05-12

    Intrinsically disordered proteins and regions (IDPs / IDRs) are characterized by well-defined sequence-to-conformation relationships (SCRs). These relationships refer to the sequence-specific preferences for average sizes, shapes, residue-specific secondary structure propensities, and amplitudes of multiscale conformational fluctuations. SCRs are discerned from the sequence-specific conformational ensembles of IDPs. A vast majority of IDPs are actually tethered to folded domains (FDs). This raises the question of whether or not SCRs inferred for IDPs are applicable to IDRs tethered to folded domains. Here, we use atomistic simulations based on a well-established forcefield paradigm and an enhanced sampling method to obtain comparative assessments of SCRs for thirteen archetypal IDRs modeled as autonomous units, as C-terminal tails connected to folded domains, and as linkers between pairs of folded domains. Our studies uncover a set of general observations regarding context-independent versus context-dependent SCRs of IDRs. SCRs are minimally perturbed upon tethering to folded domains if the IDRs are deficient in charged residues and for polyampholytic IDRs where the oppositely charged residues within the sequence of the IDR are separated into distinct blocks. In contrast, the interplay between IDRs and tethered folded domains has a significant modulatory effect on SCRs if the IDRs have intermediate fractions of charged residues or if they have sequence-intrinsic conformational preferences for canonical random coils. Our findings suggest that IDRs with context-independent SCRs might be independent evolutionary modules whereas IDRs with context-dependent intrinsic SCRs might co-evolve with the FDs to which they are tethered. Copyright © 2018. Published by Elsevier Ltd.
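The sequence classes contrasted here (charge-deficient IDRs, polyampholytes, intermediate fractions of charged residues) are commonly described by two composition parameters: the fraction of charged residues, FCR = (n₊ + n₋)/N, and the net charge per residue, NCPR = (n₊ − n₋)/N. The minimal sketch below assumes K and R are positive, D and E negative, and everything else (including His) neutral; the function name is hypothetical.

```python
from typing import Tuple

def charge_fractions(seq: str) -> Tuple[float, float]:
    """Return (FCR, NCPR) assuming K/R = +1, D/E = -1 and all other residues neutral."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    pos = sum(s.count(a) for a in "KR")
    neg = sum(s.count(a) for a in "DE")
    return (pos + neg) / len(s), (pos - neg) / len(s)
```

Composition alone cannot separate a blocky polyampholyte such as `KKKKEEEE` from a well-mixed one such as `KEKEKEKE` (both give FCR = 1.0 and NCPR = 0.0); capturing that difference requires a charge-patterning parameter in addition to FCR and NCPR.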

  12. Replacement of all arginine residues with canavanine in MazF-bs mRNA interferase changes its specificity.

    PubMed

    Ishida, Yojiro; Park, Jung-Ho; Mao, Lili; Yamaguchi, Yoshihiro; Inouye, Masayori

    2013-03-15

    Replacement of a specific amino acid residue in a protein with nonnatural analogues is highly challenging because of their cellular toxicity. We demonstrate for the first time the replacement of all arginine (Arg) residues in a protein with canavanine (Can), a toxic Arg analogue. All Arg residues in the 5-base specific (UACAU) mRNA interferase from Bacillus subtilis (MazF-bs(arg)) were replaced with Can by using the single-protein production system in Escherichia coli. The resulting MazF-bs(can) gained a 6-base recognition sequence, UACAUA, for RNA cleavage instead of the 5-base sequence, UACAU, for MazF-bs(arg). Mass spectrometry analysis confirmed that all Arg residues were replaced with Can. The present system offers a novel approach to create new functional proteins by replacing a specific amino acid in a protein with its analogues.
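The specificity change can be pictured as a narrowing of the set of candidate sites: every UACAUA site also contains UACAU, but not the reverse, so the canavanine variant recognizes a subset of the sites available to the wild-type enzyme. The sketch below only locates recognition sequences (0-based starts, overlaps allowed) and does not model where within a site the RNA is cleaved; the function name and example RNA are hypothetical.

```python
from typing import List

def find_sites(rna: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of motif.
    DNA input is converted to RNA by replacing T with U."""
    rna = rna.upper().replace("T", "U")
    k = len(motif)
    return [i for i in range(len(rna) - k + 1) if rna[i:i + k] == motif]
```

For the made-up RNA `"GGUACAUAGCUACAUGG"`, the 5-base motif occurs at positions 2 and 10, whereas the 6-base motif UACAUA occurs only at position 2.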

  13. A new cofactor in prokaryotic enzyme: Tryptophan tryptophylquinone as the redox prosthetic group in methylamine dehydrogenase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McIntire, W.S.; Wemmer, D.E.; Chistoserdov, A.

    Methylamine dehydrogenase (MADH), an α₂β₂ enzyme from numerous methylotrophic soil bacteria, contains a novel quinonoid redox prosthetic group that is covalently bound to its small β subunit through two amino acyl residues. A comparison of the amino acid sequence deduced from the gene sequence of the small subunit for the enzyme from Methylobacterium extorquens AM1 with the published amino acid sequence obtained by the Edman degradation method allowed the identification of the amino acyl constituents of the cofactor as two tryptophyl residues. This information was crucial for interpreting ¹H and ¹³C nuclear magnetic resonance and mass spectral data collected for the semicarbazide- and carboxymethyl-derivatized bis(tripeptidyl)-cofactor of MADH from bacterium W3A1. The cofactor is composed of two cross-linked tryptophyl residues. Although there are many possible isomers, only one is consistent with all the data: the first tryptophyl residue in the peptide sequence exists as an indole-6,7-dione and is attached at its 4 position to the 2 position of the second, otherwise unmodified, indole side group. Contrary to earlier reports, the cofactor of MADH is not 2,7,9-tricarboxypyrroloquinoline quinone (PQQ), a derivative thereof, or pro-PQQ. This appears to be the only example of two cross-linked, modified amino acyl residues having a functional role in the active site of an enzyme, in the absence of other cofactors or metal ions.

  14. Functional Analysis of the Accessory Protein TapA in Bacillus subtilis Amyloid Fiber Assembly

    PubMed Central

    Romero, Diego; Vlamakis, Hera; Losick, Richard

    2014-01-01

    Bacillus subtilis biofilm formation relies on the assembly of a fibrous scaffold formed by the protein TasA. TasA polymerizes into highly stable fibers with biochemical and morphological features of functional amyloids. Previously, we showed that assembly of TasA fibers requires the auxiliary protein TapA. In this study, we investigated the roles of TapA sequences from the C-terminal and N-terminal ends and TapA cysteine residues in its ability to promote the assembly of TasA amyloid-like fibers. We found that the cysteine residues are not essential for the formation of TasA fibers, as their replacement by alanine residues resulted in only minor defects in biofilm formation. Mutating sequences in the C-terminal half had no effect on biofilm formation. However, we identified a sequence of 8 amino acids in the N terminus that is key for TasA fiber formation. Strains expressing TapA lacking these 8 residues were completely defective in biofilm formation. In addition, this TapA mutant protein exhibited a dominant negative effect on TasA fiber formation. Even in the presence of wild-type TapA, the mutant protein inhibited fiber assembly in vitro and delayed biofilm formation in vivo. We propose that this 8-residue sequence is crucial for the formation of amyloid-like fibers on the cell surface, perhaps by mediating interactions between TapA molecules or between TapA and TasA molecules. PMID:24488317

  15. Functional analysis of the accessory protein TapA in Bacillus subtilis amyloid fiber assembly.

    PubMed

    Romero, Diego; Vlamakis, Hera; Losick, Richard; Kolter, Roberto

    2014-04-01

    Bacillus subtilis biofilm formation relies on the assembly of a fibrous scaffold formed by the protein TasA. TasA polymerizes into highly stable fibers with biochemical and morphological features of functional amyloids. Previously, we showed that assembly of TasA fibers requires the auxiliary protein TapA. In this study, we investigated the roles of TapA sequences from the C-terminal and N-terminal ends and TapA cysteine residues in its ability to promote the assembly of TasA amyloid-like fibers. We found that the cysteine residues are not essential for the formation of TasA fibers, as their replacement by alanine residues resulted in only minor defects in biofilm formation. Mutating sequences in the C-terminal half had no effect on biofilm formation. However, we identified a sequence of 8 amino acids in the N terminus that is key for TasA fiber formation. Strains expressing TapA lacking these 8 residues were completely defective in biofilm formation. In addition, this TapA mutant protein exhibited a dominant negative effect on TasA fiber formation. Even in the presence of wild-type TapA, the mutant protein inhibited fiber assembly in vitro and delayed biofilm formation in vivo. We propose that this 8-residue sequence is crucial for the formation of amyloid-like fibers on the cell surface, perhaps by mediating interactions between TapA molecules or between TapA and TasA molecules.

  16. Extreme Evolutionary Conservation of Functionally Important Regions in H1N1 Influenza Proteome

    PubMed Central

    Warren, Samantha; Wan, Xiu-Feng; Conant, Gavin; Korkin, Dmitry

    2013-01-01

    The H1N1 subtype of influenza A virus has caused two of the four documented pandemics and is responsible for seasonal epidemic outbreaks, presenting a continuous threat to public health. The co-circulation of antigenically divergent influenza strains significantly complicates vaccine development and use. Here, by combining evolutionary, structural, functional, and population information about the H1N1 proteome, we seek to answer two questions: (1) do residues on the protein surfaces evolve faster than the protein core residues consistently across all proteins that constitute the influenza proteome? and (2) in spite of the rapid evolution of surface residues in influenza proteins, are there any protein regions on the protein surface that do not evolve? To answer these questions, we first built phylogenetically aware models of the patterns of surface and interior substitutions. Employing these models, we found a single coherent pattern of faster evolution on the protein surfaces that characterizes all influenza proteins. The pattern is consistent with the events of inter-species reassortment, the worldwide introduction of the flu vaccine in the early 1980s, as well as the differences caused by the geographic origins of the virus. Next, we developed an automated computational pipeline to comprehensively detect regions of the protein surface residues that were 100% conserved over multiple years and in multiple host species. We identified conserved regions on the surface of 10 influenza proteins spread across all avian, swine, and human strains, with the exception of a small group of isolated strains that affected the conservation of three proteins. Surprisingly, these regions were also unaffected by genetic variation in the pandemic 2009 H1N1 viral population data obtained from deep sequencing experiments. Finally, the conserved regions were intrinsically related to the intra-viral macromolecular interaction interfaces. 
Our study may provide further insights towards the identification of novel protein targets for influenza antivirals. PMID:24282564
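The first step of detecting "100% conserved" surface positions can be sketched as a column scan over a multiple sequence alignment, optionally restricted to columns already known to map to surface residues. This is a minimal illustration under stated assumptions: the alignment is given as equal-length strings, `-` marks gaps, and the `surface` set (hypothetical here) would come from a structure-based accessibility calculation. The study's pipeline, which also groups conserved residues into spatial regions, is not reproduced.

```python
from typing import List, Optional, Set

def fully_conserved_columns(msa: List[str], surface: Optional[Set[int]] = None) -> List[int]:
    """0-based alignment columns in which every sequence carries the same non-gap residue.
    If `surface` is given, only columns in that set are reported."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for k in range(length):
        residues = {s[k] for s in msa}
        if len(residues) == 1 and "-" not in residues:
            if surface is None or k in surface:
                conserved.append(k)
    return conserved
```

For the toy alignment `["MKTAY", "MKSAY", "MRTAY"]`, columns 0, 3 and 4 are fully conserved; restricting to a hypothetical surface set `{3, 4}` leaves columns 3 and 4.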

  17. Generating intrinsically disordered protein conformational ensembles from a Markov chain

    NASA Astrophysics Data System (ADS)

    Cukier, Robert I.

    2018-03-01

    Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI, which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2^N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without the very large entropy that might lead to an excessively high entropy penalty.
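The proportionality between MIMC and the pair mutual information can be checked numerically by brute force. The sketch assumes a homogeneous two-state ("dichotomic") chain started from its stationary distribution, with each residue's rotamer state coded 0/1. Under these assumptions, the chain entropy is H(X) + (N−1)·H(X₂|X₁), so MIMC = N·H(X) − H(chain) = (N−1)·MI. All 2^N states are enumerated exactly as described above. The transition matrices and function names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Tuple

def entropy(probs) -> float:
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def stationary(T: List[List[float]]) -> List[float]:
    """Stationary distribution of a two-state chain with transition matrix T."""
    a, b = T[0][1], T[1][0]            # switching probabilities 0->1 and 1->0
    return [b / (a + b), a / (a + b)]

def chain_distribution(T: List[List[float]], n: int) -> Dict[Tuple[int, ...], float]:
    """Probability of each of the 2**n rotamer-state sequences (stationary start)."""
    pi = stationary(T)
    dist = {}
    for states in itertools.product((0, 1), repeat=n):
        p = pi[states[0]]
        for x, y in zip(states, states[1:]):
            p *= T[x][y]
        dist[states] = p
    return dist

def pair_mutual_information(T: List[List[float]]) -> float:
    """MI between neighbouring residues: H(X) - H(X_{k+1} | X_k)."""
    pi = stationary(T)
    h_cond = sum(pi[x] * entropy(T[x]) for x in (0, 1))
    return entropy(pi) - h_cond

def multi_information(T: List[List[float]], n: int) -> float:
    """MIMC: sum of single-residue entropies minus the joint (chain) entropy."""
    dist = chain_distribution(T, n)
    return n * entropy(stationary(T)) - entropy(dist.values())
```

With identical rows in `T`, the residues are independent, and both MI and MIMC vanish (up to rounding). Strongly "sticky" rows raise MIMC without necessarily lowering the per-residue entropy, which is the large-entropy, large-MIMC regime associated with MoRFs.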

  18. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences.

    PubMed

    Yu, Jia-Feng; Dou, Xiang-Hua; Wang, Hong-Bo; Sun, Xiao; Zhao, Hui-Ying; Wang, Ji-Hua

    2015-06-22

    The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .
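The geometry of the representation can be sketched directly: each residue type gets a fixed angle on a circle, and the sequence position becomes the height z, so every residue maps to a point on a cylinder. The ordering of the 20 residue types around the circle is an assumption here (alphabetical one-letter codes), and the paper's own ordering and its numerical descriptors are not reproduced. The function name is hypothetical.

```python
import math
from typing import List, Tuple

# Assumed circular ordering of the 20 residue types (alphabetical one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def cylindrical_coordinates(seq: str, radius: float = 1.0) -> List[Tuple[float, float, float]]:
    """Map each residue to (x, y, z): the angle encodes residue type and z the 1-based
    sequence position. Letters outside AMINO_ACIDS raise ValueError (from str.index)."""
    points = []
    for pos, aa in enumerate(seq.upper(), start=1):
        theta = 2 * math.pi * AMINO_ACIDS.index(aa) / len(AMINO_ACIDS)
        points.append((radius * math.cos(theta), radius * math.sin(theta), float(pos)))
    return points
```

Projecting the points onto the xy-plane reflects composition (how many residues sit at each angle), while the z coordinate preserves order. This is why both properties can be read from a single plot.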

  19. A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment

    PubMed Central

    Freschi, Valerio; Bogliolo, Alessandro

    2012-01-01

    In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at a time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment. PMID:22518086
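The core idea, iteratively collapsing tandem repeats of increasing unit length, can be sketched with a regular expression that replaces any run of back-to-back copies of a k-letter unit with a single copy for k = 1, 2, …, max_unit. This is a minimal illustration of the general idea, not the paper's exact compression scheme. The input is assumed to be a single-line string (`.` does not match newlines), and the function name is hypothetical.

```python
import re

def collapse_tandem_repeats(seq: str, max_unit: int) -> str:
    """Lossy sketch: for unit lengths 1..max_unit, replace every run of consecutive
    copies of a unit with one copy (e.g. 'AAAA' -> 'A', 'CGCGCG' -> 'CG')."""
    for k in range(1, max_unit + 1):
        seq = re.sub(r"(.{%d})\1+" % k, r"\1", seq)
    return seq
```

For example, `collapse_tandem_repeats("AAAACGCGCGT", 2)` first reduces the A-run to `"ACGCGCGT"` and then collapses the CG repeat to give `"ACGT"`. The compressed strings can then be passed to any standard alignment routine.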

  20. The point mutation process in proteins

    NASA Technical Reports Server (NTRS)

    Schwartz, R. M.; Dayhoff, M. O.

    1978-01-01

    An optimized scoring matrix for residue-by-residue comparisons of distantly related protein sequences has been developed. The scoring matrix is based on observed exchanges and mutabilities of amino acids in 1572 closely related sequences derived from a cross-section of protein groups. Very few superimposed or parallel mutations are included in the data. The scoring matrix is most useful for demonstrating the relatedness of proteins between 65 and 85% different.
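    Scoring matrices of this kind are log-odds tables. The sketch below derives toy scores from observed aligned residue pairs as scale · log10(q_ab / (p_a · p_b)), where q_ab is the observed frequency of the pair and p_a is the background frequency of residue a. This is a simplified, hypothetical illustration of the log-odds idea only: it does not reproduce Dayhoff's mutability normalization or the extrapolation to larger evolutionary distances, and the function name and example pairs are invented.

```python
import math
from collections import defaultdict


def log_odds_matrix(aligned_pairs, scale=10.0):
    """Toy log-odds scores from aligned residue pairs (simplified sketch)."""
    counts = defaultdict(float)
    for a, b in aligned_pairs:
        # Count each pair in both orientations so the matrix is symmetric
        counts[(a, b)] += 0.5
        counts[(b, a)] += 0.5
    total = sum(counts.values())

    # Background frequency of each residue, taken from the pair counts
    background = defaultdict(float)
    for (a, _), c in counts.items():
        background[a] += c / total

    scores = {}
    for (a, b), c in counts.items():
        observed = c / total
        expected = background[a] * background[b]
        scores[(a, b)] = round(scale * math.log10(observed / expected))
    return scores
```

    With eight A/A pairs, eight G/G pairs and two A/G pairs, identities come out positive (+2) and the rarer exchange negative (-7), which is the behavior a relatedness matrix relies on.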

  1. The legumin gene family: structure of a B type gene of Vicia faba and a possible legumin gene specific regulatory element.

    PubMed Central

    Bäumlein, H; Wobus, U; Pustell, J; Kafatos, F C

    1986-01-01

    The field bean, Vicia faba L. var. minor, possesses two sub-families of 11 S legumin genes named A and B. We isolated from a genomic library a B-type gene (LeB4) and determined its primary DNA sequence. Gene LeB4 codes for a 484 amino acid residue prepropolypeptide, encompassing a signal peptide of 22 amino acid residues, an acidic, very hydrophilic alpha-chain of 281 residues and a basic, somewhat hydrophobic beta-chain of 181 residues. The latter two coding regions are immediately contiguous, but each is interrupted by a short intron. Type A legumin genes from soybean and pea are known to have introns in the same two positions, in addition to an extra intron (within the alpha-coding sequence). Sequence comparisons of legumin genes from these three plants revealed a highly conserved sequence element of at least 28 bp, centered at approximately 100 bp upstream of each cap site. The element is absent from the equivalent position of all non-legumin and other plant and fungal genes examined. We tentatively name this element "legumin box" and suggest that it may have a function in the regulation of legumin gene expression. PMID:3960730

  2. Structural insight with mutational impact on tyrosinase and PKC-β interaction from Homo sapiens: Molecular modeling and docking studies for melanogenesis, albinism and increased risk for melanoma.

    PubMed

    Banerjee, Arundhati; Ray, Sujay

    2016-10-30

    Human tyrosinase is an important protein in the biosynthetic pathway of melanin. Earlier experiments with in vivo evidence showed that it is phosphorylated and activated by protein kinase C, β-subunit (PKC-β). Previous reports document that mutation of two vital serine residues at the C-terminal end of tyrosinase leads to albinism. Lacking the protective shield provided by melanin, albinos are at increased risk of melanoma and other skin cancers. A computational, residue-level investigation of this mechanism, including a mutational analysis with evolutionary context, is therefore needed for future pathological and therapeutic developments. To this end, functional tertiary models of the relevant proteins were analyzed after their stereochemical features had been validated. Evolutionarily important residues for the activation of tyrosinase were identified by multiple sequence alignment. Mutant tyrosinase (S98A and S102A) was then modeled, maintaining the functionality of the wild-type proteins. This comparative study further reveals differences in the residues that stably participate in the mutant and wild-type tyrosinase-PKC-β complexes. In particular, more polar, negatively charged residues of wild-type tyrosinase interacted with PKC-β. Supported by statistical significance testing, the mutations also destabilized tyrosinase, accompanied by conformational switches including a helix-to-coil transition in the mutant protein. The allosteric sites of the protein were also badly hampered upon mutation, weakening the tendency of binding partners to interact. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Oxidative generation of guanine radicals by carbonate radicals and their reactions with nitrogen dioxide to form site specific 5-guanidino-4-nitroimidazole lesions in oligodeoxynucleotides.

    PubMed

    Joffe, Avrum; Mock, Steven; Yun, Byeong Hwa; Kolbanovskiy, Alexander; Geacintov, Nicholas E; Shafirovich, Vladimir

    2003-08-01

    A simple photochemical approach is described for synthesizing site-specific, stable 5-guanidino-4-nitroimidazole (NIm) adducts in single- and double-stranded oligodeoxynucleotides containing single and multiple guanine residues. The DNA sequences employed, 5'-d(ACC CG(1)C G(2)TC CG(3)C G(4)CC) and 5'-d(ACC CG(1)C G(2)TC C), were part of exon 5 of the p53 tumor suppressor gene, including the codon 157 (G(2)) and 158 (G(3)) mutation hot spots in the former sequence with four Gs and the codon 157 (G(2)) mutation hot spot in the latter sequence with two Gs. The nitration of oligodeoxynucleotides was initiated by the selective photodissociation of persulfate anions to sulfate radicals induced by UV laser pulses (308 nm). In aqueous solutions of bicarbonate and nitrite anions, the sulfate radicals generate carbonate anion radicals and nitrogen dioxide radicals by one-electron oxidation of the respective anions. The guanine residue in the oligodeoxynucleotide is oxidized by the carbonate anion radical to form the neutral guanine radical. While the nitrogen dioxide radicals do not react with any of the intact DNA bases, they readily combine with the guanine radicals at either the C8 or the C5 positions. The C8 addition generates the well-known 8-nitroguanine (8-nitro-G) lesions, whereas the C5 attack produces unstable adducts, which rapidly decompose to NIm lesions. The maximum yields of the nitro products (NIm + 8-nitro-G) were typically in the range of 20-40%, depending on the number of guanine residues in the sequence. The ratio of the NIm to 8-nitro-G lesions gradually decreases from 3.4 in the model compound, 2',3',5'-tri-O-acetylguanosine, to 2.1-2.6 in the single-stranded oligodeoxynucleotides and to 0.8-1.1 in the duplexes. The adduct of the 5'-d(ACC CG(1)C G(2)TC C) oligodeoxynucleotide containing the NIm lesion in codon 157 (G(2)) was isolated in HPLC-pure form. The integrity of this adduct was established by a detailed analysis of exonuclease digestion ladders by matrix-assisted laser desorption ionization with time-of-flight detection MS techniques.

  4. Four distinct types of E.C. 1.2.1.30 enzymes can catalyze the reduction of carboxylic acids to aldehydes.

    PubMed

    Stolterfoht, Holly; Schwendenwein, Daniel; Sensen, Christoph W; Rudroff, Florian; Winkler, Margit

    2017-09-10

    Increasing demand for chemicals from renewable resources calls for the development of new biotechnological methods for the reduction of oxidized bio-based compounds. Enzymatic carboxylate reduction is highly selective, both in terms of chemo- and product selectivity, but not many carboxylate reductase enzymes (CARs) have been identified on the sequence level to date. Thus far, their phylogeny is unexplored and very little is known about their structure-function relationship. CARs minimally contain an adenylation domain, a phosphopantetheinylation domain and a reductase domain. We have recently identified new enzymes of fungal origin, using similarity searches against genomic sequences from organisms in which aldehydes were detected upon incubation with carboxylic acids. Analysis of sequences with known CAR functionality and CAR enzymes recently identified in our laboratory suggests that the three-domain architecture mentioned above is modular. The construction of a distance tree with a subsequent 1000-replicate bootstrap analysis showed that the CAR sequences included in our study fall into four distinct subgroups (one of bacterial origin and three of fungal origin), each with a bootstrap value of 100%. The multiple sequence alignment of all experimentally confirmed CAR protein sequences revealed fingerprint sequences of residues which are likely to be involved in substrate and co-substrate binding and in one of the three catalytic substeps. The fingerprint sequences broaden our understanding of the amino acids that might be essential for the reduction of organic acids to the corresponding aldehydes in CAR proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
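    A bootstrap analysis of a distance tree conventionally resamples alignment columns with replacement and rebuilds the tree from each pseudo-alignment. The sketch below shows only that column-resampling step for an alignment stored as a name-to-sequence mapping; tree building and the counting of how often each subgroup recurs across replicates are out of scope, and the function name, the seed, and the toy alignment in the usage note are hypothetical.

```python
import random


def bootstrap_alignments(alignment, replicates=1000, seed=0):
    """Yield pseudo-alignments built by sampling alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (all the same length).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    rng = random.Random(seed)
    for _ in range(replicates):
        # The same column indices are used for every sequence, keeping columns intact
        columns = [rng.randrange(n_columns) for _ in range(n_columns)]
        yield {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

    For example, `list(bootstrap_alignments({"a": "ACGT", "b": "AC-T"}, replicates=5))` gives five pseudo-alignments of the original width; a bootstrap value of 100% for a subgroup means it appeared in the tree built from every replicate.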

  5. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    PubMed

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. A novel HLA-B allele, B*5214, detected in a Taiwanese volunteer bone marrow donor using a sequence-based typing method.

    PubMed

    Chen, M J; Chu, C C; Shyr, M H; Lin, C L; Lin, P Y; Yang, K L

    2010-02-01

    HLA-B*5214, a novel rare HLA-B*52 variant allele, was found in a Taiwanese volunteer bone marrow donor by a sequence-based typing method. The sequence of B*5214 is identical to that of B*520101 in exon 2 but differs from B*520101 in exon 3 at nucleotide positions 419 A-->T and 435 A-->G. Alteration of these two nucleotides resulted in an amino acid substitution at amino acid residue 116 Y-->F (TAC-->TTC) and a silent exchange at residue 121 K-->K (AAA-->AAG).
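    The two reported codon changes can be checked against the standard genetic code: TAC to TTC turns Tyr (Y) into Phe (F), while AAA to AAG still encodes Lys (K). The sketch below builds the standard codon table (NCBI translation table 1) from its compact 64-letter form and classifies a single-codon change; the helper names are hypothetical, and the sketch does not address the HLA nucleotide numbering.

```python
BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop)
TABLE1_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE1_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(old_codon, new_codon):
    """Return (old amino acid, new amino acid, kind of change) for a codon change."""
    old_aa = CODON_TABLE[old_codon.upper()]
    new_aa = CODON_TABLE[new_codon.upper()]
    if old_aa == new_aa:
        kind = "silent"
    elif new_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return old_aa, new_aa, kind
```

    `classify_codon_change("TAC", "TTC")` gives `("Y", "F", "missense")` and `classify_codon_change("AAA", "AAG")` gives `("K", "K", "silent")`, matching the two changes described for B*5214.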

  7. Characterization, production, and purification of leucocin H, a two-peptide bacteriocin from Leuconostoc MF215B.

    PubMed

    Blom, H; Katla, T; Holck, A; Sletten, K; Axelsson, L; Holo, H

    1999-07-01

    Leuconostoc MF215B was found to produce a two-peptide bacteriocin referred to as leucocin H. The two peptides were termed leucocin Halpha and leucocin Hbeta. When acting together, they inhibit, among others, Listeria monocytogenes, Bacillus cereus, and Clostridium perfringens. Production of leucocin H in growth medium takes place at temperatures down to 6 degrees C and at pH below 7. The highest activity of leucocin H in growth medium was demonstrated in the late exponential growth phase. The bacteriocin was purified by precipitation with ammonium sulfate, ion-exchange (SP Sepharose) and reverse phase chromatography. Upon purification, specific activity increased 10^5-fold, and the final specific activity was 2 x 10^7 BU/OD280. Amino acid composition analyses of leucocin Halpha and leucocin Hbeta indicated that both peptides consisted of around 40 amino acid residues. Their N-termini were blocked for Edman degradation, and the methionine residues of leucocin Hbeta did not respond to cyanogen bromide (CNBr) cleavage. Absorbance at 280 nm indicated the presence of tryptophan residues, and cleavage at tryptophan enabled partial sequencing by Edman degradation. From leucocin Halpha, the sequence of 20 amino acids was obtained; from leucocin Hbeta the sequence of 28 amino acid residues was obtained. No sequence homology to other known bacteriocins could be demonstrated. It also appeared that the two peptides themselves shared little or no sequence homology. The presence of soy oil did not affect the activity of leucocin H in agar.

  8. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
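    Drawing a uniform sample from tens of millions of sequences does not require holding the whole database in memory. The sketch below uses reservoir sampling (Algorithm R), a standard streaming technique, to pick k records uniformly at random from an iterable of unknown length. The abstract does not say how FastRNABindR's authors drew their 1% sample, so this is an assumed approach, and the function name and inputs are hypothetical.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return k items chosen uniformly at random from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (index + 1)
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = record
    return reservoir
```

    If the total number of sequences N is known in advance (as for a fixed UniRef100 release), setting k = N // 100 yields a 1% sample in a single pass over the FASTA records; if fewer than k records exist, all of them are returned.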

  9. Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

    PubMed Central

    Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

    1997-01-01

    The amino acid sequences of ananain (EC 3.4.22.31) and stem bromelain (EC 3.4.22.32), two cysteine proteases from pineapple stem, are similar, yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya and actinidin from kiwifruit. Ananain comprises 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753

  10. A Multiple-Sequence Variant of the Multiple-Baseline Design: A Strategy for Analysis of Sequence Effects and Treatment Comparison.

    ERIC Educational Resources Information Center

    Noell, George H.; Gresham, Frank M.

    2001-01-01

    Describes design logic and potential uses of a variant of the multiple-baseline design. The multiple-baseline multiple-sequence (MBL-MS) design consists of multiple-baseline designs that are interlaced with one another and include all possible sequences of treatments. The MBL-MS design appears to be primarily useful for comparison of treatments taking…

  11. Automatic Prediction of Protein 3D Structures by Probabilistic Multi-template Homology Modeling.

    PubMed

    Meier, Armin; Söding, Johannes

    2015-10-01

    Homology modeling predicts the 3D structure of a query protein based on the sequence alignment with one or more template proteins of known structure. Its great importance for biological research is owed to its speed, simplicity, reliability and wide applicability, covering more than half of the residues in protein sequence space. Although multiple templates have been shown to generally increase model quality over single templates, the information from multiple templates has so far been combined using empirically motivated, heuristic approaches. We present here a rigorous statistical framework for multi-template homology modeling. First, we find that the query proteins' atomic distance restraints can be accurately described by two-component Gaussian mixtures. This insight allowed us to apply the standard laws of probability theory to combine restraints from multiple templates. Second, we derive theoretically optimal weights to correct for the redundancy among related templates. Third, a heuristic template selection strategy is proposed. We improve the average GDT-ha model quality score by 11% over single template modeling and by 6.5% over a conventional multi-template approach on a set of 1000 query proteins. Robustness with respect to wrong constraints is likewise improved. We have integrated our multi-template modeling approach with the popular MODELLER homology modeling software in our free HHpred server http://toolkit.tuebingen.mpg.de/hhpred and also offer open source software for running MODELLER with the new restraints at https://bitbucket.org/soedinglab/hh-suite.

  12. SODa: an Mn/Fe superoxide dismutase prediction and design server.

    PubMed

    Kwasigroch, Jean Marc; Wintjens, René; Gilis, Dimitri; Rooman, Marianne

    2008-06-02

    Superoxide dismutases (SODs) are ubiquitous metalloenzymes that play an important role in the defense of aerobic organisms against oxidative stress, by converting reactive oxygen species into nontoxic molecules. We focus here on the SOD family that uses Fe or Mn as cofactor. The SODa webtool http://babylone.ulb.ac.be/soda predicts whether a target sequence corresponds to an Fe/Mn SOD. If so, it predicts the metal ion specificity (Fe, Mn or cambialistic) and the oligomerization mode (dimer or tetramer) of the target. In addition, SODa proposes a list of residue substitutions likely to improve the predicted preferences for the metal cofactor and oligomerization mode. The method is based on residue fingerprints, consisting of residues conserved in SOD sequences or typical of SOD subgroups, and of interaction fingerprints, containing residue pairs that are in contact in SOD structures. SODa is shown to outperform, and to be more discriminative than, traditional techniques based on pairwise sequence alignments. Moreover, the fact that it proposes selected mutations makes it a valuable tool for rational protein design.

  13. Photoaffinity Labeling of Ras Converting Enzyme using Peptide Substrates that Incorporate Benzoylphenylalanine (Bpa) Residues: Improved Labeling and Structural Implications

    PubMed Central

    Kyro, Kelly; Manandhar, Surya P.; Mullen, Daniel; Schmidt, Walter K.; Distefano, Mark D.

    2012-01-01

    Rce1p catalyzes the proteolytic trimming of C-terminal tripeptides from isoprenylated proteins containing CAAX-box sequences. Because Rce1p processing is a necessary component in the Ras pathway of oncogenic signal transduction, Rce1p holds promise as a potential target for therapeutic intervention. However, its mechanism of proteolysis and active site have yet to be defined. Here, we describe synthetic peptide analogues that mimic the natural lipidated Rce1p substrate and incorporate photolabile groups for photoaffinity-labeling applications. These photoactive peptides are designed to crosslink to residues in or near the Rce1p active site. By incorporating the photoactive group via p-benzoyl-L-phenylalanine (Bpa) residues directly into the peptide substrate sequence, the labeling efficiency was substantially increased relative to a previously synthesized compound. Incorporation of biotin on the N-terminus of the peptides permitted photolabeled Rce1p to be isolated via streptavidin affinity capture. Our findings further suggest that residues outside the CAAX-box sequence are in contact with Rce1p, which has implications for future inhibitor design. PMID:22079863

  14. Biosynthesis and processing of the somatostatin family of peptide hormones.

    PubMed

    Andrews, P C; Dixon, J E

    1986-01-01

    Understanding of the biosynthesis of the somatostatin family of peptide hormones has greatly increased in recent years. Isolation and sequencing of the rat somatostatin gene indicates that it contains a single intron located between the codons for Gln(-57) and Glu(-56) of pre-prosomatostatin. The gene contains three repetitive sequences, one at the 5' end of the gene and two 3' to the coding portion. Two of the sequences consist of alternating purine-pyrimidine bases and have been shown to adopt Z-DNA structures in vitro. The cDNA for rat somatostatin codes for a 116-residue peptide structurally similar to the anglerfish and catfish precursors to the 14-residue somatostatin (SST-14). In addition to SST-14, the catfish and the anglerfish both contain an additional pancreatic somatostatin, each derived from a different gene. The catfish contains a 22-residue somatostatin, which is O-glycosylated at Thr-5. The second somatostatin gene from anglerfish encodes a prosomatostatin that is processed to a 28-residue peptide. The mature peptide contains a hydroxylated lysine at position 23.

  15. Germline TRAV5D-4 T-Cell Receptor Sequence Targets a Primary Insulin Peptide of NOD Mice

    PubMed Central

    Nakayama, Maki; Castoe, Todd; Sosinowski, Tomasz; He, XiangLing; Johnson, Kelly; Haskins, Kathryn; Vignali, Dario A.A.; Gapin, Laurent; Pollock, David; Eisenbarth, George S.

    2012-01-01

    There is accumulating evidence that autoimmunity to insulin B chain peptide, amino acids 9–23 (insulin B:9–23), is central to development of autoimmune diabetes of the NOD mouse model. We hypothesized that enhanced susceptibility to autoimmune diabetes is the result of targeting of insulin by a T-cell receptor (TCR) sequence commonly encoded in the germline. In this study, we aimed to demonstrate that a particular Vα gene TRAV5D-4 with multiple junction sequences is sufficient to induce anti-islet autoimmunity by studying retrogenic mouse lines expressing α-chains with different Vα TRAV genes. Retrogenic NOD strains expressing Vα TRAV5D-4 α-chains with many different complementarity determining region (CDR) 3 sequences, even those derived from TCRs recognizing islet-irrelevant molecules, developed anti-insulin autoimmunity. Induction of insulin autoantibodies by TRAV5D-4 α-chains was abrogated by the mutation of insulin peptide B:9–23 or that of two amino acid residues in CDR1 and 2 of the TRAV5D-4. TRAV13–1, the human ortholog of murine TRAV5D-4, was also capable of inducing in vivo anti-insulin autoimmunity when combined with different murine CDR3 sequences. Targeting primary autoantigenic peptides by simple germline-encoded TCR motifs may underlie enhanced susceptibility to the development of autoimmune diabetes. PMID:22315318

  16. Molecular dynamics studies on the DNA-binding process of ERG.

    PubMed

    Beuerle, Matthias G; Dufton, Neil P; Randi, Anna M; Gould, Ian R

    2016-11-15

    The ETS family of transcription factors regulates gene targets by binding to a core GGAA DNA-sequence. The ETS factor ERG is required for homeostasis and lineage-specific functions in endothelial cells, a subset of haemopoietic cells and chondrocytes; its ectopic expression is linked to oncogenesis in multiple tissues. To date, details of the DNA-binding process of ERG, including DNA-sequence recognition outside the core GGAA-sequence, are largely unknown. We combined available structural and experimental data to perform molecular dynamics simulations to study the DNA-binding process of ERG. In particular, we were able to reproduce the ERG DNA-complex with a DNA-binding simulation starting in an unbound configuration with a final root-mean-square-deviation (RMSD) of 2.1 Å to the core ETS domain DNA-complex crystal structure. This allowed us to elucidate the relevance of amino acids involved in the formation of the ERG DNA-complex and to identify Arg385 as a novel key residue in the DNA-binding process. Moreover, we were able to show that water-mediated hydrogen bonds are present between ERG and DNA in our simulations and that those interactions have the potential to achieve sequence recognition outside the GGAA core DNA-sequence. The methodology employed in this study shows the promising capabilities of modern molecular dynamics simulations in the field of protein DNA-interactions.

  17. Tetramer-organizing polyproline-rich peptides differ in CHO cell-expressed and plasma-derived human butyrylcholinesterase tetramers.

    PubMed

    Schopfer, Lawrence M; Lockridge, Oksana

    2016-06-01

    Tetrameric butyrylcholinesterase (BChE) in human plasma is the product of multiple genes, namely one BCHE gene on chromosome 3q26.1 and multiple genes that encode polyproline-rich peptides. The function of the polyproline-rich peptides is to assemble BChE into tetramers. CHO cells transfected with human BChE cDNA express BChE monomers and dimers, but only low quantities of tetramers. Our goal was to identify the polyproline-rich peptides in CHO-cell derived human BChE tetramers. CHO cell-produced human BChE tetramers were purified from serum-free culture medium. Peptides embedded in the tetramerization domain were released from BChE tetramers by boiling and identified by liquid chromatography-tandem mass spectrometry. A total of 270 proline-rich peptides were sequenced, ranging in size from 6-41 residues. The peptides originated from 60 different proteins that reside in multiple cell compartments including the nucleus, cytoplasm, and endoplasmic reticulum. No single protein was the source of the polyproline-rich peptides in CHO cell-expressed human BChE tetramers. In contrast, 70% of the tetramer-organizing peptides in plasma-derived BChE tetramers originate from lamellipodin. No protein source was identified for polyproline peptides containing up to 41 consecutive proline residues. In conclusion, the use of polyproline-rich peptides as a tetramerization motif is documented only for the cholinesterases, but is expected to serve other tetrameric proteins as well. The CHO cell data suggest that the BChE tetramer-organizing peptide can arise from a variety of proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. An extension of command shaping methods for controlling residual vibration using frequency sampling

    NASA Technical Reports Server (NTRS)

    Singer, Neil C.; Seering, Warren P.

    1992-01-01

    The authors present an extension to the impulse shaping technique for commanding machines to move with reduced residual vibration. The extension, called frequency sampling, is a method for generating constraints that are used to obtain shaping sequences which minimize residual vibration in systems such as robots whose resonant frequencies change during motion. The authors present a review of impulse shaping methods, a development of the proposed extension, and a comparison of results of tests conducted on a simple model of the space shuttle robot arm. Frequency shaping provides a method for minimizing the impulse sequence duration required to give the desired insensitivity.
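    As background for the impulse shaping that this work extends, the sketch below implements the classic two-impulse zero-vibration (ZV) shaper for one lightly damped mode, together with a check of the residual vibration it leaves. It is not the frequency-sampling extension itself, which, per the abstract, generates constraints for systems whose resonant frequencies change during motion; the function names and example parameters are hypothetical.

```python
import math


def zero_vibration_shaper(natural_freq, damping):
    """Two-impulse zero-vibration (ZV) shaper for a single lightly damped mode.

    natural_freq: undamped natural frequency in rad/s; damping: ratio in [0, 1).
    Returns (time_s, amplitude) pairs whose amplitudes sum to 1.
    """
    damped_freq = natural_freq * math.sqrt(1.0 - damping**2)
    K = math.exp(-damping * math.pi / math.sqrt(1.0 - damping**2))
    # Second impulse half a damped period later, scaled to cancel the first one's ringing
    return [(0.0, 1.0 / (1.0 + K)), (math.pi / damped_freq, K / (1.0 + K))]


def residual_vibration(impulses, natural_freq, damping):
    """Residual vibration amplitude after the last impulse, relative to one unit impulse."""
    damped_freq = natural_freq * math.sqrt(1.0 - damping**2)
    t_end = impulses[-1][0]
    c = sum(a * math.exp(damping * natural_freq * t) * math.cos(damped_freq * t) for t, a in impulses)
    s = sum(a * math.exp(damping * natural_freq * t) * math.sin(damped_freq * t) for t, a in impulses)
    return math.exp(-damping * natural_freq * t_end) * math.hypot(c, s)
```

    Convolving a move command with these impulses cancels vibration at the design frequency, but if the real frequency drifts (for instance to 1.2 times the design value) a substantial residual returns. That sensitivity is why robustness to changing resonances, the problem frequency sampling targets, matters for arms whose dynamics vary with configuration.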

  19. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S

    2013-06-25

    A method of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.

  20. Selective Loss of Cysteine Residues and Disulphide Bonds in a Potato Proteinase Inhibitor II Family

    PubMed Central

    Li, Xiu-Qing; Zhang, Tieling; Donnelly, Danielle

    2011-01-01

    Disulphide bonds between cysteine residues in proteins play a key role in protein folding, stability, and function. Loss of a disulphide bond is often associated with functional differentiation of the protein. The evolution of disulphide bonds is still actively debated; analysis of naturally occurring variants can promote understanding of the protein evolutionary process. One of the disulphide bond-containing protein families is the potato proteinase inhibitor II (PI-II, or Pin2, for short) superfamily, which is found in most solanaceous plants and participates in plant development, stress response, and defence. Each PI-II domain contains eight cysteine residues (8C), and two similar PI-II domains form a functional protein that has eight disulphide bonds and two non-identical reaction centres. It is still unclear which patterns and processes affect cysteine residue loss in PI-II. Through cDNA sequencing and data mining, we found six natural variants missing cysteine residues involved in one or two disulphide bonds at the first reaction centre. We named these variants Pi7C and Pi6C for the proteins missing one or two pairs of cysteine residues, respectively. This PI-II-7C/6C family was found exclusively in potato. The missing cysteine residues were in bonding pairs but distant from one another at the nucleotide/protein sequence level. The non-synonymous/synonymous substitution (Ka/Ks) ratio analysis suggested a positive evolutionary gene selection for Pi6C and various Pi7C. The selective deletion of the first reaction centre cysteine residues that are structure-level-paired but sequence-level-distant in PI-II illustrates the flexibility of PI-II domains and suggests the functionality of their transient gene versions during evolution. PMID:21494600

  1. Axolotl hemoglobin: cDNA-derived amino acid sequences of two alpha globins and a beta globin from an adult Ambystoma mexicanum.

    PubMed

    Shishikura, Fumio; Takeuchi, Hiro-aki; Nagai, Takatoshi

    2005-11-01

    Erythrocytes of the adult axolotl, Ambystoma mexicanum, have multiple hemoglobins. We separated and purified two kinds of hemoglobin, termed major hemoglobin (Hb M) and minor hemoglobin (Hb m), from a five-year-old male by hydrophobic interaction column chromatography on Alkyl Superose. The hemoglobins have two distinct alpha type globin polypeptides (alphaM and alpham) and a common beta globin polypeptide, all of which were purified in FPLC on a reversed-phase column after S-pyridylethylation. The complete amino acid sequences of the three globin chains were determined separately using nucleotide sequencing with the assistance of protein sequencing. The mature globin molecules were composed of 141 amino acid residues for alphaM globin, 143 for alpham globin and 146 for beta globin. Comparing primary structures of the five kinds of axolotl globins, including two previously established alpha type globins from the same species, with other known globins of amphibians and representatives of other vertebrates, we constructed phylogenetic trees for amphibian hemoglobins and tetrapod hemoglobins. The molecular trees indicated that alphaM, alpham, beta and the previously known alpha major globin were adult types of globins and the other known alpha globin was a larval type. The existence of two to four more globins in the axolotl erythrocyte is predicted.

  2. Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

    PubMed

    Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu

    2016-05-01

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of replacing the first steps of the approach, based on DNA isolation and Sanger sequencing, with an HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.

  3. Ten tandem repeats of β-hCG 109-118 enhance immunogenicity and anti-tumor effects of β-hCG C-terminal peptide carried by mycobacterial heat-shock protein HSP65

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang Yankai; Yan Rong; He Yi

    2006-07-14

    The β-subunit of human chorionic gonadotropin (β-hCG) is secreted by many kinds of tumors and has been used as an ideal target antigen for developing anti-tumor vaccines. In view of the low immunogenicity of this self-peptide, we designed a method based on the isocaudamer technique to tandemly repeat the 10-residue sequence X of β-hCG (109-118); 10 tandem copies of this sequence, combined with the 37-residue β-hCG C-terminal peptide, were then fused to mycobacterial heat-shock protein 65 to construct the fusion protein HSP65-X10-βhCGCTP37 as an immunogen. In this study, we examined the effect of the tandem repeats of this 10-residue sequence in eliciting an immune response by comparing the immunogenicity and anti-tumor effects of two immunogens, HSP65-X10-βhCGCTP37 and HSP65-βhCGCTP37 (without the 10 tandem repeats). Immunization of mice with HSP65-X10-βhCGCTP37 elicited much higher levels of specific anti-β-hCG antibodies and inhibited the growth of Lewis lung carcinoma (LLC) in vivo more effectively than HSP65-βhCGCTP37, suggesting that HSP65-X10-βhCGCTP37 may be an effective protein vaccine for treating β-hCG-dependent tumors and that multiple tandem repeats of an epitope are an efficient way to overcome the low immunogenicity of self-peptide antigens.

  4. Adenosine-to-Inosine Editing of MicroRNA-487b Alters Target Gene Selection After Ischemia and Promotes Neovascularization.

    PubMed

    van der Kwast, Reginald V C T; van Ingen, Eva; Parma, Laura; Peters, Hendrika A B; Quax, Paul H A; Nossent, A Yaël

    2018-02-02

    Adenosine-to-inosine editing of microRNAs has the potential to cause a shift in target site selection. 2'-O-ribose-methylation of adenosine residues, however, has been shown to inhibit adenosine-to-inosine editing. We investigated whether the angiomiR miR487b is subject to adenosine-to-inosine editing or 2'-O-ribose-methylation during neovascularization. Complementary DNA was prepared from C57BL/6 mice subjected to hindlimb ischemia. Using Sanger sequencing and endonuclease digestion, we identified and validated adenosine-to-inosine editing of the miR487b seed sequence. In the gastrocnemius muscle, pri-miR487b editing increased from 6.7±0.4% before ischemia to 11.7±1.6% 1 day after ischemia (P = 0.02). Edited pri-miR487b is processed into a novel microRNA, edited miR487b, which is also upregulated after ischemia. We confirmed editing of miR487b in multiple human primary vascular cell types. Short interfering RNA-mediated knockdown demonstrated that editing depends on adenosine deaminase acting on RNA 1 and 2. Using reverse transcription at low dNTP concentrations followed by quantitative PCR, we found that the same adenosine residue is methylated in mice and in human primary cells. In the murine gastrocnemius, the estimated methylation fraction increased from 32.8±14% before ischemia to 53.6±12% 1 day after ischemia. Short interfering RNA knockdown confirmed that methylation is fibrillarin dependent. Although we could not confirm that methylation directly inhibits editing, we do show that adenosine deaminase acting on RNA 1 and 2 and fibrillarin negatively influence each other's expression. Using multiple luciferase reporter gene assays, we demonstrated that editing results in a complete switch of target site selection. In human primary cells, we confirmed the shift in miR487b targeting after editing, resulting in an edited miR487b targetome that is enriched for multiple proangiogenic pathways. 
Furthermore, overexpression of edited miR487b, but not wild-type miR487b, stimulates angiogenesis in both in vitro and ex vivo assays. MiR487b is edited in the seed sequence in mice and humans, resulting in a novel, proangiogenic microRNA with a unique targetome. The rate of miR487b editing, as well as 2'-O-ribose-methylation, is increased in murine muscle tissue during postischemic neovascularization. Our findings suggest miR487b editing plays an intricate role in postischemic neovascularization. © 2017 American Heart Association, Inc.

  5. Evolutionarily Conserved Linkage between Enzyme Fold, Flexibility, and Catalysis

    PubMed Central

    Ramanathan, Arvind; Agarwal, Pratul K.

    2011-01-01

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme–substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme–substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design. PMID:22087074

  6. Evolutionarily conserved linkage between enzyme fold, flexibility, and catalysis.

    PubMed

    Ramanathan, Arvind; Agarwal, Pratul K

    2011-11-01

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramanathan, Arvind; Agarwal, Pratul K

    Proteins are intrinsically flexible molecules. The role of internal motions in a protein's designated function is widely debated. The role of protein structure in enzyme catalysis is well established, and conservation of structural features provides vital clues to their role in function. Recently, it has been proposed that the protein function may involve multiple conformations: the observed deviations are not random thermodynamic fluctuations; rather, flexibility may be closely linked to protein function, including enzyme catalysis. We hypothesize that the argument of conservation of important structural features can also be extended to identification of protein flexibility in interconnection with enzyme function. Three classes of enzymes (prolyl-peptidyl isomerase, oxidoreductase, and nuclease) that catalyze diverse chemical reactions have been examined using detailed computational modeling. For each class, the identification and characterization of the internal protein motions coupled to the chemical step in enzyme mechanisms in multiple species show identical enzyme conformational fluctuations. In addition to the active-site residues, motions of protein surface loop regions (>10 Å away) are observed to be identical across species, and networks of conserved interactions/residues connect these highly flexible surface regions to the active-site residues that make direct contact with substrates. More interestingly, examination of reaction-coupled motions in non-homologous enzyme systems (with no structural or sequence similarity) that catalyze the same biochemical reaction shows motions that induce remarkably similar changes in the enzyme-substrate interactions during catalysis. The results indicate that the reaction-coupled flexibility is a conserved aspect of the enzyme molecular architecture. 
Protein motions in distal areas of homologous and non-homologous enzyme systems mediate similar changes in the active-site enzyme-substrate interactions, thereby impacting the mechanism of catalyzed chemistry. These results have implications for understanding the mechanism of allostery, and for protein engineering and drug design.

  8. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  9. Methylation of RNA polymerase II non-consensus Lysine residues marks early transcription in mammalian cells

    PubMed Central

    Dias, João D; Rito, Tiago; Torlai Triglia, Elena; Kukalev, Alexander; Ferrai, Carmelo; Chotalia, Mita; Brookes, Emily; Kimura, Hiroshi; Pombo, Ana

    2015-01-01

    Dynamic post-translational modification of RNA polymerase II (RNAPII) coordinates the co-transcriptional recruitment of enzymatic complexes that regulate chromatin states and processing of nascent RNA. Extensive phosphorylation of serine residues at the largest RNAPII subunit occurs at its structurally-disordered C-terminal domain (CTD), which is composed of multiple heptapeptide repeats with consensus sequence Y1-S2-P3-T4-S5-P6-S7. Serine-5 and Serine-7 phosphorylation mark transcription initiation, whereas Serine-2 phosphorylation coincides with productive elongation. In vertebrates, the CTD has eight non-canonical substitutions of Serine-7 into Lysine-7, which can be acetylated (K7ac). Here, we describe mono- and di-methylation of CTD Lysine-7 residues (K7me1 and K7me2). K7me1 and K7me2 are observed during the earliest transcription stages and precede or accompany Serine-5 and Serine-7 phosphorylation. In contrast, K7ac is associated with RNAPII elongation, Serine-2 phosphorylation and mRNA expression. We identify an unexpected balance between RNAPII K7 methylation and acetylation at gene promoters, which fine-tunes gene expression levels. DOI: http://dx.doi.org/10.7554/eLife.11215.001 PMID:26687004
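The heptad description above (consensus Y1-S2-P3-T4-S5-P6-S7, with some Serine-7 positions replaced by Lysine-7) can be illustrated with a small Python sketch that splits a CTD sequence into heptads and flags K7 repeats. It assumes the repeats are contiguous and in frame from the first Y; real CTDs contain irregular and degenerate repeats, so genuine annotation would need alignment to the consensus. The toy sequence is hypothetical, not a real CTD.

```python
CONSENSUS = "YSPTSPS"  # Y1-S2-P3-T4-S5-P6-S7


def scan_heptads(ctd):
    """Split a CTD sequence into consecutive, in-frame heptads.

    Returns (repeat_number, heptad, deviations) tuples, where deviations lists
    (position, consensus_residue, observed_residue) for each mismatch.
    A trailing partial heptad is ignored.
    """
    report = []
    for start in range(0, len(ctd) - 6, 7):
        heptad = ctd[start:start + 7]
        deviations = [
            (pos + 1, CONSENSUS[pos], aa)
            for pos, aa in enumerate(heptad)
            if aa != CONSENSUS[pos]
        ]
        report.append((start // 7 + 1, heptad, deviations))
    return report


def lysine7_repeats(ctd):
    """Repeat numbers whose position-7 residue is lysine (K7)."""
    return [n for n, heptad, _ in scan_heptads(ctd) if heptad[6] == "K"]


# Hypothetical toy CTD (not a real sequence): three repeats, two with K7.
toy_ctd = "YSPTSPS" + "YSPTSPK" + "YSPTSPK"
print(lysine7_repeats(toy_ctd))  # [2, 3]
```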

  10. PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

    PubMed

    Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

    2018-01-01

    The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing the randomness, and therefore the diversity, of phage display libraries. PuLSE runs on a collection of sequence reads in the FASTQ file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences derived from base-to-residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage at both the DNA base and the translated protein residue level, a capability that has been missing from existing toolsets and the literature. The open-source program PuLSE is available in two formats: a C++ source code package for compilation and integration into existing bioinformatics pipelines, and precompiled binaries for ease of use.
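As an illustration of how "expected" residue occurrences follow from codon-level assumptions, the Python sketch below enumerates all codons allowed by a degenerate randomization scheme, translates them with the standard genetic code, and compares observed residue frequencies with the expected ones. It assumes an NNK-style scheme (N = A/C/G/T, K = G/T) and equal base incorporation; the abstract does not state which schemes PuLSE supports or how it normalizes, so the function names and logic here are hypothetical, not PuLSE's implementation.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI compact layout, base order T, C, A, G); '*' = stop.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}

# IUPAC nucleotide codes commonly used to describe randomized codons (subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG"}


def expected_residue_fractions(scheme):
    """Expected residue fractions at one randomized codon, e.g. scheme='NNK'.

    Assumes every allowed base is incorporated at the same rate, which is
    exactly the assumption a library quality-control step needs to test.
    """
    pools = [IUPAC[ch] for ch in scheme.upper()]
    codons = ["".join(c) for c in product(*pools)]
    counts = Counter(CODE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


def observed_over_expected(peptides, scheme):
    """Observed/expected ratio per residue, pooled over all peptide positions."""
    expected = expected_residue_fractions(scheme)
    observed = Counter(res for pep in peptides for res in pep)
    total = sum(observed.values())
    return {aa: (observed[aa] / total) / frac for aa, frac in expected.items()}
```

Under NNK, the 32 possible codons encode all 20 amino acids plus one stop codon (TAG), so Leu, Ser and Arg are each expected at 3/32 while Trp and Met are expected at 1/32. Ratios far from 1 in real read data would hint at biased base incorporation.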

  11. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  12. Homologous kappa-neurotoxins exhibit residue-specific interactions with the alpha 3 subunit of the nicotinic acetylcholine receptor: a comparison of the structural requirements for kappa-bungarotoxin and kappa-flavitoxin binding.

    PubMed

    McLane, K E; Weaver, W R; Lei, S; Chiappinelli, V A; Conti-Tronconi, B M

    1993-07-13

    kappa-Flavitoxin (kappa-FTX), a snake neurotoxin that is a selective antagonist of certain neuronal nicotinic acetylcholine receptors (AChRs), has recently been isolated and characterized [Grant, G. A., Frazier, M. W., & Chiappinelli, V. A. (1988) Biochemistry 27, 1532-1537]. Like the related snake toxin kappa-bungarotoxin (kappa-BTX), kappa-FTX binds with high affinity to alpha 3 subtypes of neuronal AChRs, even though there are distinct sequence differences between the two toxins. To further characterize the sequence regions of the neuronal AChR alpha 3 subunit involved in formation of the binding site for this family of kappa-neurotoxins, we investigated kappa-FTX binding to overlapping synthetic peptides screening the alpha 3 subunit sequence. A sequence region forming a "prototope" for kappa-FTX was identified within residues alpha 3 (51-70), confirming the suggestions of previous studies on the binding of kappa-BTX to the alpha 3 subunit [McLane, K. E., Tang, F., & Conti-Tronconi, B. M. (1990) J. Biol. Chem. 265, 1537-1544] and of alpha-bungarotoxin to the Torpedo AChR alpha subunit [Conti-Tronconi, B. M., Tang, F., Diethelm, B. M., Spencer, S. R., Reinhardt-Maelicke, S., & Maelicke, A. (1990) Biochemistry 29, 6221-6230] that this sequence region is involved in formation of a cholinergic site. Single-residue-substituted analogues, in which each residue of the sequence alpha 3 (51-70) was sequentially replaced by glycine, were used to identify the amino acid side chains involved in the interaction of this prototope with kappa-FTX. (ABSTRACT TRUNCATED AT 250 WORDS)

  13. Preparation and properties of pure, full-length IclR protein of Escherichia coli. Use of time-of-flight mass spectrometry to investigate the problems encountered.

    PubMed Central

    Donald, L. J.; Chernushevich, I. V.; Zhou, J.; Verentchikov, A.; Poppe-Schriemer, N.; Hosfield, D. J.; Westmore, J. B.; Ens, W.; Duckworth, H. W.; Standing, K. G.

    1996-01-01

    IclR protein, the repressor of the aceBAK operon of Escherichia coli, has been examined by time-of-flight mass spectrometry, with ionization by matrix-assisted laser desorption or by electrospray. The purified protein was found to have a smaller mass than that predicted from the base sequence of the cloned iclR gene. Additional measurements were made on mixtures of peptides derived from IclR by treatment with trypsin and cyanogen bromide. They showed that the amino acid sequence is that predicted from the gene sequence, except that the protein has suffered truncation by removal of the N-terminal eight or, in some cases, nine amino acid residues. The peptide bond whose hydrolysis would remove eight residues is a typical target for the E. coli protease OmpT. We find that, by taking precautions to minimize OmpT proteolysis, or by eliminating it through mutation of the host strain, we can isolate full-length IclR protein (lacking only the N-terminal methionine residue). Full-length IclR is a much better DNA-binding protein than the truncated versions: it binds the aceBAK operator sequence 44-fold more tightly, presumably because of additional contacts that the N-terminal residues make with the DNA. Our experience thus demonstrates the advantages of using mass spectrometry to characterize newly purified proteins produced from cloned genes, especially where proteolysis or other covalent modification is a concern. This technique gives mass spectra from complex peptide mixtures that can be analyzed completely, without any fractionation of the mixtures, by reference to the amino acid sequence inferred from the base sequence of the cloned gene. PMID:8844850

  14. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    NASA Astrophysics Data System (ADS)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay (ISD) was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence-informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charge-localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence-informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence-informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  15. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    PubMed

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay (ISD) was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence-informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charge-localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence-informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence-informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  16. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    PubMed Central

    2010-01-01

    Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". 
Conclusions AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer. PMID:20525162
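To show what an entropy-based divergence score looks like, the Python sketch below computes the Shannon entropy (in bits) of each alignment column and reports windows whose mean entropy exceeds a threshold. The abstract names Entropy as one of AlignMiner's scoring methods but does not give its formula, gap handling, or region-building rules, so this is a generic illustration under those assumptions; the window size, threshold, and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropies(alignment, gap="-"):
    """Shannon entropy (bits) of each alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        counts = Counter(ch for ch in column if ch != gap)
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(max(0.0, h))  # also maps -0.0 and all-gap columns to 0.0
    return entropies


def divergent_windows(entropies, window, threshold):
    """Start indices (0-based) of windows whose mean entropy >= threshold."""
    return [
        start
        for start in range(len(entropies) - window + 1)
        if sum(entropies[start:start + window]) / window >= threshold
    ]


aln = ["ACGT", "ACGA", "ACTC", "ACTG"]  # toy alignment, hypothetical
print(column_entropies(aln))  # [0.0, 0.0, 1.0, 2.0]
print(divergent_windows(column_entropies(aln), window=2, threshold=1.5))  # [2]
```

Fully conserved columns score 0 and a column with four equally frequent bases scores 2 bits, so high-entropy windows are candidate divergent regions, for example for specific primers.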

  17. Lineages of Streptococcus equi ssp. equi in the Irish equine industry.

    PubMed

    Moloney, Emma; Kavanagh, Kerrie S; Buckley, Tom C; Cooney, Jakki C

    2013-01-01

    Streptococcus equi ssp. equi is the causative agent of 'Strangles' in horses. This is a debilitating condition leading to economic loss, yard closures and cancellation of equestrian events. There are multiple genotypes of S. equi ssp. equi which can cause disease, but to date there has been no systematic study of strains which are prevalent in Ireland. This study identified and classified Streptococcus equi ssp. equi strains isolated from within the Irish equine industry. Two hundred veterinary isolates were subjected to SLST (single locus sequence typing) based on an internal sequence from the seM gene of Streptococcus equi ssp equi. Of the 171 samples which successfully gave an amplicon, 162 samples (137 Irish and 24 UK strains) gave robust DNA sequence information. Analysis of the sequences allowed division of the isolates into 19 groups: 13 containing at least 2 isolates and 6 containing single isolates. There were 19 positions where a DNA SNP (single nucleotide polymorphism) occurs, and one 3 bp insertion. All groups had multiple (2-8) SNPs. Of the SNPs, 17 would result in an amino acid change in the encoded protein. Interestingly, the single isolate EI8, which has 6 SNPs, has the three base pair insertion, which is not seen in any other isolate; this would result in the insertion of an Ile residue at position 62 in that protein sequence. Comparison of the relevant region in the determined sequences with the UK Streptococcus equi seM MLST database showed that Group B (15 isolates) and Group I (2 isolates), as well as the individual isolates EI3 and EI8, are unique to Ireland, and some groups are most likely of UK origin (Groups F and M), but many more probably passed back and forth between the two countries. The strains occurring in Ireland are not clonal and there is a considerable degree of sequence variation seen in the seM gene. There are two major clades causing infection in Ireland and these strains are also common in the UK.
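The SLST workflow above (finding SNP positions and sorting isolates into sequence groups) can be sketched in Python as follows. This assumes the seM fragments have already been aligned to a common length with '-' marking gaps; gap columns are ignored when calling SNPs, so an indel such as the 3 bp insertion would need separate handling. The isolate names and sequences are hypothetical toy data, and this is not the authors' pipeline.

```python
from collections import defaultdict


def polymorphic_sites(aligned, gap="-"):
    """1-based alignment columns where isolates carry more than one base."""
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos + 1
        for pos, column in enumerate(zip(*seqs))
        if len({base for base in column if base != gap}) > 1
    ]


def group_identical(aligned):
    """Group isolates with identical aligned sequences, largest group first."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq].append(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical toy alignment of a short seM fragment (not real data).
isolates = {
    "iso1": "ACGTA",
    "iso2": "ACGTA",
    "iso3": "ATGTA",
    "iso4": "AC-TC",
}
print(polymorphic_sites(isolates))  # [2, 5]
print(group_identical(isolates))    # [['iso1', 'iso2'], ['iso3'], ['iso4']]
```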

  18. Lineages of Streptococcus equi ssp. equi in the Irish equine industry

    PubMed Central

    2013-01-01

    Background Streptococcus equi ssp. equi is the causative agent of ‘Strangles’ in horses. This is a debilitating condition leading to economic loss, yard closures and cancellation of equestrian events. There are multiple genotypes of S. equi ssp. equi which can cause disease, but to date there has been no systematic study of strains which are prevalent in Ireland. This study identified and classified Streptococcus equi ssp. equi strains isolated from within the Irish equine industry. Results Two hundred veterinary isolates were subjected to SLST (single locus sequence typing) based on an internal sequence from the seM gene of Streptococcus equi ssp equi. Of the 171 samples which successfully gave an amplicon, 162 samples (137 Irish and 24 UK strains) gave robust DNA sequence information. Analysis of the sequences allowed division of the isolates into 19 groups: 13 containing at least 2 isolates and 6 containing single isolates. There were 19 positions where a DNA SNP (single nucleotide polymorphism) occurs, and one 3 bp insertion. All groups had multiple (2–8) SNPs. Of the SNPs, 17 would result in an amino acid change in the encoded protein. Interestingly, the single isolate EI8, which has 6 SNPs, has the three base pair insertion, which is not seen in any other isolate; this would result in the insertion of an Ile residue at position 62 in that protein sequence. Comparison of the relevant region in the determined sequences with the UK Streptococcus equi seM MLST database showed that Group B (15 isolates) and Group I (2 isolates), as well as the individual isolates EI3 and EI8, are unique to Ireland, and some groups are most likely of UK origin (Groups F and M), but many more probably passed back and forth between the two countries. Conclusions The strains occurring in Ireland are not clonal and there is a considerable degree of sequence variation seen in the seM gene. 
There are two major clades causing infection in Ireland and these strains are also common in the UK. PMID:23731628
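    Counting SNPs and deciding which ones change the encoded protein comes down to comparing aligned sequences position by position and translating the affected codons. The sketch below is a minimal, hypothetical illustration of that bookkeeping, not the authors' pipeline: it assumes uppercase, in-frame, gap-free sequences of equal length (so it would not handle the 3 bp insertion in EI8) and the standard genetic code; the function names and toy sequences are invented for illustration.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate an in-frame uppercase DNA sequence; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def classify_snps(reference: str, variants: list[str]) -> dict[int, bool]:
    """Map each variable 0-based position to True if a variant carrying it changes the amino acid.

    Assumes gap-free, in-frame sequences whose common length is a multiple of 3.
    """
    ref_protein = translate(reference)
    var_proteins = [translate(v) for v in variants]
    result = {}
    for pos, ref_base in enumerate(reference):
        # Only the variants that actually differ at this position are inspected.
        carriers = [p for v, p in zip(variants, var_proteins) if v[pos] != ref_base]
        if carriers:
            codon = pos // 3
            result[pos] = any(p[codon] != ref_protein[codon] for p in carriers)
    return result


# Hypothetical toy example: AAA->AAG is synonymous (Lys), AAA->GAA is not (Lys->Glu).
print(classify_snps("ATGAAACTG", ["ATGAAGCTG", "ATGGAACTG"]))  # {3: True, 5: False}
```

    If one variant carries two SNPs in the same codon, both positions are reported as amino-acid-changing; a finer attribution would need per-SNP codon reconstruction.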

  19. Uptake, Results, and Outcomes of Germline Multiple-Gene Sequencing After Diagnosis of Breast Cancer.

    PubMed

    Kurian, Allison W; Ward, Kevin C; Hamilton, Ann S; Deapen, Dennis M; Abrahamse, Paul; Bondarenko, Irina; Li, Yun; Hawley, Sarah T; Morrow, Monica; Jagsi, Reshma; Katz, Steven J

    2018-05-10

    Low-cost sequencing of multiple genes is increasingly available for cancer risk assessment. Little is known about uptake or outcomes of multiple-gene sequencing after breast cancer diagnosis in community practice. This study aimed to examine the effect of multiple-gene sequencing on the experience and treatment outcomes of patients with breast cancer. For this population-based retrospective cohort study, patients with breast cancer diagnosed from January 2013 to December 2015 and accrued from SEER registries across Georgia and in Los Angeles, California, were surveyed (n = 5080, response rate = 70%). Responses were merged with SEER data and with the results of clinical genetic tests, either BRCA1 and BRCA2 (BRCA1/2) sequencing only or sequencing that also included other genes (multiple-gene sequencing), provided by 4 laboratories. Outcomes were type of testing (multiple-gene sequencing vs BRCA1/2-only sequencing), test results (negative, variant of uncertain significance, or pathogenic variant), patient experiences with testing (timing of testing, who discussed results), treatment (strength of patient consideration of, and surgeon recommendation for, prophylactic mastectomy), and receipt of prophylactic mastectomy. We defined a patient subgroup with higher pretest risk of carrying a pathogenic variant according to practice guidelines. Among 5026 patients (mean [SD] age, 59.9 [10.7] years), 1316 (26.2%) were linked to genetic results from any laboratory. Multiple-gene sequencing increasingly replaced BRCA1/2-only testing over time: in 2013, the rate of multiple-gene sequencing was 25.6% and of BRCA1/2-only testing, 74.4%; in 2015, the rate of multiple-gene sequencing was 66.5% and of BRCA1/2-only testing, 33.5%. Multiple-gene sequencing was more often ordered by genetic counselors (multiple-gene sequencing, 25.5%; BRCA1/2-only testing, 15.3%) and more often delayed until after surgery (multiple-gene sequencing, 32.5%; BRCA1/2-only testing, 19.9%). Multiple-gene sequencing substantially increased the rate of detection of any pathogenic variant (multiple-gene sequencing: higher-risk patients, 12%; average-risk patients, 4.2%; BRCA1/2-only testing: higher-risk patients, 7.8%; average-risk patients, 2.2%) and of variants of uncertain significance, especially in minorities (multiple-gene sequencing: white patients, 23.7%; black patients, 44.5%; Asian patients, 50.9%; BRCA1/2-only testing: white patients, 2.2%; black patients, 5.6%; Asian patients, 0%). Multiple-gene sequencing was not associated with an increase in the rate of prophylactic mastectomy use, which was highest with pathogenic variants in BRCA1/2 (BRCA1/2, 79.0%; other pathogenic variant, 37.6%; variant of uncertain significance, 30.2%; negative, 35.3%). Multiple-gene sequencing rapidly replaced BRCA1/2-only testing for patients with breast cancer in the community and enabled 2-fold higher detection of clinically relevant pathogenic variants without an associated increase in prophylactic mastectomy. However, important targets for improvement in the clinical utility of multiple-gene sequencing include postsurgical delay and racial/ethnic disparity in variants of uncertain significance.

  20. O-acetylserine(thiol)lyase from spinach (Spinacia oleracea L.) leaf: cDNA cloning, characterization, and overexpression in Escherichia coli of the chloroplast isoform.

    PubMed

    Rolland, N; Droux, M; Lebrun, M; Douce, R

    1993-01-01

    The last enzymatic step of L-cysteine biosynthesis is catalyzed by O-acetylserine(thiol)lyase (OASTL, EC 4.2.99.8), which synthesizes L-cysteine from O-acetylserine and "sulfide." We have isolated and characterized a full-length cDNA (1432 bp) from a lambda gt11 library of spinach leaf encoding the complete precursor of the chloroplast isoform. The 1149-nucleotide open reading frame coding for O-acetylserine(thiol)lyase was oriented opposite to the lambda gt11 beta-galactosidase gene. The derived amino acid sequence indicates that the protein precursor consists of 383 amino acid residues, including an N-terminal presequence peptide of 52 residues. The amino acid sequence of mature spinach chloroplast O-acetylserine(thiol)lyase shows 40% and 57% homology with its bacterial counterparts. Sequence comparison with several pyridoxal 5'-phosphate-containing proteins reveals the presence of a lysine residue assumed to be involved in cofactor binding. A synthetic cDNA was constructed, coding for the entire 331-amino-acid mature O-acetylserine(thiol)lyase plus an initiating methionine. A high level of expression of the active mature chloroplast isoform was achieved in an Escherichia coli strain carrying the T7 RNA polymerase system (F. W. Studier, A. H. Rosenberg, J. J. Dunn, and J. W. Dubendorff, 1990, in Methods in Enzymology, D. V. Goeddel, Ed., Vol. 185, pp. 60-89, Academic Press, San Diego, CA). Addition of pyridoxine to the bacterial growth medium enhanced the enzyme activity attributable to the recombinant protein. The level of production is 25-fold higher than in chloroplasts from spinach leaves, and the recombinant protein has the same relative molecular mass and immunological properties as the natural enzyme from spinach leaf chloroplasts. This work, together with our previous biochemical studies, is consistent with a prokaryotic-type enzyme for L-cysteine biosynthesis in higher plant chloroplasts. Southern blot analysis indicated that O-acetylserine(thiol)lyase is encoded by multiple genes in the spinach leaf genomic DNA.

  1. Diversity of environmental single-stranded DNA phages revealed by PCR amplification of the partial major capsid protein

    PubMed Central

    Hopkins, Max; Kailasan, Shweta; Cohen, Allison; Roux, Simon; Tucker, Kimberly Pause; Shevenell, Amelia; Agbandje-McKenna, Mavis; Breitbart, Mya

    2014-01-01

    The small single-stranded DNA (ssDNA) bacteriophages of the subfamily Gokushovirinae were traditionally perceived as narrowly targeted, niche-specific viruses infecting obligate parasitic bacteria, such as Chlamydia. The advent of metagenomics revealed gokushoviruses to be widespread in global environmental samples. This study expands knowledge of gokushovirus diversity in the environment by developing a degenerate PCR assay to amplify a portion of the major capsid protein (MCP) gene of gokushoviruses. Over 500 amplicons were sequenced from 10 environmental samples (sediments, sewage, seawater and freshwater), revealing the ubiquity and high diversity of this understudied phage group. Residue-level conservation data generated from multiple alignments were combined with a predicted 3D structure, revealing a tendency for structurally internal residues to be more highly conserved than surface-exposed protein–protein or viral–host interaction domains. When this data set was placed in a phylogenetic framework, many gokushovirus MCP clades contained samples from multiple environments, although distinct clades dominated the different samples. Antarctic sediment samples contained the most diverse gokushovirus communities, whereas freshwater springs from Florida were the least diverse. Whether the observed diversity is driven by environmental factors or by host-binding interactions remains an open question. The high environmental diversity of this previously overlooked ssDNA viral group calls for further research to identify their natural hosts and explore their ecological roles. PMID:24694711
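    The abstract does not say which residue-level conservation measure was used. One common choice is the Shannon entropy of each alignment column, where 0 means the column is invariant; the minimal sketch below assumes that measure, ignores gap characters, and uses hypothetical function names and toy sequences.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return float("nan")  # all-gap column: conservation is undefined
    n = len(residues)
    probs = [count / n for count in Counter(residues).values()]
    return sum(-p * math.log2(p) for p in probs)


def conservation_profile(alignment: list[str]) -> list[float]:
    """Per-position entropy for aligned sequences; lower values mean more conserved."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [column_entropy("".join(s[i] for s in alignment)) for i in range(length)]


# Toy alignment (hypothetical): column 0 is invariant, the others each carry one substitution.
profile = conservation_profile(["MKLV", "MKIV", "MRLA", "MKLV"])
print([round(h, 3) for h in profile])
```

    Mapping such a profile onto a predicted structure then amounts to attaching each column's score to the corresponding residue of the modeled protein.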

  2. Rational design of a conformation-switchable Ca2+- and Tb3+-binding protein without the use of multiple coupled metal-binding sites.

    PubMed

    Li, Shunyi; Yang, Wei; Maniccia, Anna W; Barrow, Doyle; Tjong, Harianto; Zhou, Huan-Xiang; Yang, Jenny J

    2008-10-01

    Ca2+, as a messenger of signal transduction, regulates numerous target molecules via Ca2+-induced conformational changes. Investigation into the determinants of Ca2+-induced conformational change is often impeded by cooperativity between multiple metal-binding sites or by protein oligomerization in naturally occurring proteins. To dissect the relative contributions of key determinants of Ca2+-dependent conformational changes, we report the design of a single-site Ca2+-binding protein (CD2.trigger) created by altering charged residues at an electrostatically sensitive location on the surface of the host protein, rat Cluster of Differentiation 2 (CD2). CD2.trigger binds Tb3+ and Ca2+ with dissociation constants of 0.3 ± 0.1 μM and 90 ± 25 μM, respectively. This protein is largely unfolded in the absence of metal ions at physiological pH, but Tb3+ or Ca2+ binding causes it to fold into the native-like conformation. Neutralization of the charged coordination residues, either by mutation or by protonation, similarly induces folding of the protein. The control of a major conformational change by a single Ca2+ ion, achieved in a protein designed without reliance on sequence similarity to known Ca2+-dependent proteins or on coupled metal-binding sites, represents an important step in the design of trigger proteins.
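    For a single, independent binding site, a dissociation constant translates into occupancy through θ = [M]/(Kd + [M]). The sketch below assumes that simple model and a known free metal concentration (no ligand depletion, no competition between Tb3+ and Ca2+); only the two Kd values come from the abstract, and the function name is hypothetical. Under these assumptions the site is half occupied at 0.3 μM free Tb3+ but needs 90 μM free Ca2+, a 300-fold difference.

```python
def fraction_bound(free_metal_um: float, kd_um: float) -> float:
    """Fractional occupancy of one independent site: [M] / (Kd + [M]), concentrations in uM."""
    return free_metal_um / (kd_um + free_metal_um)


KD_TB_UM = 0.3   # Tb3+ dissociation constant reported for CD2.trigger (uM)
KD_CA_UM = 90.0  # Ca2+ dissociation constant reported for CD2.trigger (uM)

# Occupancy of the single designed site across a range of free metal concentrations.
for free in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"[M]free = {free:7.1f} uM   Tb3+ bound: {fraction_bound(free, KD_TB_UM):.2f}   "
          f"Ca2+ bound: {fraction_bound(free, KD_CA_UM):.2f}")
```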

  3. A novel class of plant-specific zinc-dependent DNA-binding protein that binds to A/T-rich DNA sequences

    PubMed Central

    Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko

    2001-01-01

    Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and both distantly located zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein binds non-specifically to A/T-rich sequences, including the upstream regions of the pea GTPase pra2 and plastocyanin petE genes. Expression of PLATZ1 repressed the expression of reporter constructs in which the luciferase coding sequence was driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to tandem repeats of the A/T-rich sequences. These results indicate that PLATZ1 belongs to a novel class of plant-specific zinc-dependent DNA-binding proteins responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
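    Motif notation of the kind used above (x for any residue, xn for exactly n residues, x(m–n) for a range) maps directly onto regular expressions, which makes the two zinc-binding regions easy to search for in candidate sequences. The converter below is a minimal sketch that supports only those wildcard forms plus single-letter residues; the function name and the toy test sequence are hypothetical.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert motif notation such as 'C-x2-H-x(4–5)-C' into a regular expression string.

    Tokens are separated by '-': a residue letter, 'x' (any residue), 'x<n>' (exactly n),
    or 'x(<m>–<n>)' (m to n residues; an en dash or a comma separates the bounds).
    """
    parts = []
    for token in motif.split("-"):
        ranged = re.fullmatch(r"x\((\d+)[–,](\d+)\)", token)
        fixed = re.fullmatch(r"x(\d+)", token)
        if ranged:
            parts.append(f".{{{ranged.group(1)},{ranged.group(2)}}}")
        elif fixed:
            parts.append(f".{{{fixed.group(1)}}}")
        elif token == "x":
            parts.append(".")
        elif re.fullmatch(r"[A-Z]", token):
            parts.append(token)
        else:
            raise ValueError(f"unsupported motif token: {token!r}")
    return "".join(parts)


REGION_1 = "C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H"
REGION_2 = "C-x2-C-x(10–11)-C-x3-C"
print(motif_to_regex(REGION_1))  # C.{2}H.{11}C.{2}C.{4,5}C.{2}C.{3,7}H.{2}H
print(motif_to_regex(REGION_2))  # C.{2}C.{10,11}C.{3}C
```

    The resulting pattern can be passed to `re.search` or `re.finditer`; note that `re.finditer` reports only non-overlapping matches.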

  4. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

    PubMed

    Sawle, Lucas; Ghosh, Kingshuk

    2015-08-28

    A general formalism is presented for computing configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interactions. A variational approach is used to predict the average distance between any two monomers in the chain. The analytical model, for the first time, explicitly incorporates the role of sequence charge distribution in determining the relative sizes of two sequences that differ not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic polyampholyte sequences. These sequences contain equal numbers of glutamic acid (E) and lysine (K) residues but differ in their patterning along the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated radius of gyration, with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded-state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed to be similar under denaturing conditions, and only electrostatic effects encoded in the sequence are accounted for. Under these assumptions, thermophilic proteins are found, with high statistical significance, to have more compact disordered ensembles than their mesophilic counterparts. Because it is analytical, the method presented here enables such high-throughput analysis of many proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.
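    The abstract describes charge decoration only qualitatively. A single-number summary often associated with this line of work is the sequence charge decoration (SCD), a pairwise sum of the form (1/N) Σ_{i<j} q_i q_j |i − j|^{1/2}; the sketch below assumes that form, assigns unit charges to Asp/Glu/Lys/Arg and none to His, and does not reproduce the variational model itself. It only illustrates how two sequences with identical E/K composition, like the Das and Pappu polyampholytes, can still differ in patterning. The function name is hypothetical.

```python
import math

# Simplified charges: Asp/Glu -1, Lys/Arg +1, everything else (including His) 0.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge_patterning(seq: str) -> float:
    """(1/N) * sum over i<j of q_i * q_j * sqrt(j - i), assumed SCD-like form."""
    q = [CHARGE.get(res, 0) for res in seq.upper()]
    n = len(q)
    total = sum(q[i] * q[j] * math.sqrt(j - i)
                for i in range(n) for j in range(i + 1, n))
    return total / n


# Same composition (5 E, 5 K), different patterning: the blocky sequence scores more negative.
print(round(charge_patterning("EKEKEKEKEK"), 3))
print(round(charge_patterning("EEEEEKKKKK"), 3))
```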

  5. Screening of matrix metalloproteinases available from the protein data bank: insights into biological functions, domain organization, and zinc binding groups.

    PubMed

    Nicolotti, Orazio; Miscioscia, Teresa Fabiola; Leonetti, Francesco; Muncipinto, Giovanni; Carotti, Angelo

    2007-01-01

    A total of 142 matrix metalloproteinase (MMP) X-ray crystallographic structures were retrieved from the Protein Data Bank (PDB) and analyzed with a series of bioinformatic tools by an efficient automated routine developed in-house. Highly informative heat maps and hierarchical clusterograms provided a reliable and comprehensive representation of the relationships among MMPs, extending and complementing current knowledge in the field. Multiple sequence and structural alignments permitted better location and display of key MMP motifs and quantification of the residue consensus at each amino acid position in the most critical binding subsites of MMPs. The MMP active-site consensus sequences, the C-alpha root-mean-square deviation (RMSd) analysis of diverse enzymatic subsites, and the examination of the chemical nature, binding topologies, and zinc-binding groups (ZBGs) of ligands extracted from crystallographic complexes provided useful insights into the structural arrangements of the most potent MMP inhibitors.

  6. Limited utility of residue masking for positive-selection inference.

    PubMed

    Spielman, Stephanie J; Dawson, Eric T; Wilke, Claus O

    2014-09-01

    Errors in multiple sequence alignments (MSAs) can reduce accuracy in positive-selection inference. It has therefore been suggested that MSAs be filtered before further analyses are conducted. One widely used filter, Guidance, allows users to remove MSA positions aligned with low confidence. However, Guidance's utility in positive-selection inference has been disputed in the literature. We conducted an extensive simulation-based study to fully characterize how Guidance impacts positive-selection inference, specifically for protein-coding sequences of realistic divergence levels. We also investigated whether novel scoring algorithms, which phylogenetically corrected confidence scores, and a new gap-penalization score-normalization scheme improved Guidance's performance. We found that no filter, including the original Guidance, consistently benefited positive-selection inference. Moreover, all improvements detected were exceedingly small, and in certain circumstances Guidance-based filters worsened inferences. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
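    Filtering an alignment by confidence can be sketched as follows, assuming per-column confidence scores have already been computed by an external tool. The function, the scores, and the cutoff are hypothetical; Guidance's own scoring is not reproduced, and this version removes whole columns rather than masking individual residues. For codon alignments, a codon-aware variant would need to keep or drop all three nucleotides of a codon together.

```python
def drop_low_confidence_columns(alignment: list[str], column_scores: list[float],
                                cutoff: float) -> list[str]:
    """Keep only alignment columns whose confidence score is at least `cutoff`."""
    if any(len(seq) != len(column_scores) for seq in alignment):
        raise ValueError("need exactly one score per alignment column")
    keep = [i for i, score in enumerate(column_scores) if score >= cutoff]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical scores and cutoff, for illustration only.
aln = ["ACGT", "AC-T"]
print(drop_low_confidence_columns(aln, [0.9, 0.2, 0.95, 0.5], cutoff=0.6))  # ['AG', 'A-']
```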

  7. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures.

    PubMed

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm explicitly takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process. Rate4Site assigns a conservation level to each position in the multiple sequence alignment using empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/

  8. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures

    PubMed Central

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm explicitly takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process. Rate4Site assigns a conservation level to each position in the multiple sequence alignment using empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/ PMID:18971256

  9. Functional characterization of a synthetic hydrophilic antifungal peptide derived from the marine snail Cenchritis muricatus.

    PubMed

    López-Abarrategui, Carlos; Alba, Annia; Silva, Osmar N; Reyes-Acosta, Osvaldo; Vasconcelos, Ilka M; Oliveira, Jose T A; Migliolo, Ludovico; Costa, Maysa P; Costa, Carolina R; Silva, Maria R R; Garay, Hilda E; Dias, Simoni C; Franco, Octávio L; Otero-González, Anselmo J

    2012-04-01

    Antimicrobial peptides have been found in mollusks and other sea animals. In this report, a crude extract of the marine snail Cenchritis muricatus was evaluated against human pathogens responsible for multiple deleterious effects and diseases. A peptide of 1485.26 Da was purified by reversed-phase HPLC and functionally characterized. This peptide was trypsinized and sequenced by MS/MS, and a sequence (SRSELIVHQR), named Cm-p1, was recovered, chemically synthesized and functionally characterized. This peptide was able to prevent the development of yeasts and filamentous fungi. Moreover, Cm-p1 displayed no toxic effects against mammalian cells. Molecular modeling analyses showed that this peptide possibly forms a single hydrophilic α-helix, and the cationic residue probably involved in antifungal activity is proposed. The data reported here demonstrate the importance of peptide discovery in sea animals for developing biotechnological tools that could be useful in solving human health and agribusiness problems. Copyright © 2011 Elsevier Masson SAS. All rights reserved.

  10. Computational approaches for identification of conserved/unique binding pockets in the A chain of ricin

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ecale Zhou, C L; Zemla, A T; Roe, D

    2005-01-29

    Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides or aptamers require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set of ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted the residue-residue correspondence from this analysis, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context, and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique by our structure- and sequence-based methods. In applying this combined informatics approach to ricin A, we identified a conserved/unique pocket close to (but not overlapping) the active site that is suitable for bidentate ligand development. These methods are generally applicable to the identification of surface epitopes and binding pockets for the development of diagnostic reagents, therapeutics, and vaccines.
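    The abstract does not describe how residue infrequency in local sequence context was scored. As a crude, assumed stand-in for step (3), the sketch below treats a local context as an exact k-residue window and reports windows of a target that occur in every member of a family set and in none of a background set; the structure-based steps (1), (2) and (4) are not covered. All names and sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k windows of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_unique_windows(target: str, family: list[str], others: list[str], k: int = 5) -> list[int]:
    """0-based starts of k-windows in `target` found in every `family` member and in no `others` sequence."""
    family_sets = [kmers(s, k) for s in family]
    background = set().union(*(kmers(s, k) for s in others))
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if all(window in fs for fs in family_sets) and window not in background:
            hits.append(i)
    return hits


# Toy sequences (hypothetical, not real ricin A chains).
print(conserved_unique_windows("ABCDEFG", family=["ABCDEFG", "ZBCDEFG"], others=["ABCDXFG"], k=3))  # [2, 3, 4]
```

    Exact matching is deliberately strict; a practical version would more likely score windows with a substitution matrix so that conservative replacements are not treated as novel context.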

  11. Relationships between residue Voronoi volume and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Cheng, Chih-Wen; Lin, Yu-Feng; Chen, Shao-Yu; Hwang, Jenn-Kang; Yen, Shih-Chung

    2018-02-01

    Functional and biophysical constraints can cause different levels of sequence conservation in proteins. Previously, structural properties, e.g., relative solvent accessibility (RSA) and packing density measured by the weighted contact number (WCN), have been found to be related to protein sequence conservation (CS). The Voronoi volume has recently been recognized as a new structural property of the local protein structural environment that reflects CS. However, for surface residues, it is sensitive to the water molecules surrounding the protein structure. Herein, we present a simple structural determinant termed the relative space of Voronoi volume (RSV), which uses the Voronoi volume and the van der Waals volume of individual residues to quantify the local structural environment. RSV (range, 0–1) is defined as (Voronoi volume − van der Waals volume)/Voronoi volume of the target residue. RSV thus describes the extent of available space around each protein residue. RSV and Voronoi profiles with and without water molecules (RSVw, RSV, VOw, and VO) were compared for 554 non-homologous proteins. RSV (without water) showed better Pearson's correlations with CS than did RSVw, VO, or VOw values. The mean correlation coefficient between RSV and CS was 0.51, which is comparable to the correlation between RSA and CS (0.49) and that between WCN and CS (0.56). RSV is a robust structural descriptor with and without water molecules and can quantitatively reflect evolutionary information in a single protein structure. Therefore, it may represent a practical structural determinant for studying protein sequence, structure, and function relationships. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species

    NASA Technical Reports Server (NTRS)

    Haney, P. J.; Badger, J. H.; Buldak, G. L.; Reich, C. I.; Woese, C. R.; Olsen, G. J.

    1999-01-01

    The genome sequence of the extremely thermophilic archaeon Methanococcus jannaschii provides a wealth of data on proteins from a thermophile. In this paper, sequences of 115 proteins from M. jannaschii are compared with their homologs from mesophilic Methanococcus species. Although the growth temperatures of the mesophiles are about 50 degrees C below that of M. jannaschii, their genomic G+C contents are nearly identical. The properties most strongly associated with the thermophile's proteins include higher residue volume, higher residue hydrophobicity, more charged amino acids (especially Glu, Arg, and Lys), and fewer uncharged polar residues (Ser, Thr, Asn, and Gln). These are recurring themes, with the individual trends applying to 83–92% of the proteins for which complete sequences were available. Nearly all of the amino acid replacements most significantly correlated with the temperature change are the same relatively conservative changes observed in all proteins, but in the mesophile/thermophile comparison there is a directional bias. We identify 26 specific pairs of amino acids with a statistically significant (P < 0.01) preferred direction of replacement.

  13. Homology modeling and docking analyses of M. leprae Mur ligases reveals the common binding residues for structure based drug designing to eradicate leprosy.

    PubMed

    Shanmugam, Anusuya; Natarajan, Jeyakumar

    2012-06-01

    Multidrug resistance in Mycobacterium leprae (MDR-Mle) creates a pressing need for new anti-leprosy drugs. Because most drugs target a single enzyme, a mutation in the active site renders the antibiotic ineffective. However, structural and mechanistic information on essential bacterial enzymes in a pathway could lead to the development of antibiotics that target multiple enzymes. Peptidoglycan is an important component of the cell wall of M. leprae, and its biosynthesis offers important targets for the development of new antibacterial drugs. Biosynthesis of peptidoglycan is a multi-step process that involves four key Mur ligase enzymes: MurC (EC:6.3.2.8), MurD (EC:6.3.2.9), MurE (EC:6.3.2.13) and MurF (EC:6.3.2.10). In this work, we modeled the three-dimensional structures of these Mur ligases by homology modeling and analyzed their common binding features. The residues playing an important role in the catalytic activity of each Mur enzyme were predicted by docking the ligases with their substrates and ATP. Residues in the conserved sequence motifs important for ATP binding were proposed as probable targets for structure-based drug design. Overall, the study identified significant and common binding residues of the Mur enzymes in the peptidoglycan pathway for multi-targeted therapy.

  14. Analysis of DNA-binding sites on Mhr1, a yeast mitochondrial ATP-independent homologous pairing protein.

    PubMed

    Masuda, Tokiha; Ling, Feng; Shibata, Takehiko; Mikawa, Tsutomu

    2010-03-01

    The Mhr1 protein is necessary for mtDNA homologous recombination in Saccharomyces cerevisiae. Homologous pairing (HP) is an essential reaction during homologous recombination, and is generally catalyzed by the RecA/Rad51 family of proteins in an ATP-dependent manner. Mhr1 catalyzes HP through a mechanism similar, at the DNA level, to that of the RecA/Rad51 proteins, but without utilizing ATP. However, it has no sequence homology with the RecA/Rad51 family proteins or with other ATP-independent HP proteins, and exhibits different requirements for DNA topology. We are interested in the structural features of the functional domains of Mhr1. In this study, we employed the native fluorescence of Mhr1's Trp residues to examine the energy transfer from the Trp residues to etheno-modified ssDNA bound to Mhr1. Our results showed that two of the seven Trp residues (Trp71 and Trp165) are spatially close to the bound DNA. A systematic analysis of mutant Mhr1 proteins revealed that Asp69 is involved in Mg(2+)-dependent DNA binding, and that multiple Lys and Arg residues located around Trp71 and Trp165 are involved in the DNA-binding activity of Mhr1. In addition, in vivo complementation analyses showed that a region around Trp165 is important for the maintenance of mtDNA. On the basis of these results, we discuss the function of the region surrounding Trp165.
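    Energy transfer from Trp to a bound acceptor works as a proximity probe because, assuming Förster-type (dipole-dipole) transfer, efficiency falls off with the sixth power of distance, so only Trp residues close to the DNA contribute appreciably. The sketch below shows that relation and its inverse; the abstract does not report distances or a Förster radius, so the R0 value and function names are hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(e: float, r0_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6), valid for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / e - 1.0) ** (1.0 / 6.0)


R0_NM = 2.0  # hypothetical Förster radius; the real Trp/etheno-DNA value is not given in the abstract
for r in (1.0, 2.0, 3.0, 4.0):
    print(f"r = {r:.1f} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```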

  15. Knowledge-based grouping of modeled HLA peptide complexes.

    PubMed

    Kangueane, P; Sakharkar, M K; Lim, K S; Hao, H; Lin, K; Chee, R E; Kolatkar, P R

    2000-05-01

    Human leukocyte antigens are the most polymorphic of human genes, and multiple sequence alignment shows that these polymorphisms are clustered in the functional peptide-binding domains. Because of this polymorphism among the peptide-binding residues, predicting the peptides that bind to specific HLA molecules is very difficult. In recent years, two different types of computer-based prediction methods have been developed, and both have their own advantages and disadvantages. The unavailability of allele-specific binding data restricts the use of knowledge-based prediction methods for a wide range of HLA alleles. Alternatively, the modeling scheme appears to be a promising predictive tool for selecting peptides that bind to specific HLA molecules. The scoring of the modeled HLA-peptide complexes is a major concern. In the present study, knowledge-based rules (van der Waals clashes and solvent-exposed hydrophobic residues) are applied to distinguish binders from nonbinders. Rules based on (1) the number of observed atomic clashes between the modeled peptide and the HLA structure, and (2) the number of solvent-exposed hydrophobic residues on the modeled peptide effectively discriminate experimentally known binders from poor binders and nonbinders. Solved crystal complexes show no vdW clash (vdWC) in 95% of cases and no solvent-exposed hydrophobic peptide residues (SEHPR) in 86% of cases. When experimental binding data were compared with the scores predicted by this scheme, 77% of the peptides were correctly grouped as good binders, with a sensitivity of 71%.
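    The two knowledge-based rules amount to thresholds on per-complex counts, and sensitivity is the fraction of known binders that the rule calls binders, TP / (TP + FN). The sketch below assumes zero-tolerance thresholds for both counts (the study's exact cutoffs are not given in the abstract); the class, function names, and toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeledComplex:
    name: str
    vdw_clashes: int           # atomic clashes between modeled peptide and HLA structure
    exposed_hydrophobics: int  # solvent-exposed hydrophobic residues on the modeled peptide
    known_binder: bool         # experimental label, used only to evaluate the rule


def predicted_binder(c: ModeledComplex, max_clashes: int = 0, max_exposed: int = 0) -> bool:
    """Hypothetical rule: call a binder only if both counts are within their thresholds."""
    return c.vdw_clashes <= max_clashes and c.exposed_hydrophobics <= max_exposed


def sensitivity(complexes: list[ModeledComplex]) -> float:
    """TP / (TP + FN), computed over the experimentally known binders."""
    binders = [c for c in complexes if c.known_binder]
    if not binders:
        return float("nan")
    return sum(predicted_binder(c) for c in binders) / len(binders)


# Toy data (hypothetical, not from the study).
toy = [
    ModeledComplex("p1", 0, 0, True),
    ModeledComplex("p2", 2, 0, True),
    ModeledComplex("p3", 0, 1, False),
    ModeledComplex("p4", 0, 0, True),
]
print(f"sensitivity = {sensitivity(toy):.2f}")  # sensitivity = 0.67
```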

  16. Attenuation of multiples in image space

    NASA Astrophysics Data System (ADS)

    Alvarez, Gabriel F.

    In complex subsurface areas, attenuation of 3D specular and diffracted multiples in data space is difficult and inaccurate. In those areas, image space is an attractive alternative. There are several reasons: (1) migration increases the signal-to-noise ratio of the data; (2) primaries are mapped to coherent events in Subsurface Offset Domain Common Image Gathers (SODCIGs) or Angle Domain Common Image Gathers (ADCIGs); (3) image space is regular and smaller; (4) attenuating the multiples in data space leaves holes in the frequency-Wavenumber space that generate artifacts after migration. I develop a new equation for the residual moveout of specular multiples in ADCIGs and use it for the kernel of an apex-shifted Radon transform to focus and separate the primaries from specular and diffracted multiples. Because of small amplitude, phase and kinematic errors in the multiple estimate, we need adaptive matching and subtraction to estimate the primaries. I pose this problem as an iterative least-squares inversion that simultaneously matches the estimates of primaries and multiples to the data. Standard methods match only the estimate of the multiples. I demonstrate with real and synthetic data that the method produces primaries and multiples with little cross-talk. In 3D, the multiples exhibit residual moveout in SODCIGs in in-line and cross-line offsets. They map away from zero subsurface offsets when migrated with the faster velocity of the primaries. In ADCIGs the residual moveout of the primaries as a function of the aperture angle, for a given azimuth, is flat for those angles that illuminate the reflector. The multiples have residual moveout towards increasing depth for increasing aperture angles at all azimuths. As a function of azimuth, the primaries have better azimuth resolution than the multiples at larger aperture angles. 
I show, with a real 3D dataset, that even below salt, where illumination is poor, the multiples are well attenuated in ADCIGs with the new Radon transform in planes of azimuth-stacked ADCIGs. The angle stacks of the estimated primaries show little residual multiple energy.
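    The idea of matching both estimates at once, rather than the multiples alone, can be sketched as an ordinary linear least-squares fit. This is a strong simplification and an assumption on our part: each estimate gets a single scalar gain, whereas the method described above is iterative and uses adaptive matching. The function name `joint_match` is hypothetical.

```python
import numpy as np


def joint_match(data: np.ndarray, prim_est: np.ndarray, mult_est: np.ndarray):
    """Fit data ~ a * prim_est + b * mult_est in the least-squares sense.

    Returns the matched primaries, the matched multiples and the gains (a, b).
    """
    design = np.column_stack([prim_est, mult_est])  # shape (n_samples, 2)
    (a, b), *_ = np.linalg.lstsq(design, data, rcond=None)
    return a * prim_est, b * mult_est, (a, b)
```

When the data are an exact combination of the two (linearly independent) estimates, the fit recovers the gains; real data would leave a residual containing noise and unmodeled energy.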

  17. Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa

    2006-03-01

    The crystal structure of a conserved hypothetical protein from L. major (Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA) has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. Eleven truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R_free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.
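    The R value and R_free quoted above follow the standard crystallographic definition R = Σ| |Fobs| - k|Fcalc| | / Σ|Fobs|, with R_free computed the same way over a test set of reflections held out of refinement. A minimal sketch, assuming the structure-factor amplitudes are already on a common scale (k = 1 by default) and using hypothetical function names:

```python
import numpy as np


def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|)."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - scale * f_calc)) / np.sum(f_obs))


def r_work_and_free(f_obs, f_calc, free_mask):
    """R_work over working reflections and R_free over the held-out test set."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    free = np.asarray(free_mask, dtype=bool)
    return (r_factor(f_obs[~free], f_calc[~free]),
            r_factor(f_obs[free], f_calc[free]))
```

Because the test reflections never guide refinement, R_free is normally somewhat higher than R_work, as in the 0.185 / 0.229 pair reported here.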

  18. Structural basis for regulation of rhizobial nodulation and symbiosis gene expression by the regulatory protein NolR.

    PubMed

    Lee, Soon Goo; Krishnan, Hari B; Jez, Joseph M

    2014-04-29

    The symbiosis between rhizobial microbes and host plants involves the coordinated expression of multiple genes, which leads to nodule formation and nitrogen fixation. As part of the transcriptional machinery for nodulation and symbiosis across a range of Rhizobium, NolR serves as a global regulatory protein. Here, we present the X-ray crystal structures of NolR in the unliganded form and complexed with two different 22-base pair (bp) double-stranded operator sequences (oligos AT and AA). Structural and biochemical analysis of NolR reveals protein-DNA interactions with an asymmetric operator site and defines a mechanism for conformational switching of a key residue (Gln56) to accommodate variation in target DNA sequences from diverse rhizobial genes for nodulation and symbiosis. This conformational switching alters the energetic contributions to DNA binding without changes in affinity for the target sequence. Two possible models for the role of NolR in the regulation of different nodulation and symbiosis genes are proposed. To our knowledge, these studies provide the first structural insight into the regulation of genes involved in the agriculturally and ecologically important symbiosis of microbes and plants that leads to nodule formation and nitrogen fixation.

  19. Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions

    DOEpatents

    Gardner, Shea N [San Leandro, CA]; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA]; Young, Jennifer A [Berkeley, CA]; Clague, David S [Livermore, CA]

    2011-01-18

    A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
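    The user-specified hybridizing overlap can be illustrated by cutting a target sequence into segments of length n whose neighbours share a fixed number of bases. This is a hypothetical sketch of the sequence bookkeeping only: it does not model polymerase chemistry, hybridization energetics or the bioinformatics predictions the patent mentions, and the function names are illustrative.

```python
def design_segments(target: str, n: int, overlap: int) -> list:
    """Cut `target` into segments of length n; neighbours share `overlap` bases."""
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < n")
    step = n - overlap
    segments = []
    for start in range(0, len(target), step):
        segments.append(target[start:start + n])
        if start + n >= len(target):  # this segment already reaches the 3' end
            break
    return segments


def reassemble(segments: list, overlap: int) -> str:
    """Join segments by checking and merging their shared overlaps."""
    seq = segments[0]
    for seg in segments[1:]:
        if overlap and seq[-overlap:] != seg[:overlap]:
            raise ValueError("overlap mismatch between adjacent segments")
        seq += seg[overlap:]
    return seq
```

For example, a 14-base target cut with n = 6 and a 2-base overlap yields three segments that reassemble to the original sequence.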

  20. The amino acid sequences of carboxypeptidases I and II from Aspergillus niger and their stability in the presence of divalent cations.

    PubMed

    Svendsen, I; Dal Degan, F

    1998-09-08

    The amino acid sequences of serine carboxypeptidase I (CPD-I) and II (CPD-II) from Aspergillus niger have been determined by conventional Edman degradation of the reduced and vinylpyridinated enzymes and of peptides thereof generated by cleavage with cyanogen bromide, iodobenzoic acid, glutamic acid-cleaving enzyme, AspN endoproteinase and EndoLysC proteinase. CPD-I consists of a single peptide chain of 471 amino acid residues with three disulfide bridges and nine N-glycosylated asparaginyl residues, while CPD-II consists of a single peptide chain of 481 amino acid residues with three disulfide bridges, one free cysteinyl residue and nine glycosylated asparaginyl residues. The enzymes are closely related to carboxypeptidase S3 from Penicillium janthinellum. Both Ca2+ and Mg2+ stabilize CPD-I and CPD-II at basic pH values, with Ca2+ being the more effective, whereas the divalent ions have no effect on the activity of either enzyme.

  1. Sequence-Based Prediction of RNA-Binding Residues in Proteins.

    PubMed

    Walia, Rasna R; El-Manzalawy, Yasser; Honavar, Vasant G; Dobbs, Drena

    2017-01-01

    Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
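    Many sequence-based predictors represent each residue by a fixed-size window of its neighbours and pass that vector to a classifier. The three servers covered in the chapter may use richer features (e.g. evolutionary profiles), so this is only a minimal one-hot window encoder, assuming a window of 2k + 1 residues and zero padding beyond the termini. The function name is hypothetical and the downstream classifier is out of scope.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(seq: str, k: int = 3) -> np.ndarray:
    """One-hot encode a (2k+1)-residue window centred on every residue.

    Returns an array of shape (len(seq), (2k+1) * 20). Window slots that fall
    beyond the termini, or hold non-standard residues, stay all-zero.
    """
    n, width = len(seq), 2 * k + 1
    feats = np.zeros((n, width * 20), dtype=np.float32)
    for i in range(n):
        for offset in range(-k, k + 1):
            j = i + offset
            if 0 <= j < n and seq[j] in AA_INDEX:
                slot = offset + k  # position of residue j inside the window
                feats[i, slot * 20 + AA_INDEX[seq[j]]] = 1.0
    return feats
```

Each row can then be labelled as RNA-binding or not from a known complex and used to train any standard classifier.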

  2. Sequence-Based Prediction of RNA-Binding Residues in Proteins

    PubMed Central

    Walia, Rasna R.; EL-Manzalawy, Yasser; Honavar, Vasant G.; Dobbs, Drena

    2017-01-01

    Identifying individual residues in the interfaces of protein–RNA complexes is important for understanding the molecular determinants of protein–RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein–RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein–RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner. PMID:27787829

  3. Identification of residues within the African swine fever virus DP71L protein required for dephosphorylation of translation initiation factor eIF2α and inhibiting activation of pro-apoptotic CHOP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barber, Claire; Netherton, Chris; Goatley, Lynnett

    The African swine fever virus DP71L protein recruits protein phosphatase 1 (PP1) to dephosphorylate translation initiation factor 2α (eIF2α), thereby avoiding shut-off of global protein synthesis and downstream activation of the pro-apoptotic factor CHOP. Residues V16 and F18 were critical for binding of DP71L to PP1. Mutation of this PP1 binding motif or deletion of residues between 52 and 66 reduced the ability of DP71L to cause dephosphorylation of eIF2α and to inhibit CHOP induction. The residues LSAVL, between 57 and 61, were also required. PP1 was co-precipitated with wild-type DP71L and with the mutants lacking residues 52-66 or the LSAVL motif, but not with the PP1 binding motif mutant. The residues in the LSAVL motif play a critical role in DP71L function but do not interfere with binding to PP1. Instead, we propose that these residues are important for DP71L binding to eIF2α.
    Highlights:
    • The African swine fever virus DP71L protein recruits protein phosphatase 1 (PP1) to dephosphorylate translation initiation factor 2α (eIF2α).
    • Residues V16 and F18 of DP71L are required for binding to the α, β and γ isoforms of PP1 and for DP71L function.
    • The sequence LSAVL downstream of the PP1 binding site (residues 57–61) is also important for DP71L function.
    • DP71L mutants of the LSAVL sequence retain the ability to co-precipitate with PP1, showing that these residues have a role distinct from PP1 binding.

  4. Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.

    PubMed

    Zhang, Buzhong; Li, Linqing; Lü, Qiang

    2018-05-25

    Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step toward understanding its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, a merging operator was proposed for combining the bidirectional information from the hidden nodes into the outputs. Three types of merging operators were used in our improved model, with a long short-term memory network serving as the hidden computing node. The training dataset was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features, including the position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding, were used to represent each residue. The method yields continuous predicted values of relative solvent-accessible area, which were then transformed into binary states using predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
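    The post-processing and evaluation steps mentioned above (thresholding continuous relative solvent accessibility into binary states, then scoring with mean absolute error and Pearson's correlation) can be sketched as follows. The 0.25 cut-off is an assumption for illustration; the abstract only says "predefined thresholds", and the function names are hypothetical.

```python
import numpy as np


def to_binary_states(rsa, threshold=0.25):
    """Map relative solvent accessibility (0-1) to buried (0) / exposed (1).

    The default threshold is an assumed value, not the paper's.
    """
    return (np.asarray(rsa, dtype=float) >= threshold).astype(int)


def evaluate(pred, true):
    """Return (mean absolute error, Pearson correlation) of predicted vs true RSA."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    mae = float(np.mean(np.abs(pred - true)))
    pcc = float(np.corrcoef(pred, true)[0, 1])
    return mae, pcc
```

Note that a constant offset between predictions and truth raises the MAE but leaves the correlation unchanged, which is why both metrics are usually reported together.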

  5. Predicting HIV-1 broadly neutralizing antibody epitope networks using neutralization titers and a novel computational method

    PubMed Central

    2014-01-01

    Background: Recent efforts in HIV-1 vaccine design have focused on immunogens that evoke potent neutralizing antibody responses to a broad spectrum of viruses circulating worldwide. However, the development of effective vaccines will depend on the identification and characterization of the neutralizing antibodies and their epitopes. We developed bioinformatics methods to predict epitope networks and antigenic determinants using structural information, as well as corresponding genotypes and phenotypes generated by a highly sensitive and reproducible neutralization assay. A total of 282 clonal envelope sequences from a multiclade panel of HIV-1 viruses were tested in viral neutralization assays with an array of broadly neutralizing monoclonal antibodies (mAbs: b12, PG9, PG16, PGT121-128, PGT130-131, PGT135-137, PGT141-145, and PGV04). We correlated IC50 titers with the envelope sequences and used this information to predict antibody epitope networks. Structural patches were defined as amino acid groups based on solvent accessibility, radius, atomic depth, and interaction networks within 3D envelope models. We applied a boosted algorithm consisting of multiple machine-learning and statistical models to evaluate these patches as possible antibody epitope regions, evidenced by strong correlations with the neutralization response for each antibody. Results: We identified patch clusters with significant correlation to IC50 titers as sites that impact neutralization sensitivity and are therefore potentially part of the antibody binding sites. Predicted epitope networks were mostly located within the variable loops of the envelope glycoprotein (gp120), particularly in V1/V2. Site-directed mutagenesis experiments involving residues identified as epitope networks across multiple mAbs confirmed the association of these residues with loss or gain of neutralization sensitivity. 
Conclusions: Computational methods were implemented to rapidly survey protein structures and predict epitope networks associated with response to individual monoclonal antibodies, which resulted in the identification and deeper understanding of immunological hotspots targeted by broadly neutralizing HIV-1 antibodies. PMID:24646213
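    The first step, correlating IC50 titers with envelope sequences, can be illustrated at the level of single alignment columns. This sketch is far simpler than the boosted, patch-based approach described above: it assumes aligned envelope sequences and one IC50 per sequence, and computes a point-biserial (Pearson) correlation between residue presence and log10(IC50). The function name and the toy data are hypothetical.

```python
import numpy as np


def site_correlations(aligned_envs: list, ic50: list) -> dict:
    """Correlate residue presence (0/1) at each column with log10(IC50).

    Returns {(column, residue): r}. Gaps are ignored, and residues that are
    invariant at a column are skipped because their correlation is undefined.
    """
    log_ic50 = np.log10(np.asarray(ic50, dtype=float))
    results = {}
    for col in range(len(aligned_envs[0])):
        column = [seq[col] for seq in aligned_envs]
        for res in set(column) - {"-"}:
            presence = np.array([c == res for c in column], dtype=float)
            if presence.min() == presence.max():  # no variation at this site
                continue
            results[(col, res)] = float(np.corrcoef(presence, log_ic50)[0, 1])
    return results
```

A positive r means the residue is associated with higher IC50 (reduced neutralization sensitivity); such per-site signals would then need grouping into structural patches, as the study does.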

  6. A five-residue sequence near the carboxyl terminus of the polytopic membrane protein lac permease is required for stability within the membrane.

    PubMed Central

    Roepe, P D; Zbar, R I; Sarkar, H K; Kaback, H R

    1989-01-01

    The lac permease (lacY gene product) of Escherichia coli contains 417 amino acid residues and is predicted to have a short hydrophilic amino terminus on the inner surface of the cytoplasmic membrane, multiple transmembrane hydrophobic segments in alpha-helical conformation, and a 17-amino acid residue hydrophilic carboxyl-terminal tail on the inner surface of the membrane. To assess the importance of the carboxyl terminus, the properties of several truncation mutants were studied. The mutants were constructed by site-directed mutagenesis such that stop codons were placed at specified positions, and the altered lacY genes were expressed at a relatively low rate from plasmid pACYC184. Permease truncated at position 407 or 401 retains full activity, and a normal complement of molecules is present in the membrane, as judged by immunoblot analyses. Thus, it is apparent that the carboxyl-terminal tail plays no direct role in membrane insertion of the permease, its stability, or in the mechanism of lactose/H+ symport. In marked contrast, when truncations are made at residues 396 (i.e., 4 amino acid residues from the carboxyl terminus of putative helix XII), 389, 372, or 346, the permease is no longer found in the membrane. Remarkably, however, when each of the mutated lacY genes is expressed at a high rate by means of the T7 RNA polymerase system [Tabor, S. & Richardson, C. C. (1985) Proc. Natl. Acad. Sci. USA 82, 1074-1079], all of the truncated permeases are present in the membrane, as indicated by [35S]methionine incorporation studies; however, permease truncated at residue 396, 389, 372, or 346 is defective with respect to lactose/H+ symport. Finally, pulse-chase experiments indicate that wild-type permease or permease truncated at residue 401 is stable, whereas permease truncated at or prior to residue 396 is degraded at a significant rate. 
The results are consistent with the notion that residues 396-401 in putative helix XII are important for protection against proteolytic degradation and suggest that this region of the permease may be necessary for proper folding. PMID:2657733

  7. Mapping a nucleolar targeting sequence of an RNA binding nucleolar protein, Nop25

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fujiwara, Takashi; Suzuki, Shunji; Kanno, Motoko

    2006-06-10

    Nop25 is a putative RNA-binding nucleolar protein associated with rRNA transcription. The present study was undertaken to determine the mechanism of Nop25 localization in the nucleolus. Deletion analysis of the Nop25 amino acid sequence showed that Nop25 contains a nuclear targeting sequence in its N-terminal region and a nucleolar targeting sequence in its C-terminal region. When peptides derived from the C-terminal region were expressed in cells as GFP-fusion proteins, a lysine- and arginine-rich peptide (KRKHPRRAQDSTKKPPSATRTSKTQRRRR) allowed the GFP-fusion protein to be transported to and fully retained in the nucleolus. When the peptide was fused with a cMyc epitope and expressed in the cells, the cMyc epitope was likewise detected in the nucleolus. Nop25 did not localize to the nucleolus when this peptide was deleted. Furthermore, deletion of a subdomain (KRKHPRRAQ) in the peptide, or substitution of the lysine and arginine residues in the subdomain, resulted in the loss of Nop25 nucleolar localization. These results suggest that the lysine- and arginine-rich peptide is the most prominent nucleolar targeting sequence of Nop25 and that the long stretch of basic residues might play an important role in the nucleolar localization of Nop25. Although Nop25 contains putative SUMOylation, phosphorylation and glycosylation sites, amino acid substitutions at these sites had no effect on nucleolar localization, suggesting that these post-translational modifications do not contribute to the localization of Nop25 in the nucleolus. Treatment with RNase A of cells expressing a GFP-fusion protein carrying the Nop25 nucleolar targeting sequence resulted in complete dislocation of the protein from the nucleolus. 
These data suggest that the nucleolar targeting sequence might play an important role in the binding of Nop25 to RNA molecules and that RNA binding might be essential for the nucleolar localization of Nop25.
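    The "lysine and arginine residue-enriched" character of the targeting peptide can be quantified directly from the sequence given above. A minimal sketch, with hypothetical function names and an assumed window width and cut-off (the study itself mapped the signal experimentally, not by scanning): the KRKHPRRAQ subdomain is 5/9 K/R and the full 29-residue peptide is 13/29 K/R.

```python
def basic_fraction(peptide: str) -> float:
    """Fraction of lysine (K) and arginine (R) residues in a peptide."""
    return sum(aa in "KR" for aa in peptide) / len(peptide)


def basic_windows(seq: str, width: int = 9, min_fraction: float = 0.5) -> list:
    """0-based start positions of windows whose K/R fraction is >= min_fraction.

    The width and cut-off are illustrative assumptions, not values from the study.
    """
    return [i for i in range(len(seq) - width + 1)
            if basic_fraction(seq[i:i + width]) >= min_fraction]


NOP25_PEPTIDE = "KRKHPRRAQDSTKKPPSATRTSKTQRRRR"  # sequence quoted in the abstract
```

Both the N-terminal KRKHPRRAQ stretch and the C-terminal RRRR end of the peptide pass this scan, while the central region does not.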

  8. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe.

    PubMed

    Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E

    2016-12-01

    Intrinsic disorder (ID) in proteins has been extensively described over the last decade, but a large-scale classification of ID in proteins has been largely missing. Here, we provide an extensive analysis of ID across the protein universe in the UniProt database, derived from sequence-based predictions in MobiDB. Almost half of the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered, and these are mostly found in viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ID regions overrepresented in the center. The charged-residue composition classes of Das and Pappu were used to classify ID proteins by structural propensity and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both prokaryotes and eukaryotes. In bacteria, they are confined to the nucleoid, and in viruses they provide DNA-binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in eukaryotes but are involved in killing other organisms and in cytolysis in bacteria. The Undefined class is used by bacteria to bind toxic substances and, in viruses, to mediate transport and movement between and within organisms. Fully disordered proteins behave similarly but are enriched in glycine residues and extracellular structures. © 2016 The Protein Society.
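    The Das and Pappu classes are based on two composition parameters: the fraction of charged residues (FCR = f+ + f-) and the net charge per residue (NCPR = f+ - f-). The sketch below computes both (counting K and R as positive, D and E as negative) and assigns one of the class names used above. The numeric boundaries approximate the Das-Pappu diagram of states, and mapping its boundary region to "Undefined" is our assumption; MobiDB's actual implementation may differ.

```python
def fcr_ncpr(seq: str) -> tuple:
    """Fraction of charged residues and net charge per residue."""
    n = len(seq)
    pos = sum(aa in "KR" for aa in seq) / n
    neg = sum(aa in "DE" for aa in seq) / n
    return pos + neg, pos - neg


def das_pappu_class(seq: str) -> str:
    """Approximate class assignment; thresholds are assumptions, see lead-in."""
    fcr, ncpr = fcr_ncpr(seq)
    # |NCPR| <= FCR always holds, so the first two tests also bound |NCPR|.
    if fcr < 0.25:
        return "Globules & Tadpoles"  # weakly charged sequences
    if fcr <= 0.35:
        return "Undefined"            # assumed: intermediate (boundary) region
    if abs(ncpr) <= 0.35:
        return "Coils & Hairpins"     # strong polyampholytes
    return "Swollen Coils"            # strong polyelectrolytes
```

For instance, a glycine-only sequence falls in Globules & Tadpoles, an alternating K/E sequence in Coils & Hairpins, and a poly-K stretch in Swollen Coils.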

  9. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis, but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher fraction (44%) of residue pairs in correct strand-strand registration than earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for the FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information, and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
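    An accuracy figure such as "∼70% of hydrogen-bonded residue pairs" is typically a precision: the fraction of predicted pairs that appear in the reference structure. A minimal sketch, assuming predictions are given as (i, j) residue-index pairs already sorted by score and that pairs are unordered; the function name is hypothetical and the exact counting used in the study may differ.

```python
def pair_precision(predicted, true_pairs, top_n=None) -> float:
    """Fraction of the top_n predicted (i, j) pairs found in the reference set.

    Pairs are unordered, so (3, 12) and (12, 3) count as the same contact.
    top_n=None evaluates every prediction.
    """
    ref = {frozenset(p) for p in true_pairs}
    preds = list(predicted)[:top_n]
    if not preds:
        return 0.0
    hits = sum(frozenset(p) in ref for p in preds)
    return hits / len(preds)
```

Restricting to the top-ranked predictions (for example, a number proportional to barrel length) is a common way to make such precisions comparable across proteins.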

  10. Biochemical and genetic characterization of enterocin A from Enterococcus faecium, a new antilisterial bacteriocin in the pediocin family of bacteriocins.

    PubMed Central

    Aymerich, T; Holo, H; Håvarstein, L S; Hugas, M; Garriga, M; Nes, I F

    1996-01-01

    A new bacteriocin has been isolated from an Enterococcus faecium strain. The bacteriocin, termed enterocin A, was purified to homogeneity as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, N-terminal amino acid sequencing, and mass spectrometry analysis. By combining the data obtained from amino acid and DNA sequencing, the primary structure of enterocin A was determined. It consists of 47 amino acid residues, and the molecular weight was calculated to be 4,829, assuming that the four cysteine residues form intramolecular disulfide bridges. This molecular weight was confirmed by mass spectrometry analysis. The amino acid sequence of enterocin A shared significant homology with a group of bacteriocins (now termed pediocin-like bacteriocins) isolated from a variety of lactic acid-producing bacteria, which include members of the genera Lactobacillus, Pediococcus, Leuconostoc, and Carnobacterium. Sequencing of the structural gene of enterocin A, which is located on the bacterial chromosome, revealed an N-terminal leader sequence of 18 amino acid residues, which was removed during the maturation process. The enterocin A leader belongs to the double-glycine leaders which are found among most other small nonlantibiotic bacteriocins, some lantibiotics, and colicin V. Downstream of the enterocin A gene was located a second open reading frame, encoding a putative protein of 103 amino acid residues. This gene may encode the immunity factor of enterocin A, and it shares 40% identity with a similar open reading frame in the operon of leucocin AUL 187, another pediocin-like bacteriocin. PMID:8633865
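    The molecular weight "assuming that the four cysteine residues form intramolecular disulfide bridges" is the sum of average residue masses plus one water, minus two hydrogen atoms per bridge. Because the enterocin A sequence is not reproduced here, the sketch below is tested only on toy peptides; with the real 47-residue sequence and two disulfides it could be compared against the reported 4,829. The residue masses are approximate average values as commonly tabulated (e.g. by ExPASy), and the function name is hypothetical.

```python
# Approximate average residue masses in Da (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524     # added once for the free N- and C-termini
H2_LOSS = 2.01588    # two hydrogen atoms lost per disulfide bridge


def peptide_mw(seq: str, disulfides: int = 0) -> float:
    """Average molecular weight of a linear peptide with optional disulfides."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER - disulfides * H2_LOSS
```

Each disulfide lowers the mass by about 2 Da, a difference that mass spectrometry can resolve, which is why the measured mass can confirm the bridging assumption.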

  11. Use of synthetic peptides and site-specific antibodies to localize a diphtheria toxin sequence associated with ADP-ribosyltransferase activity.

    PubMed Central

    Olson, J C

    1993-01-01

    Diphtheria toxin (DT) and Pseudomonas aeruginosa exotoxin A have the same molecular mechanism of toxicity; both toxins ADP-ribosylate a modified histidine residue in elongation factor 2. To help identify amino acids involved in this reaction, sequences in DT that share homology with P. aeruginosa exotoxin A were synthesized and examined for a role in the ADP-ribosyltransferase reaction. By using this approach, residues 32 to 54 of DT were found to define an epitope associated with antibody-mediated inhibition of DT enzyme activity. This lends further support to the notion that residues in this region of DT are involved in the enzymatic reaction. PMID:8423159

  12. Predicting Flavonoid UGT Regioselectivity

    PubMed Central

    Jackson, Rhydon; Knisley, Debra; McIntosh, Cecilia; Pfeiffer, Phillip

    2011-01-01

    Machine learning was applied to a challenging and biologically significant protein classification problem: the prediction of flavonoid UGT acceptor regioselectivity from primary sequence. Novel indices characterizing graphical models of residues were proposed and found to be widely distributed among existing amino acid indices and to cluster residues appropriately. UGT subsequences biochemically linked to regioselectivity were modeled as sets of index sequences. Several learning techniques incorporating these UGT models were compared with classifications based on standard sequence alignment scores. These techniques included an application of time series distance functions to protein classification. Time series distances defined on the index sequences were used in nearest neighbor and support vector machine classifiers. Additionally, Bayesian neural network classifiers were applied to the index sequences. The experiments identified improvements over the nearest neighbor and support vector machine classifications relying on standard alignment similarity scores, as well as strong correlations between specific subsequences and regioselectivities. PMID:21747849
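    The idea of treating a protein subsequence as a numeric time series and classifying it by nearest neighbour can be sketched as below. The study's own graph-based indices are not given here, so the Kyte-Doolittle hydropathy scale is swapped in as a stand-in, and dynamic time warping (DTW) is used as one example of a time series distance. The regioselectivity labels in the usage example are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy, a stand-in for the study's own amino acid indices.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def to_series(seq: str) -> list:
    """Map a residue sequence to its index values."""
    return [KD[aa] for aa in seq]


def dtw(a: list, b: list) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| cost."""
    d = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def nearest_neighbor(query: str, training: list) -> str:
    """training: list of (sequence, label); return the label of the DTW-closest one."""
    q = to_series(query)
    return min(training, key=lambda item: dtw(q, to_series(item[0])))[1]
```

DTW tolerates local stretching, so subsequences of slightly different lengths (as arise from insertions) can still be compared without a prior alignment.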

  13. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles.

    PubMed

    Tzika, Athanasia C; Helaers, Raphaël; Schramm, Gerrit; Milinkovitch, Michel C

    2011-09-26

    Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics.

  14. Structural details (kinks and non-α conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors

    PubMed Central

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M.; Novotny, Jiri

    2003-01-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular α-helical character (i.e. π-helices, 3₁₀-helices and kinks). A ‘search engine’ derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above ‘non-canonical’ helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from α-helicity are encoded locally in sequence patterns only about 7–9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure–function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html. PMID:12888523
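    A motif-descriptor "search engine" ultimately scans sequences for short patterns. As a minimal sketch, assuming descriptors are written as regular expressions with "." as a single-residue wildcard (the actual descriptor format and the descriptors themselves are not given here; "G.L" below is purely hypothetical), overlapping matches can be found with a lookahead:

```python
import re


def scan_motifs(seq: str, descriptors: list) -> list:
    """Return (descriptor, start) for every match, including overlapping ones.

    Descriptors are treated as regular expressions; '.' matches any residue.
    """
    hits = []
    for desc in descriptors:
        pattern = re.compile(f"(?=({desc}))")  # lookahead permits overlaps
        hits.extend((desc, m.start()) for m in pattern.finditer(seq))
    return hits
```

In practice each hit would be reported only if it falls inside a predicted transmembrane segment, since the descriptors were mined from such segments.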

  15. Structural details (kinks and non-alpha conformations) in transmembrane helices are intrahelically determined and can be predicted by sequence pattern descriptors.

    PubMed

    Rigoutsos, Isidore; Riek, Peter; Graham, Robert M; Novotny, Jiri

    2003-08-01

    One of the promising methods of protein structure prediction involves the use of amino acid sequence-derived patterns. Here we report on the creation of non-degenerate motif descriptors derived through data mining of training sets of residues taken from the transmembrane-spanning segments of polytopic proteins. These residues correspond to short regions in which there is a deviation from the regular alpha-helical character (i.e. pi-helices, 3(10)-helices and kinks). A 'search engine' derived from these motif descriptors correctly identifies, and discriminates amongst instances of the above 'non-canonical' helical motifs contained in the SwissProt/TrEMBL database of protein primary structures. Our results suggest that deviations from alpha-helicity are encoded locally in sequence patterns only about 7-9 residues long and can be determined in silico directly from the amino acid sequence. Delineation of such variations in helical habit is critical to understanding the complex structure-function relationships of polytopic proteins and for drug discovery. The success of our current methodology foretells development of similar prediction tools capable of identifying other structural motifs from sequence alone. The method described here has been implemented and is available on the World Wide Web at http://cbcsrv.watson.ibm.com/Ttkw.html.

  16. Precursors of vertebrate peptide antibiotics dermaseptin b and adenoregulin have extensive sequence identities with precursors of opioid peptides dermorphin, dermenkephalin, and deltorphins.

    PubMed

    Amiche, M; Ducancel, F; Mor, A; Boulain, J C; Menez, A; Nicolas, P

    1994-07-08

    The dermaseptins are a family of broad spectrum antimicrobial peptides, 27-34 amino acids long, involved in the defense of the naked skin of frogs against microbial invasion. They are the first vertebrate peptides to show lethal effects against the filamentous fungi responsible for severe opportunistic infections accompanying immunodeficiency syndrome and the use of immunosuppressive agents. A cDNA library was constructed from skin poly(A+) RNA of the arboreal frog Phyllomedusa bicolor and screened with an oligonucleotide probe complementary to the COOH terminus of dermaseptin b. Several clones contained a full-length DNA copy of a 443-nucleotide mRNA that encoded a 78-residue dermaseptin b precursor protein. The deduced precursor contained a putative signal sequence at the NH2 terminus, a 20-residue spacer sequence extremely rich (60%) in glutamic and aspartic acids, and a single copy of a dermaseptin b progenitor sequence at the COOH terminus. One clone contained a complete copy of adenoregulin, a 33-residue peptide reported to enhance the binding of agonists to the A1 adenosine receptor. The mRNAs encoding adenoregulin and dermaseptin b were very similar: 70 and 75% nucleotide identities between the 5'- and 3'-untranslated regions, respectively; 91% amino acid identity between the signal peptides; 82% identity between the acidic spacer sequences; and 38% identity between adenoregulin and dermaseptin b. Because adenoregulin and dermaseptin b have similar precursor designs and antimicrobial spectra, adenoregulin should be considered a new member of the dermaseptin family and alternatively named dermaseptin b II. Preprodermaseptin b and preproadenoregulin have considerable sequence identities to the precursors encoding the opioid heptapeptides dermorphin, dermenkephalin, and deltorphins. This similarity extended into the 5'-untranslated regions of the mRNAs. 
These findings suggest that the genes encoding the four preproproteins are all members of the same family despite the fact that they encode end products having very different biological activities. These genes might contain a homologous export exon comprising the 5'-untranslated region, the 22-residue signal peptide, the 20-24-residue acidic spacer, and the basic pair Lys-Arg.
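
    The identity percentages quoted above are computed column by column over an alignment. A minimal Python sketch, assuming the two sequences are already aligned to equal length with '-' as the gap character and counting only columns where neither sequence has a gap (one of several conventions; the record does not say which was used). The toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Convention used here (an assumption): identical residues divided by the
    number of columns in which neither sequence has a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Hypothetical toy alignment: 7 of 8 gap-free columns are identical.
print(percent_identity("ACDEFGHK", "ACDEYGHK"))
```

    Other conventions divide by the length of the shorter sequence or by all alignment columns, so identity values from different studies are only comparable when the convention is known.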

  17. Weed management practice and cropping sequence impact on soil residual nitrogen

    USDA-ARS's Scientific Manuscript database

    Inefficient N uptake by crops from N fertilization and/or N mineralized from crop residue and soil organic matter results in the accumulation of soil residual N (NH4-N and NO3-N) which increases the potential for N leaching. The objective of this study was to evaluate the effects of weed management ...

  18. Specific primary sequence requirements for Aurora B kinase-mediated phosphorylation and subcellular localization of TMAP during mitosis.

    PubMed

    Kim, Hyun-Jun; Kwon, Hye-Rim; Bae, Chang-Dae; Park, Joobae; Hong, Kyung U

    2010-05-15

    During mitosis, regulation of protein structure and function by phosphorylation plays a critical role in orchestrating the series of complex events essential for cell division. Tumor-associated microtubule-associated protein (TMAP), also known as cytoskeleton-associated protein 2 (CKAP2), is a novel player in spindle assembly and chromosome segregation. We have previously reported that TMAP is phosphorylated at multiple residues specifically during mitosis. However, the mechanisms and functional importance of phosphorylation at most of the identified sites are currently unknown. Here, we report that TMAP is a novel substrate of the Aurora B kinase. Ser627 of TMAP was specifically phosphorylated by Aurora B both in vitro and in vivo. Ser627 and neighboring conserved residues were strictly required for efficient phosphorylation of TMAP by Aurora B, as even minor amino acid substitutions within the phosphorylation motif significantly diminished the efficiency of substrate phosphorylation. Nearly all mutations in the phosphorylation motif had dramatic effects on the subcellular localization of TMAP: instead of localizing to the chromosome region during late mitosis, the mutants remained associated with microtubules and centrosomes throughout mitosis. However, the changes in the subcellular localization of these mutants could not be completely explained by the phosphorylation status of Ser627. Our findings suggest that the motif surrounding Ser627 (residues 625-630, RRSRRL) is a critical part of a functionally important sequence motif that not only governs kinase-substrate recognition but also regulates the subcellular localization of TMAP during mitosis.
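
    Residue numbering of a motif such as RRSRRL can be checked with a short Python sketch that reports 1-based start positions. The toy sequence below is hypothetical (alanine padding with the motif placed at residues 625-630); it is not the TMAP sequence, and `motif_starts` is an illustrative helper.

```python
from typing import List


def motif_starts(sequence: str, motif: str) -> List[int]:
    """1-based start positions of every (possibly overlapping) occurrence."""
    starts = []
    i = sequence.find(motif)
    while i != -1:
        starts.append(i + 1)
        i = sequence.find(motif, i + 1)
    return starts


# Hypothetical toy sequence: alanine padding with the motif at residues
# 625-630. This is NOT the real TMAP sequence.
toy = "A" * 624 + "RRSRRL" + "A" * 20
start = motif_starts(toy, "RRSRRL")[0]
ser_position = start + "RRSRRL".index("S")  # serine is the third motif residue
print(start, ser_position)
```

    With the motif starting at residue 625, the serine falls at position 627, matching the Ser627 numbering in the abstract.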

  19. Computational design of D-peptide inhibitors of hepatitis delta antigen dimerization

    NASA Astrophysics Data System (ADS)

    Elkin, Carl D.; Zuccola, Harmon J.; Hogle, James M.; Joseph-McCarthy, Diane

    2000-11-01

    Hepatitis delta virus (HDV) encodes a single polypeptide called hepatitis delta antigen (DAg). Dimerization of DAg is required for viral replication. The structure of the dimerization region, residues 12 to 60, consists of an anti-parallel coiled coil [Zuccola et al., Structure, 6 (1998) 821]. Multiple Copy Simultaneous Searches (MCSS) of the hydrophobic core region formed by the bend in the helix of one monomer of this structure were carried out for many diverse functional groups. Six critical interaction sites were identified. The Protein Data Bank was searched for backbone templates to use in the subsequent design process by matching to these sites. A 14-residue helix expected to bind to the D-isomer of the target structure was selected as the template. Over 200 000 mutant sequences of this peptide were generated based on the MCSS results. A secondary structure prediction algorithm was used to screen all sequences, and in general only those predicted to be highly helical were retained. Approximately 100 of these 14-mers were model-built as D-peptides and docked with the L-isomer of the target monomer. Based on calculated interaction energies, predicted helicity, and intrahelical salt bridge patterns, a small number of peptides were selected as the most promising candidates. The ligand design approach presented here is the computational analogue of mirror image phage display. The results have been used to characterize the interactions responsible for formation of this model anti-parallel coiled coil and to suggest potential ligands to disrupt it.

  20. ACCA phosphopeptide recognition by the BRCT repeats of BRCA1.

    PubMed

    Ray, Hind; Moreau, Karen; Dizin, Eva; Callebaut, Isabelle; Venezia, Nicole Dalla

    2006-06-16

    The tumour suppressor gene BRCA1 encodes a 220 kDa protein that participates in multiple cellular processes. The BRCA1 protein contains a tandem of two BRCT repeats in its carboxy-terminal region. The majority of disease-associated BRCA1 mutations affect this region, underscoring the central role of the BRCT repeats in BRCA1 tumour suppressor function. The BRCT repeats have been shown to mediate phospho-dependent protein-protein interactions. They recognize phosphorylated peptides using a recognition groove that spans both BRCT repeats. We previously identified an interaction between the tandem BRCA1 BRCT repeats and ACCA, which was disrupted by germ line BRCA1 mutations that affect the BRCT repeats. We recently showed that BRCA1 modulates ACCA activity through its phospho-dependent binding to ACCA. To delineate the region of ACCA that is crucial for the regulation of its activity by BRCA1, we searched the ACCA sequence for potential phosphorylation sites that might be recognized by the BRCA1 BRCT repeats. Using sequence analysis and structure modelling, we proposed Ser1263 as the most favourable of six candidate residues for recognition by the BRCA1 BRCT repeats. Using experimental approaches such as GST pull-down assays with Bosc cells, we showed that only phosphorylation of Ser1263 was essential for the interaction of ACCA with the BRCT repeats. Finally, immunoprecipitation of ACCA from cells demonstrated that the full-length BRCA1 protein interacts with ACCA when ACCA is phosphorylated on Ser1263.

  1. Resonance assignment of disordered protein with repetitive and overlapping sequence using combinatorial approach reveals initial structural propensities and local restrictions in the denatured state.

    PubMed

    Malik, Nikita; Kumar, Ashutosh

    2016-09-01

    NMR resonance assignment of intrinsically disordered proteins is challenging because of the limited dispersion of amide proton chemical shifts, and the difficulty grows with the size of the system. Residue-specific selective labeling/unlabeling experiments have been used to resolve the overlap, but they require multiple sample preparations. Here, we demonstrate an assignment strategy requiring only a single sample of uniformly ¹³C,¹⁵N-labeled protein. We used a combinatorial approach involving 3D-HNN, CC(CO)NH and 2D-MUSIC, which allowed us to assign the denatured 229-residue centromeric protein Cse4. Further, we show that even less sensitive experiments, when used efficiently, can lead to the complete assignment of a complex system in a relatively short time without the use of specialized probes. The assignment reveals local structural propensities even in the denatured state, accompanied by restricted motion in certain regions, which provides insights into the early folding events of the protein.

  2. Structural organization of intercellular channels II. Amino terminal domain of the connexins: sequence, functional roles, and structure.

    PubMed

    Beyer, Eric C; Lipkind, Gregory M; Kyle, John W; Berthoud, Viviana M

    2012-08-01

    The amino terminal domain (NT) of the connexins consists of their first 22-23 amino acids. Site-directed mutagenesis studies have demonstrated that NT amino acids are determinants of gap junction channel properties including unitary conductance, permeability/selectivity, and gating in response to transjunctional voltage. The importance of this region has also been emphasized by the identification of multiple disease-associated connexin mutants affecting amino acid residues in the NT region. The first part of the NT is α-helical. The structure of the Cx26 gap junction channel shows that the NT α-helix localizes within the channel, and lines the wall of the pore. Interactions of the amino acid residues in the NT with those in the transmembrane helices may be critical for holding the channel open. The predicted sites of these interactions and the applicability of the Cx26 structure to the NT of other connexins are considered. This article is part of a Special Issue entitled: The Communicating junctions, composition, structure and characteristics. Copyright © 2011. Published by Elsevier B.V.

  3. Protein 3-Nitrotyrosine in Complex Biological Samples: Quantification by High-Pressure Liquid Chromatography/Electrochemical Detection and Emergence of Proteomic Approaches for Unbiased Identification of Modification Sites

    PubMed Central

    Nuriel, Tal; Deeb, Ruba S.; Hajjar, David P.; Gross, Steven S.

    2008-01-01

    Nitration of tyrosine residues by nitric oxide (NO)-derived species results in the accumulation of 3-nitrotyrosine in proteins, a hallmark of nitrosative stress in cells and tissues. Tyrosine nitration is recognized as one of the multiple signaling modalities used by NO-derived species for the regulation of protein structure and function in health and disease. Various methods have been described for the quantification of protein 3-nitrotyrosine residues, and several strategies have been presented toward the goal of proteome-wide identification of protein tyrosine modification sites. This chapter details a useful protocol for the quantification of 3-nitrotyrosine in cells and tissues using high-pressure liquid chromatography with electrochemical detection. Additionally, this chapter describes a novel biotin-tagging strategy for specific enrichment of 3-nitrotyrosine-containing peptides. Application of this strategy, in conjunction with high-throughput MS/MS-based peptide sequencing, is anticipated to fuel efforts in developing comprehensive inventories of nitrosative stress-induced protein-tyrosine modification sites in cells and tissues. PMID:18554526

  4. Cloning of the IgM heavy chain of the bottlenose dolphin (Tursiops truncatus), and initial analysis of VH gene usage.

    PubMed

    Lundqvist, Mats L; Kohlberg, Kathleen E; Gefroh, Holly A; Arnaud, Philippe; Middleton, Darlene L; Romano, Tracy A; Warr, Gregory W

    2002-07-01

    Clones encoding the dolphin IgM heavy (μ) chain gene were isolated from a cDNA library of peripheral blood leukocytes. Genomic Southern blot analyses showed that the dolphin IGHM gene is most likely present in a single copy, and its sequence shows greatest similarity to those of the IGHM gene of the sheep, pig and cow, evolutionarily related artiodactyls. The transmembrane (TM) form of the IGHM chain was isolated by 3' RACE. While showing similarities to the TM regions of other mammalian IGHM chains, the highly conserved Ser residue of the CART motif is substituted with a Gly in the dolphin. In contrast to the pig and cow, which utilize only a single VH family, the dolphin expresses at least two distinct VH families, belonging to the mammalian VH clans I and III. At least two JH genes were identified in the dolphin. Some CDR3 regions of the dolphin VH are long (up to 21 amino acids), and contain multiple Cys residues, hypothesized to stabilize the CDR3 structure through disulfide bond formation.

  5. The Structure of Rauvolfia serpentina Strictosidine Synthase Is a Novel Six-Bladed β-Propeller Fold in Plant Proteins

    PubMed Central

    Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim

    2006-01-01

    The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of ∼2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler–type reaction and represents a novel six-bladed β-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed β-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family. PMID:16531499
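
    The repetitive motif described above (three hydrophobic residues, then a small residue, then a hydrophilic residue) can be expressed as a regular expression. In this Python sketch the residue classes are assumptions chosen for illustration, not definitions from the paper; a lookahead is used so that overlapping windows are all reported.

```python
import re
from typing import List, Tuple

# Assumed residue classes (illustrative only; not taken from the paper).
HYDROPHOBIC = "VILMFWC"
SMALL = "GAS"
HYDROPHILIC = "STNQDEKRH"

# Three hydrophobic, one small, one hydrophilic residue; the lookahead lets
# overlapping occurrences all be found.
PATTERN = re.compile(rf"(?=([{HYDROPHOBIC}]{{3}}[{SMALL}][{HYDROPHILIC}]))")


def find_repeat_motif(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched pentapeptide) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in PATTERN.finditer(sequence)]


print(find_repeat_motif("KVLIGSK"))
```

    Changing the three class strings is enough to test alternative definitions of "hydrophobic", "small" and "hydrophilic" against a structure-based alignment.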

  6. The structure of Rauvolfia serpentina strictosidine synthase is a novel six-bladed beta-propeller fold in plant proteins.

    PubMed

    Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim

    2006-04-01

    The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of approximately 2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler-type reaction and represents a novel six-bladed beta-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed beta-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family.

  7. Packing in protein cores

    NASA Astrophysics Data System (ADS)

    Gaines, J. C.; Clark, A. H.; Regan, L.; O'Hern, C. S.

    2017-07-01

    Proteins are biological polymers that underlie all cellular functions. The first high-resolution protein structures were determined by x-ray crystallography in the 1960s. Since then, there has been continued interest in understanding and predicting protein structure and stability. It is well established that a large contribution to protein stability originates from the sequestration of hydrophobic residues from solvent in the protein core. How are such hydrophobic residues arranged in the core? How can one best model the packing of these residues? Are residues loosely packed with multiple allowed side chain conformations, or densely packed with a single allowed side chain conformation? Here we show that to properly model the packing of residues in protein cores, it is essential that amino acids be represented by appropriately calibrated atom sizes and that hydrogen atoms be explicitly included. We show that protein cores possess a packing fraction of φ ≈ 0.56, which is significantly less than the typically quoted value of 0.74 obtained using the extended atom representation. We also compare the results for the packing of amino acids in protein cores to results obtained for jammed packings from discrete element simulations of spheres, elongated particles, and composite particles with bumpy surfaces. We show that amino acids in protein cores pack as densely as disordered jammed packings of particles with aspect ratios and bumpiness similar to those of amino acids. Knowing the structural properties of protein cores is of both fundamental and practical importance. Practically, it enables the assessment of changes in the structure and stability of proteins arising from amino acid mutations (such as those identified as a result of the massive human genome sequencing efforts) and the design of new folded, stable proteins and protein-protein interactions with tunable specificity and affinity.
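
    For reference, a packing fraction is the fraction of a region's volume occupied by atoms (with overlapping atomic spheres counted once). The abstract does not specify how the region is defined, so the expression below is the generic definition, followed by the relative gap between the two reported values:

```latex
\phi = \frac{V_{\mathrm{atoms}}}{V_{\mathrm{region}}}, \qquad
1 - \frac{0.56}{0.74} \approx 0.24
```

    That is, the reported core packing fraction is roughly 24% lower than the commonly quoted value.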

  8. Mass spectrometric determination of early and advanced glycation in biology.

    PubMed

    Rabbani, Naila; Ashour, Amal; Thornalley, Paul J

    2016-08-01

    Protein glycation in biological systems occurs predominantly on lysine, arginine and N-terminal residues of proteins. Major quantitative glycation adducts are found at mean extents of modification of 1-5 mol% of proteins. These are glucose-derived fructosamine on lysine and N-terminal residues of proteins, methylglyoxal-derived hydroimidazolone on arginine residues, and Nε-carboxymethyl-lysine residues, mainly formed by the oxidative degradation of fructosamine. Total glycation adducts of different types are quantified by stable isotopic dilution analysis liquid chromatography-tandem mass spectrometry (LC-MS/MS) in multiple reaction monitoring mode. Metabolism of glycated proteins is followed by LC-MS/MS of glycation free adducts, which are minor components of the amino acid metabolome. Glycated proteins and the sites of modification within them (amino acid residues modified by the glycating agent moiety) are identified and quantified by label-free and stable isotope labelling with amino acids in cell culture (SILAC) high resolution mass spectrometry. Sites of glycation by glucose and methylglyoxal in selected proteins are listed. Key issues in applying proteomics techniques to the analysis of glycated proteins are: (i) avoiding compromise of analysis by formation, loss and relocation of glycation adducts in pre-analytic processing; (ii) specificity of immunoaffinity enrichment procedures; (iii) maximizing protein sequence coverage in mass spectrometric analysis for detection of glycation sites; and (iv) development of bioinformatics tools for prediction of protein glycation sites. Protein glycation studies have important applications in biology, ageing and translational medicine, particularly in studies of obesity, diabetes, cardiovascular disease, renal failure, neurological disorders and cancer. Mass spectrometric analysis of glycated proteins has yet to find widespread clinical use. 
Future use in health screening, disease diagnosis and therapeutic monitoring, and drug and functional food development is expected. A protocol for high resolution mass spectrometry proteomics of glycated proteins is given.
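
    The principle behind stable isotopic dilution analysis can be sketched in a few lines of Python: a known amount of isotopically labelled standard is added to the sample, and the analyte amount follows from the analyte-to-standard signal ratio. This is a single-point simplification that assumes a known response factor; the function name and numbers are hypothetical, and routine assays typically rely on calibration curves instead.

```python
def isotope_dilution_amount(area_analyte: float, area_standard: float,
                            amount_standard: float,
                            response_factor: float = 1.0) -> float:
    """Analyte amount from the analyte/labelled-standard peak-area ratio.

    Assumes a known amount of labelled standard was spiked into the sample
    and that the relative response factor (analyte signal per mole divided by
    standard signal per mole) is known; 1.0 means identical responses.
    The result is in the same units as `amount_standard`.
    """
    if area_standard <= 0:
        raise ValueError("internal standard peak area must be positive")
    return (area_analyte / area_standard) * amount_standard / response_factor


# Hypothetical peak areas and spike amount (e.g. pmol of labelled adduct).
print(isotope_dilution_amount(2000.0, 1000.0, 5.0))
```

    Because analyte and labelled standard co-elute and experience the same losses and ion suppression, the ratio is more robust than either peak area alone.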

  9. Experimental and analytical study of high velocity impact on Kevlar/Epoxy composite plates

    NASA Astrophysics Data System (ADS)

    Sikarwar, Rahul S.; Velmurugan, Raman; Madhu, Velmuri

    2012-12-01

    In the present study, the impact behavior of Kevlar/Epoxy composite plates was investigated experimentally for different thicknesses and lay-up sequences and compared with analytical results. The effects of thickness and lay-up sequence on energy-absorbing capacity were studied for high velocity impact. Four lay-up sequences and four thickness values were considered. Initial and residual velocities were measured experimentally to calculate the energy-absorbing capacity of the laminates. The residual velocity of the projectile and the energy absorbed by the laminates were also calculated analytically. The analytical results were found to be in good agreement with the experimental results. The study shows that the 0/90 lay-up sequence is the most effective for impact resistance. For all thickness values and lay-up sequences, the delamination area is largest on the back side of the plate. The back-side delamination area is largest for 0/90/45/-45 laminates compared with the other lay-up sequences.
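
    Energy absorption computed from initial and residual velocities is usually taken as the projectile's loss of kinetic energy, E = 0.5 m (v_i^2 - v_r^2). A minimal Python sketch, assuming a rigid projectile whose mass does not change during perforation; the projectile mass and velocities in the example are hypothetical, not values from the study.

```python
def energy_absorbed(mass_kg: float, v_initial: float, v_residual: float) -> float:
    """Kinetic energy (J) lost by the projectile, taken as absorbed by the plate.

    Assumes a rigid projectile of constant mass; velocities in m/s. A residual
    velocity of 0 corresponds to a projectile stopped by the plate.
    """
    if not 0 <= v_residual <= v_initial:
        raise ValueError("expected 0 <= v_residual <= v_initial")
    return 0.5 * mass_kg * (v_initial ** 2 - v_residual ** 2)


# Hypothetical numbers: a 10 g projectile slowed from 300 m/s to 100 m/s.
print(energy_absorbed(0.010, 300.0, 100.0))  # about 400 J
```

    Because the energy depends on the difference of squared velocities, small errors in a high residual velocity can noticeably change the estimated absorbed energy.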

  10. Peptide vaccine against canine parvovirus: identification of two neutralization subsites in the N terminus of VP2 and optimization of the amino acid sequence.

    PubMed Central

    Casal, J I; Langeveld, J P; Cortés, E; Schaaper, W W; van Dijk, E; Vela, C; Kamstrup, S; Meloen, R H

    1995-01-01

    The N-terminal domain of the major capsid protein VP2 of canine parvovirus was shown to be an excellent target for development of a synthetic peptide vaccine, but detailed information about the number of epitopes, optimal length, sequence choice, and site of coupling to the carrier protein was lacking. Therefore, several overlapping peptides based on this N terminus were synthesized to establish conditions for optimal and reproducible induction of neutralizing antibodies in rabbits. The specificity and neutralizing ability of the antibody response to these peptides were determined. Within the N-terminal 23 residues of VP2, two subsites able to induce neutralizing antibodies, overlapping by only two glycine residues at positions 10 and 11, could be discriminated. The shortest sequence sufficient for neutralization induction was nine residues. Peptides longer than 13 residues consistently induced neutralization, provided that their N termini were located between positions 1 and 11 of VP2. The orientation of the peptides on the carrier protein was also important: coupling to keyhole limpet hemocyanin through the N terminus was more effective than coupling through the C terminus. The results suggest that the presence of amino acid residues 2 to 21 (and probably 3 to 17) of VP2 in a single peptide is preferable for a synthetic peptide vaccine. PMID:7474152

  11. Peptide vaccine against canine parvovirus: identification of two neutralization subsites in the N terminus of VP2 and optimization of the amino acid sequence.

    PubMed

    Casal, J I; Langeveld, J P; Cortés, E; Schaaper, W W; van Dijk, E; Vela, C; Kamstrup, S; Meloen, R H

    1995-11-01

    The N-terminal domain of the major capsid protein VP2 of canine parvovirus was shown to be an excellent target for development of a synthetic peptide vaccine, but detailed information about the number of epitopes, optimal length, sequence choice, and site of coupling to the carrier protein was lacking. Therefore, several overlapping peptides based on this N terminus were synthesized to establish conditions for optimal and reproducible induction of neutralizing antibodies in rabbits. The specificity and neutralizing ability of the antibody response to these peptides were determined. Within the N-terminal 23 residues of VP2, two subsites able to induce neutralizing antibodies, overlapping by only two glycine residues at positions 10 and 11, could be discriminated. The shortest sequence sufficient for neutralization induction was nine residues. Peptides longer than 13 residues consistently induced neutralization, provided that their N termini were located between positions 1 and 11 of VP2. The orientation of the peptides on the carrier protein was also important: coupling to keyhole limpet hemocyanin through the N terminus was more effective than coupling through the C terminus. The results suggest that the presence of amino acid residues 2 to 21 (and probably 3 to 17) of VP2 in a single peptide is preferable for a synthetic peptide vaccine.
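
    The length and position criteria reported above can be restated as two small predicates. This Python sketch only encodes the observations in the abstract (peptides longer than 13 residues with N termini between positions 1 and 11; coverage of residues 2 to 21); it is not a predictive model, and the function names are hypothetical.

```python
def matches_reported_rule(n_term: int, c_term: int) -> bool:
    """Peptide spanning VP2 residues n_term..c_term (inclusive, 1-based):
    longer than 13 residues, with its N terminus between positions 1 and 11."""
    length = c_term - n_term + 1
    return length > 13 and 1 <= n_term <= 11


def covers_region(n_term: int, c_term: int, first: int = 2, last: int = 21) -> bool:
    """True if the peptide contains every residue from `first` to `last`."""
    return n_term <= first and c_term >= last


# Hypothetical candidate peptides given as (N-terminal, C-terminal) residues.
for span in [(1, 23), (2, 15), (2, 14), (12, 30)]:
    print(span, matches_reported_rule(*span), covers_region(*span))
```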

  12. Binding properties of SUMO-interacting motifs (SIMs) in yeast.

    PubMed

    Jardin, Christophe; Horn, Anselm H C; Sticht, Heinrich

    2015-03-01

    Small ubiquitin-like modifier (SUMO) conjugation and interaction play an essential role in many cellular processes. A large number of yeast proteins are known to interact non-covalently with SUMO via short SUMO-interacting motifs (SIMs), but the structural details of this interaction are still poorly characterized. In the present work, sequence analysis of a large dataset of 148 yeast SIMs revealed a hydrophobic core binding motif and a preference for acidic residues either within or adjacent to the core motif. Thus the sequence properties of yeast SIMs are highly similar to those described for human SIMs. Molecular dynamics simulations were performed to investigate the binding preferences of four representative SIM peptides differing in the number and distribution of acidic residues. Furthermore, the relative stability of two previously observed alternative binding orientations (parallel, antiparallel) was assessed. For all SIMs investigated, the antiparallel binding mode remained stable in the simulations, and the SIMs were tightly bound via their hydrophobic core residues, supplemented by polar interactions of the acidic residues. In contrast, the stability of the parallel binding mode depends more on the sequence features of the SIM, such as the number and position of acidic residues or the presence of additional adjacent interaction motifs. This information should help improve the prediction of SIMs and their binding properties in different organisms and thereby facilitate reconstruction of the SUMO interactome.
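
    A rough scan for candidate SIMs, combining a hydrophobic core with nearby acidic residues, could look like the Python sketch below. The residue sets (V/I/L as core residues, D/E as acidic), the window sizes and the thresholds are all assumptions for illustration; the abstract does not give the motif definition used in the study.

```python
from typing import List

# Illustrative residue sets (assumptions, not the study's motif definition).
HYDROPHOBIC = set("VIL")
ACIDIC = set("DE")


def candidate_sims(seq: str, core_len: int = 4, min_hydrophobic: int = 3,
                   flank: int = 3, min_acidic: int = 1) -> List[int]:
    """1-based start positions of windows that look like candidate SIMs.

    A window qualifies when its core holds at least `min_hydrophobic` of
    V/I/L and the core plus `flank` residues on each side holds at least
    `min_acidic` Asp/Glu residues. All thresholds are hypothetical.
    """
    hits = []
    for i in range(len(seq) - core_len + 1):
        core = seq[i:i + core_len]
        if sum(aa in HYDROPHOBIC for aa in core) < min_hydrophobic:
            continue
        context = seq[max(0, i - flank): i + core_len + flank]
        if sum(aa in ACIDIC for aa in context) >= min_acidic:
            hits.append(i + 1)
    return hits


print(candidate_sims("GGGGVIVIDDGGGG"))
```

    Overlapping windows around one hydrophobic stretch are all reported; merging adjacent hits into a single candidate region would be a natural next step.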

  13. An asparagine residue at the N-terminus affects the maturation process of low molecular weight glutenin subunits of wheat endosperm

    PubMed Central

    2014-01-01

    Background: Wheat glutenin polymers are made up of two main subunit types, the high- (HMW-GS) and low- (LMW-GS) molecular weight subunits. The latter are heterogeneous proteins; the most common types, classified by the first amino acid of the mature sequence, are known as LMW-m and LMW-s. The mature sequences differ by three extra amino acids (MET-) at the N-terminus of LMW-m types. The nucleotide sequences of their encoding genes are, however, nearly identical, so the relationship between gene and protein sequences is difficult to ascertain. It has been hypothesized that an asparagine residue at position 23 of the complete coding sequence of the LMW-s type might account for its three-residue shorter sequence, through cleavage at the asparagine by an asparaginyl endopeptidase. Results: We performed site-directed mutagenesis of an LMW-s gene to replace the asparagine at position 23 with threonine, thus converting it into a candidate LMW-m type gene. Similarly, a candidate LMW-m type gene was mutated at position 23 to replace threonine with asparagine. Next, we produced transgenic durum wheat (cultivar Svevo) lines by introducing the mutated versions of the LMW-m and LMW-s genes, along with the wild type counterpart of the LMW-m gene. Proteomic comparisons between the transgenic and null segregant plants enabled identification of transgenic proteins by mass spectrometry analyses and Edman N-terminal sequencing. Conclusions: Our results show that formation of the LMW-s type relies on the presence of an asparagine residue close to the N-terminus generated by signal peptide cleavage, and that LMW-GS can be quantitatively processed, most likely by vacuolar asparaginyl endoproteases. This suggests that LMW-GS accumulated in the vacuole are not sequestered into stable aggregates that would hinder the action of proteolytic enzymes. 
Rather, whatever the mechanism of glutenin polymer transport to the vacuole, the proteins remain available for proteolytic processing and can be converted to the mature form by removal of a short N-terminal sequence. PMID:24629124

  14. Sub-lethal effects of pesticide residues in brood comb on worker honey bee (Apis mellifera) development and longevity.

    PubMed

    Wu, Judy Y; Anelli, Carol M; Sheppard, Walter S

    2011-02-23

    Numerous surveys reveal high levels of pesticide residue contamination in honey bee comb. We conducted studies to examine possible direct and indirect effects of pesticide exposure from contaminated brood comb on developing worker bees and adult worker lifespan. Worker bees were reared in brood comb containing high levels of known pesticide residues (treatment) or in relatively uncontaminated brood comb (control). Delayed development was observed in bees reared in treatment combs containing high levels of pesticides, particularly in the early stages (days 4 and 8) of worker bee development. Adult longevity was reduced by 4 days in bees exposed to pesticide residues in contaminated brood comb during development. Pesticide residue migration from comb containing high pesticide residues caused contamination of control comb after multiple brood cycles and provided insight into how quickly residues move through wax. Higher brood mortality and delayed adult emergence occurred after multiple brood cycles in contaminated control combs. In contrast, survivability increased in bees reared in treatment comb after multiple brood cycles, when pesticide residues had been reduced in treatment combs due to residue migration into uncontaminated control combs, supporting comb replacement efforts. Chemical analysis after the experiment confirmed the migration of pesticide residues from treatment combs into previously uncontaminated control comb. This study is the first to demonstrate sub-lethal effects on worker honey bees from pesticide residue exposure from contaminated brood comb. Sub-lethal effects, including delayed larval development and adult emergence or shortened adult longevity, can have indirect effects on the colony, such as premature shifts in hive roles and foraging activity. In addition, longer development time for bees may provide a reproductive advantage for parasitic Varroa destructor mites. 
The impact of delayed development in bees on Varroa mite fecundity should be examined further.

  15. Sequence-structural features and evolutionary relationships of family GH57 α-amylases and their putative α-amylase-like homologues.

    PubMed

    Janeček, Stefan; Blesák, Karol

    2011-08-01

    The glycoside hydrolase family 57 (GH57) contains α-amylase and a few other amylolytic specificities. It comprises ~400 members from Archaea (1/4) and Bacteria (3/4), mostly extremophilic prokaryotes. Only 17 GH57 enzymes have been biochemically characterized. The main goal of the present bioinformatics study was to analyze sequences with clear GH57 α-amylase features. Of the 107 GH57 sequences, 59 were evaluated as α-amylases (containing both GH57 catalytic residues), whereas 48 were assigned as GH57 α-amylase-like proteins (having a substitution in one or both catalytic residues). Forty-eight of the 59 α-amylases were from Archaea, but 42 of the 48 α-amylase-like proteins were of bacterial origin. In Bacteroides and Prevotella, the catalytic residues were in most cases substituted by serine (instead of the catalytic nucleophile glutamate) and glutamate (instead of the proton donor aspartate). The GH57 α-amylase specificity has thus evolved and remained enzymatically active mainly in Archaea.

  16. Molecular characterization of canine parvovirus in Vientiane, Laos.

    PubMed

    Vannamahaxay, Soulasack; Vongkhamchanh, Souliya; Intanon, Montira; Tangtrongsup, Sahatchai; Tiwananthagorn, Saruda; Pringproa, Kidsadagon; Chuammitri, Phongsakorn

    2017-05-01

    The global emergence of canine parvovirus type 2c (CPV-2c) has been well documented. In the present study, 139 rectal swab samples collected from diarrheic dogs living in Vientiane, Laos, in 2016 were tested for the presence of the canine parvovirus (CPV) VP2 gene by PCR. The results showed that 82.73% (115/139) of dogs were CPV positive by PCR. The partial VP2 gene was sequenced in 94 of the positive samples; 91 samples belonged to the CPV-2c (426Glu) subtype, while 3 samples belonged to the CPV-2a (426Asn) subtype. Notably, phylogenetic analysis of amino acid sequences revealed a close relationship between Laotian isolates and novel Chinese CPV-2c isolates. In Laotian CPV isolates, aligned protein sequences indicated a high rate of residue substitutions at positions 305, 324, 345, 370, 375, and 426 in the GH loop. The single mutation Q370R was identified as a unique residue change specific to the Laotian CPV-2c variant.

  17. A Novel Amyloid Designable Scaffold and Potential Inhibitor Inspired by GAIIG of Amyloid Beta and the HIV-1 V3 loop.

    PubMed

    Kokotidou, C; Jonnalagadda, S V R; Orr, A A; Seoane-Blanco, M; Apostolidou, C P; van Raaij, M J; Kotzabasaki, M; Chatzoudis, A; Jakubowski, J M; Mossou, E; Forsyth, V T; Mitchell, E P; Bowler, M W; Llamas-Saiz, A L; Tamamis, P; Mitraki, A

    2018-05-17

    The GAIIG sequence, common to the amyloid beta peptide (residues 29-33) and to the HIV gp 120 (residues 24-28 in a typical V3 loop) self-assembles into amyloid fibrils, as suggested by theory and the experiments presented here. The longer YATGAIIGNII sequence from the V3 loop also self-assembles into amyloid fibrils, of which the first three and the last two residues are outside the amyloid GAIIG core. We postulate that this sequence, with suitable selected replacements at the flexible positions, can serve as a designable scaffold for novel amyloid-based materials. Moreover, we report the single X-ray crystal structure of the beta-breaker peptide GAIPIG at 1.05 Å resolution. This structural information could serve as the basis for structure-based design of potential inhibitors of amyloid formation. This article is protected by copyright. All rights reserved.

  18. Protein model discrimination using mutational sensitivity derived from deep sequencing.

    PubMed

    Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan

    2012-02-08

    A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.
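    The model-discrimination step can be illustrated with a minimal sketch. It assumes, as a simplification, that each residue already has a mutational-sensitivity score (a stand-in for RankScore, whose exact definition is not given here) and that each decoy supplies a depth value per residue; decoys are then ranked by the Spearman correlation between the two. All function names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as no agreement
    return cov / (var_x * var_y) ** 0.5


def rank_decoys(sensitivity: Sequence[float],
                decoy_depths: Dict[str, Sequence[float]]) -> List[str]:
    """Decoy names ordered from best to worst agreement with the mutational data.

    Hypothetical criterion: buried (deep) residues are expected to be the most
    mutation-sensitive, so a near-native model should show the highest
    positive correlation between residue depth and sensitivity.
    """
    scores = {name: spearman(sensitivity, depths)
              for name, depths in decoy_depths.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): one sensitivity score and one depth per residue.
sensitivity = [0.9, 0.2, 0.8, 0.1, 0.7]
decoys = {
    "model_A": [8.0, 3.0, 7.5, 2.0, 6.0],
    "model_B": [2.0, 8.0, 3.0, 7.0, 4.0],
}
print(rank_decoys(sensitivity, decoys))  # ['model_A', 'model_B']
```

    A rank correlation is used here because it only asks whether deeper residues tend to be more sensitive, without assuming a linear relationship between the two scales.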

  19. Contribution to the Prediction of the Fold Code: Application to Immunoglobulin and Flavodoxin Cases

    PubMed Central

    Banach, Mateusz; Prudhomme, Nicolas; Carpentier, Mathilde; Duprat, Elodie; Papandreou, Nikolaos; Kalinowska, Barbara; Chomilier, Jacques; Roterman, Irena

    2015-01-01

    Background Formation of the folding nucleus of globular proteins starts with the mutual interaction of a group of hydrophobic amino acids whose close contacts allow the subsequent formation and stability of the 3D structure. These early steps can be predicted by simulating the folding process with a Monte Carlo (MC) coarse-grained model in a discrete space. We previously defined MIRs (Most Interacting Residues) as the set of residues presenting a large number of non-covalent neighbour interactions during such a simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically 15-20% of the sequence. Motivated by experiments with two sequences of very high sequence identity (up to 90%) but different folds, we combined the MIR method, which takes the sequence as its only input, with the “fuzzy oil drop” (FOD) model, which requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum at the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein and is assumed to follow the FOD model. One therefore obtains a distribution of the amino acids of a protein according to their agreement or disagreement with the FOD model. Results We compared and combined the MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions that are both predicted as a MIR and assigned as accordant with the FOD model. We show that for these two folds, the intersection of the predicted sets of residues differs significantly from random selection. It reduces the number of residues selected by each individual method and allows reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus. PMID:25915049
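    The MIR/FOD combination can be sketched under loudly stated assumptions: residue coordinates are given, the ideal density is an axis-aligned 3D Gaussian centred on the geometric centre with sigma set by a simple heuristic (not the published parameterisation), the observed normalised hydrophobicity density per residue is supplied as input rather than computed, and "accordant" is taken to mean that the observed value reaches the ideal one. The keystone is then the set intersection of MIRs and accordant residues. Names and data are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def ideal_density(coords: Sequence[Coord]) -> List[float]:
    """Normalised 'ideal' hydrophobicity of each residue under a 3D Gaussian.

    Assumptions: the drop is centred on the geometric centre of the residues
    and sigma along each axis is one third of the largest coordinate spread
    on that axis (a simple heuristic, not the published parameterisation).
    """
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    sigma = []
    for k in range(3):
        spread = max(abs(c[k] - centre[k]) for c in coords)
        sigma.append(spread / 3 if spread > 0 else 1.0)
    raw = [
        math.exp(-sum((c[k] - centre[k]) ** 2 / (2 * sigma[k] ** 2) for k in range(3)))
        for c in coords
    ]
    total = sum(raw)
    return [r / total for r in raw]


def fod_accordant(ideal: Sequence[float], observed: Sequence[float]) -> Set[int]:
    """Indices of residues whose observed (normalised) density reaches the ideal one."""
    return {i for i, (t, o) in enumerate(zip(ideal, observed)) if o >= t}


def keystone(mirs: Set[int], accordant: Set[int]) -> Set[int]:
    """Candidate folding nucleus: residues that are both MIRs and FOD-accordant."""
    return mirs & accordant


# Toy example (hypothetical coordinates, densities and MIR set).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, -3.0, 0.0)]
ideal = ideal_density(coords)
observed = [0.96, 0.0, 0.02, 0.01, 0.01]
print(keystone({0, 1, 2}, fod_accordant(ideal, observed)))  # {0, 2}
```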

  20. Accurate Prediction of Protein Contact Maps by Coupling Residual Two-Dimensional Bidirectional Long Short-Term Memory with Convolutional Neural Networks.

    PubMed

    Hanson, Jack; Paliwal, Kuldip; Litfin, Thomas; Yang, Yuedong; Zhou, Yaoqi

    2018-06-19

    Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12, (Schaarschmidt et al., 2018)) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks, especially those built from Long Short-Term Memory (LSTM) cells, have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences. Here we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. We show that the proposed method achieves a robust performance over validation and independent test sets, with an Area Under the receiver operating characteristic Curve (AUC) > 0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement holds for contacts at all sequence-position separations. Specifically, increases in precision of 8.95%, 5.65% and 2.84% were observed for the top L/10 predictions over the next-best method for short-, medium- and long-range contacts, respectively. This confirms the usefulness of ResNets to congregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. SPOT-Contact server url: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary data are available at Bioinformatics online.
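    The precision figures above follow the usual top-L/k style of contact evaluation. The sketch below assumes CASP-style sequence-separation bands (6-11, 12-23 and at least 24 residues for short, medium and long range) and residue pairs written with i < j; the SPOT-Contact authors' own evaluation code may differ in details such as how ties and missing predictions are handled. Function names and data are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]  # (i, j) residue indices with i < j

# Sequence-separation bands (assumed CASP-style convention).
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def top_lk_precision(probs: Dict[Pair, float], true_contacts: Set[Pair],
                     length: int, k: int = 10, band: str = "long") -> float:
    """Precision of the top L/k predicted contacts within one separation band."""
    lo, hi = RANGES[band]
    candidates = [
        (p, pair) for pair, p in probs.items()
        if (pair[1] - pair[0]) >= lo and (hi is None or pair[1] - pair[0] <= hi)
    ]
    n_top = max(1, length // k)
    top = sorted(candidates, reverse=True)[:n_top]  # highest probability first
    if not top:
        return 0.0
    hits = sum(pair in true_contacts for _, pair in top)
    return hits / len(top)


# Hypothetical predictions for a 20-residue protein.
probs = {(1, 8): 0.9, (2, 10): 0.8, (3, 9): 0.3, (1, 2): 0.99}
print(top_lk_precision(probs, {(1, 8)}, length=20, band="short"))  # 0.5
```

    The pair (1, 2) is ignored despite its high probability because its separation falls below every band, which is why contact precision is always reported per range.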

  1. Conformational analysis of the N-terminal sequence Met1-Val60 of the tyrosine hydroxylase

    NASA Astrophysics Data System (ADS)

    Alieva, Irada N.; Mustafayeva, Narmina N.; Gojayev, Niftali M.

    2006-03-01

    Molecular mechanics and molecular dynamics (MD) simulation techniques are used to study the behavior of the Met1-Val60 amino acid segment from the N-terminal regulatory domain of tyrosine hydroxylase (TH), and the effect of amino acid substitutions on its structure and dynamics, in mutants in which the positively charged arginine residues at positions 37 and 38 were replaced by electrically neutral Gly or negatively charged Glu, and the serine residue at position 40 was replaced by Ala or Asp. Our study allowed us to draw the following conclusions: (i) the Met1-Arg16 sequence shows higher conformational flexibility than the rest of the N-terminus; (ii) the stretch of residues Met30-Ser40 within the N-terminus forms a β-turn, so that two α-helices (residues 16-29 and 41-60) are parallel to one another; (iii) the significant differences observed for the Arg37→Gly37 and Arg37-Arg38→Glu37-Glu38 mutant segments indicate that the positive charge of Arg37 and Arg38 is one of the main factors maintaining the characteristics of the turn; (iv) no major conformational changes are observed for the Ser40→Ala40 and Ser40→Asp40 mutant segments.

  2. A mechanistic insight into the amyloidogenic structure of hIAPP peptide revealed from sequence analysis and molecular dynamics simulation.

    PubMed

    Chakraborty, Sandipan; Chatterjee, Barnali; Basu, Soumalee

    2012-07-01

    A combined approach of sequence analysis, phylogenetic tree construction and in silico prediction of amyloidogenicity using bioinformatics tools has been used to correlate the observed species-specific variations in IAPP sequences with amyloid-forming propensity. Observed substitution patterns indicate that probable changes in local hydrophobicity are instrumental in altering the aggregation propensity of the peptide. In particular, residues at the 17th, 22nd and 23rd positions of the IAPP peptide are found to be crucial for amyloid formation. Proline25 primarily dictates the observed non-amyloidogenicity in rodents. Furthermore, an extensive 0.24 μs molecular dynamics simulation has been carried out on the human IAPP (hIAPP) fragment 19-27, the portion showing maximum sequence variation across species, to understand the native folding characteristics of this region. Principal component analysis in combination with free energy landscape analysis reveals a four-residue turn spanning residues 22 to 25. The results provide structural insight into the intramolecular β-sheet structure of amylin, which is probably the template for nucleation of fibril formation and growth, a pathogenic feature of type II diabetes. Copyright © 2012 Elsevier B.V. All rights reserved.
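    Locating the region of "maximum sequence variation" can be approximated by per-column Shannon entropy over aligned sequences, averaged in a sliding window (a nine-column window would correspond to a fragment such as 19-27). This is a hedged sketch: the abstract does not state which variability measure was used, and the toy alignment below is hypothetical rather than real IAPP data.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps count as a symbol."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def most_variable_window(alignment: Sequence[str], width: int) -> Tuple[int, float]:
    """1-based start and mean entropy of the most variable window of `width` columns."""
    length = len(alignment[0])
    ent = [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]
    start = max(range(length - width + 1), key=lambda i: sum(ent[i:i + width]))
    return start + 1, sum(ent[start:start + width]) / width


# Hypothetical toy alignment: conserved first half, variable second half.
toy = ["AAAAKLMN", "AAAAQRST", "AAAAWYVF"]
print(most_variable_window(toy, width=4))
```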

  3. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) protein sequences embed specific information that is present neither in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder than the data set average. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
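    The two RQA variables mentioned above can be sketched for a numeric residue profile. Assumptions: delay-1 embedding, Euclidean distance, recurrent points counted in the upper triangle excluding the main diagonal, and a minimum diagonal line length of 2; RQA implementations differ on these conventions. The embedding dimension `m` and `radius` are the "setting parameters" referred to in the abstract, and the example profile is a hypothetical 0/1 coding, not the Miyazawa-Jernigan scale.

```python
import math
from typing import List, Sequence, Tuple


def embed(profile: Sequence[float], m: int) -> List[Sequence[float]]:
    """Time-delay embedding with delay 1: overlapping windows of length m."""
    return [profile[i:i + m] for i in range(len(profile) - m + 1)]


def rqa(profile: Sequence[float], m: int = 3, radius: float = 0.5,
        lmin: int = 2) -> Tuple[float, float]:
    """Return (%Recurrence, %Determinism) of a numeric residue profile.

    Recurrent points are counted in the upper triangle of the recurrence
    matrix, excluding the main diagonal; determinism is the percentage of
    recurrent points lying on diagonal lines of at least `lmin` points.
    """
    vecs = embed(profile, m)
    n = len(vecs)
    recurrent = on_lines = 0
    for offset in range(1, n):
        run = 0
        # One extra step (i == n - offset) flushes the final run of the diagonal.
        for i in range(n - offset + 1):
            if i < n - offset and math.dist(vecs[i], vecs[i + offset]) <= radius:
                run += 1
            else:
                recurrent += run
                if run >= lmin:
                    on_lines += run
                run = 0
    pairs = n * (n - 1) / 2
    rec = 100.0 * recurrent / pairs if pairs else 0.0
    det = 100.0 * on_lines / recurrent if recurrent else 0.0
    return rec, det


# Hypothetical alternating profile, e.g. a residue property coded as 0/1.
rec, det = rqa([0, 1, 0, 1, 0, 1, 0, 1], m=2, radius=0.1)
print(round(rec, 2), round(det, 2))  # 42.86 88.89
```

    A strictly periodic profile gives high determinism because its recurrences line up along diagonals; the single isolated recurrence at the largest offset is what keeps it below 100%.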

  4. Fifty years of coiled-coils and alpha-helical bundles: a close relationship between sequence and structure.

    PubMed

    Parry, David A D; Fraser, R D Bruce; Squire, John M

    2008-09-01

    alpha-Helical coiled coils are remarkable for the diversity of related conformations that they adopt in both fibrous and globular proteins, and for the range of functions that they exhibit. The coiled coils are based on a heptad (7-residue), hendecad (11-residue) or a related quasi-repeat of apolar residues in the sequences of the alpha-helical regions involved. Most of these, however, display one or more sequence discontinuities known as stutters or stammers. The resulting coiled coils vary in length, in the number of chains participating, in the relative polarity of the contributing alpha-helical regions (parallel or antiparallel), and in the pitch length and handedness of the supercoil (left- or right-handed). Functionally, the concept that a coiled coil can act only as a static rod is no longer valid, and the range of roles that these structures have now been shown to exhibit has expanded rapidly in recent years. An important development has been the recognition that the delightful simplicity that exists between sequence and structure, and between structure and function, allows coiled coils with specialized features to be designed de novo.
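    The heptad idea can be made concrete with a toy register search: try all seven registers and keep the one whose a and d positions are most often apolar. This is a hedged illustration only, not a coiled-coil predictor; it assumes a crude apolar residue set and an uninterrupted heptad, so stutters, stammers and hendecad repeats are ignored.

```python
from typing import Tuple

# Crude apolar set (an assumption for illustration, not a published scale).
APOLAR = set("LIVMFAWY")


def best_heptad_register(seq: str) -> Tuple[int, float]:
    """Return (offset, fraction) for the heptad register whose a and d positions
    are most often apolar; `offset` is the 0-based index of the first 'a'
    position (mod 7). Assumes an uninterrupted heptad with no stutters or stammers."""
    best_offset, best_score = 0, -1.0
    for offset in range(7):
        core = [i for i in range(len(seq)) if (i - offset) % 7 in (0, 3)]  # a = 0, d = 3
        if not core:
            continue
        score = sum(seq[i] in APOLAR for i in core) / len(core)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical idealised repeat with Leu at every a and d position.
print(best_heptad_register("LKELEKK" * 3))  # (0, 1.0)
```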

  5. Molecular cloning and characterization of a gene encoding glutaminase from Aspergillus oryzae.

    PubMed

    Koibuchi, K; Nagasaki, H; Yuasa, A; Kataoka, J; Kitamoto, K

    2000-07-01

    A glutaminase from Aspergillus oryzae was purified and its molecular weight was determined to be 82,091 by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Purified glutaminase catalysed the hydrolysis not only of L-glutamine but also of D-glutamine. Both the molecular weight and the substrate specificity of this glutaminase were different from those reported previously [Yano et al. (1998) J Ferment Technol 66: 137-143]. On the basis of its internal amino acid sequences, we have isolated and characterized the glutaminase gene (gtaA) from A. oryzae. The gtaA gene had an open reading frame coding for 690 amino acid residues, including a signal peptide of 20 amino acid residues and a mature protein of 670 amino acid residues. In the 5'-flanking region of the gene, there were three putative CreAp binding sequences and one putative AreAp binding sequence. The gtaA structural gene was introduced into A. oryzae NS4 and a marked increase in activity was detected in comparison with the control strain. The gtaA gene was also isolated from Aspergillus nidulans on the basis of the determined nucleotide sequence of the gtaA gene from A. oryzae.

  6. Active site of tripeptidyl peptidase II from human erythrocytes is of the subtilisin type

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tomkinson, B.; Wernstedt, C.; Hellman, U.

    1987-11-01

    This report presents evidence that the amino acid sequence around the active-site serine of human tripeptidyl peptidase II is of the subtilisin type. The enzyme from human erythrocytes was covalently labeled at its active site with [³H]diisopropyl fluorophosphate, and the protein was subsequently reduced, alkylated, and digested with trypsin. The labeled tryptic peptides were purified by gel filtration and repeated reversed-phase HPLC, and their amino-terminal sequences were determined. Residue 9 contained the radioactive label and was therefore considered to be the active serine residue. The primary structure of the part of the active site (residues 1-10) containing this residue was concluded to be Xaa-Thr-Gln-Leu-Met-Asx-Gly-Thr-Ser-Met. This amino acid sequence is homologous to the sequence surrounding the active serine of the microbial peptidases subtilisin and thermitase. These data demonstrate that human tripeptidyl peptidase II represents a potentially distinct class of human peptidases and raise the question of an evolutionary relationship between the active site of a mammalian peptidase and that of the subtilisin family of serine peptidases.
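    As a small worked check on the reported active-site peptide, the sketch below converts the three-letter sequence to one-letter codes (using the standard ambiguity codes B for Asx and X for Xaa) and confirms that residue 9, which carried the label, is Ser. The helper name is hypothetical.

```python
# Three-letter to one-letter amino-acid codes, including the ambiguity
# codes Asx (Asp or Asn), Glx (Glu or Gln) and Xaa (unknown residue).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}


def to_one_letter(seq3: str, sep: str = "-") -> str:
    """Convert a separator-delimited three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in seq3.split(sep))


peptide = to_one_letter("Xaa-Thr-Gln-Leu-Met-Asx-Gly-Thr-Ser-Met")
labelled = 9  # 1-based position that carried the radioactive label
print(peptide, peptide[labelled - 1])  # XTQLMBGTSM S
```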

  7. Structure of the horseradish peroxidase isozyme C genes.

    PubMed

    Fujiyama, K; Takemura, H; Shibayama, S; Kobayashi, K; Choi, J K; Shinmyo, A; Takano, M; Yamada, Y; Okada, H

    1988-05-02

    We have isolated, cloned and characterized three cDNAs and two genomic DNAs corresponding to the mRNAs and genes for horseradish (Armoracia rusticana) peroxidase isoenzyme C (HRP C). The amino acid sequence of HRP C1, deduced from the nucleotide sequence of one of the cDNA clones, pSK1, contained the same primary sequence as that of the purified enzyme established by Welinder [FEBS Lett. 72, 19-23 (1976)], with additional sequences at the N and C termini. All three inserts in the cDNA clones, pSK1, pSK2 and pSK3, encoded peptides of the same size (308 amino acid residues) if these are processed in the same way, and the amino acid sequences were 91-94% homologous to each other. Functional amino acids, including His40, His170, Tyr185 and Arg183 and the S-S-bond-forming Cys residues, were conserved in the three isozymes, but a few N-glycosylation sites were not the same. The two HRP C genomic genes, prxC1 and prxC2, were arranged in tandem on the chromosomal DNA, and each gene consisted of four exons and three introns. The positions in the exons interrupted by introns were the same in the two genes. We observed a putative promoter sequence 5' upstream and a poly(A) signal 3' downstream in both genes. The gene product of prxC1 might be processed by removal of a signal sequence of 30 amino acid residues at the N terminus and a peptide of 15 amino acid residues at the C terminus.

  8. Viral morphogenesis is the dominant source of sequence censorship in M13 combinatorial peptide phage display.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodi, D. J.; Soares, A. S.; Makowski, L.

    Novel statistical methods have been developed and used to quantitate and annotate the sequence diversity within combinatorial peptide libraries on the basis of small numbers (1-200) of sequences selected at random from commercially available M13 p3-based phage display libraries. These libraries behave statistically as though they correspond to populations containing roughly 4.0 ± 1.6% of the random dodecapeptides and 7.9 ± 2.6% of the random constrained heptapeptides that are theoretically possible within the phage populations. Analysis of amino acid residue occurrence patterns shows no demonstrable influence on sequence censorship by Escherichia coli tRNA isoacceptor profiles or either overall codon or Class II codon usage patterns, suggesting no metabolic constraints on recombinant p3 synthesis. There is an overall depression in the occurrence of cysteine, arginine and glycine residues and an overabundance of proline, threonine and histidine residues. The majority of position-dependent amino acid sequence bias is clustered at three positions within the inserted peptides of the dodecapeptide library, +1, +3 and +12 downstream from the signal peptidase cleavage site. Conformational tendency measures of the peptides indicate a significant preference for inserts favoring a β-turn conformation. The observed protein sequence limitations can primarily be attributed to genetic codon degeneracy and signal peptidase cleavage preferences. These data suggest that for applications in which maximal sequence diversity is essential, such as epitope mapping or novel receptor identification, combinatorial peptide libraries should be constructed using codon-corrected trinucleotide cassettes within vector-host systems designed to minimize morphogenesis-related censorship.
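    One way to frame "censorship" is to compare observed residue frequencies with those expected from the randomised codon scheme. The sketch assumes an NNK codon (N = A/C/G/T, K = G/T), which the abstract does not state, and uses the standard genetic code; a ratio below 1 indicates depletion relative to that codon-based expectation. Function names are hypothetical, and the tRNA and codon-usage comparisons described in the abstract are not modelled here.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def nnk_codon_counts() -> Counter:
    """Codons per amino acid ('*' = stop) for an NNK codon (N = A/C/G/T, K = G/T)."""
    return Counter(CODE[a + b + c] for a in "ACGT" for b in "ACGT" for c in "GT")


def observed_over_expected(peptides: Iterable[str]) -> Dict[str, float]:
    """Ratio of observed residue frequency to the NNK codon-based expectation."""
    expected = nnk_codon_counts()
    sense = sum(n for aa, n in expected.items() if aa != "*")  # 31 sense codons
    observed = Counter("".join(peptides))
    total = sum(observed.values())
    return {aa: (observed[aa] / total) / (expected[aa] / sense)
            for aa in sorted(expected) if aa != "*"}


counts = nnk_codon_counts()
print(counts["L"], counts["R"], counts["S"], counts["C"], counts["*"])  # 3 3 3 1 1
```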

  9. An inter-residue network model to identify mutational-constrained regions on the Ebola coat glycoprotein

    PubMed Central

    Quinlan, Devin S.; Raman, Rahul; Tharakaraman, Kannan; Subramanian, Vidya; del Hierro, Gabriella; Sasisekharan, Ram

    2017-01-01

    Recently, progress has been made in the development of vaccines and monoclonal antibody cocktails that target the Ebola coat glycoprotein (GP). Based on the mutation rates for Ebola virus given its natural sequence evolution, these treatment strategies are likely to impose additional selection pressure to drive acquisition of mutations in GP that escape neutralization. Given the high degree of sequence conservation among GP of Ebola viruses, it would be challenging to determine the propensity of acquiring mutations in response to vaccine or treatment with one or a cocktail of monoclonal antibodies. In this study, we analyzed the mutability of each residue using an approach that captures the structural constraints on mutability based on the extent of its inter-residue interaction network within the three-dimensional structure of the trimeric GP. This analysis showed two distinct clusters of highly networked residues along the GP1-GP2 interface, part of which overlapped with epitope surfaces of known neutralizing antibodies. This network approach also permitted us to identify additional residues in the network of the known hotspot residues of different anti-Ebola antibodies that would impact antibody-epitope interactions. PMID:28397835
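    A residue interaction network of the kind described can be sketched by linking residues whose representative atoms lie within a distance cutoff and ranking residues by degree. The one-point-per-residue coordinates, the 8 Å cutoff and degree as a proxy for "highly networked" are assumptions for illustration; the authors' network captures inter-residue interactions in more detail than shown here. `network_neighbours` mirrors the idea of finding additional residues linked to known antibody hotspots. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_network(coords: Dict[int, Coord], cutoff: float = 8.0) -> Dict[int, Set[int]]:
    """Undirected residue network: an edge joins residues within `cutoff` (Å)."""
    ids = sorted(coords)
    net: Dict[int, Set[int]] = {r: set() for r in ids}
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                net[a].add(b)
                net[b].add(a)
    return net


def most_networked(net: Dict[int, Set[int]]) -> List[int]:
    """Residues sorted by degree (descending), ties broken by residue number."""
    return sorted(net, key=lambda r: (-len(net[r]), r))


def network_neighbours(net: Dict[int, Set[int]], hotspots: Set[int]) -> Set[int]:
    """Residues directly linked to known hotspot residues, excluding the hotspots."""
    linked = set().union(*(net[h] for h in hotspots)) if hotspots else set()
    return linked - hotspots


# Hypothetical coordinates (one point per residue, e.g. C-alpha atoms).
coords = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0), 4: (30.0, 0.0, 0.0)}
net = contact_network(coords)
print(most_networked(net), network_neighbours(net, {1}))  # [2, 1, 3, 4] {2}
```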

  10. The alphabet of intrinsic disorder

    PubMed Central

    Uversky, Vladimir N

    2013-01-01

    The ability of a protein to fold into a unique functional state or to stay intrinsically disordered is encoded in its amino acid sequence. Both ordered and intrinsically disordered proteins (IDPs) are natural polypeptides that use the same arsenal of 20 proteinogenic amino acid residues as their major building blocks. The exceptional structural plasticity of IDPs, their capability to exist as heterogeneous structural ensembles and their wide array of important disorder-based biological functions that complement the functional repertoire of ordered proteins are all rooted in the peculiar differential usage of these building blocks by ordered proteins and IDPs. In fact, some residues (so-called disorder-promoting residues) are noticeably more common in IDPs than in sequences of ordered proteins, which, in turn, are enriched in several order-promoting residues. Furthermore, residues can be arranged according to their “disorder-promoting potencies,” which are evaluated based on the relative abundances of various amino acids in ordered and disordered proteins. This review continues a series of publications on the roles of different amino acids in defining the phenomenon of protein intrinsic disorder and concerns glutamic acid, which is the second most disorder-promoting residue. PMID:28516010
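    Ranking residues by "disorder-promoting potency" can be sketched with a fractional-difference composition profile, (C_disordered - C_ordered) / C_ordered, where positive values mark disorder-promoting residues. Treating this as the potency measure is an assumption for illustration, and the toy sequences stand in for curated ordered and disordered datasets.

```python
from collections import Counter
from typing import Dict, Iterable

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Fraction of each residue type over a set of sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def fractional_difference(disordered: Iterable[str], ordered: Iterable[str]) -> Dict[str, float]:
    """(C_disordered - C_ordered) / C_ordered per residue; positive = disorder-promoting.

    Residues absent from the ordered set are skipped to avoid division by zero.
    """
    c_dis, c_ord = composition(disordered), composition(ordered)
    return {aa: (c_dis.get(aa, 0.0) - c_ord[aa]) / c_ord[aa]
            for aa in ALPHABET if c_ord.get(aa)}


# Hypothetical toy sets; a real analysis would use curated ordered/disordered datasets.
fd = fractional_difference(["EEKP"], ["EWIL"])
print(sorted(fd.items(), key=lambda kv: kv[1], reverse=True))
```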

  11. Variability and repertoire size of T-cell receptor V alpha gene segments.

    PubMed

    Becker, D M; Pattern, P; Chien, Y; Yokota, T; Eshhar, Z; Giedlin, M; Gascoigne, N R; Goodnow, C; Wolf, R; Arai, K

    The immune system of higher organisms is composed largely of two distinct cell types, B lymphocytes and T lymphocytes, each of which is independently capable of recognizing an enormous number of distinct entities through their antigen receptors; surface immunoglobulin in the case of the former, and the T-cell receptor (TCR) in the case of the latter. In both cell types, the genes encoding the antigen receptors consist of multiple gene segments which recombine during maturation to produce many possible peptides. One striking difference between B- and T-cell recognition that has not yet been resolved by the structural data is the fact that T cells generally require a major histocompatibility determinant together with an antigen whereas, in most cases, antibodies recognize antigen alone. Recently, we and others have found that a series of TCR V beta gene sequences show conservation of many of the same residues that are conserved between heavy- and light-chain immunoglobulin V regions, and these V beta sequences are predicted to have an immunoglobulin-like secondary structure. To extend these studies, we have isolated and sequenced eight additional alpha-chain complementary cDNA clones and compared them with published sequences. Analyses of these sequences, reported here, indicate that V alpha regions have many of the characteristics of V beta gene segments but differ in that they almost always occur as cross-hybridizing gene families. We conclude that there may be very different selective pressures operating on V alpha and V beta sequences and that the V alpha repertoire may be considerably larger than that of V beta.

  12. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties; however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N-terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that of most plant Cu/Zn-SODs but lower than that of the enzyme from Curcuma aromatica, a relative of Z. officinale.

  13. Evolution of bacterial-like phosphoprotein phosphatases in photosynthetic eukaryotes features ancestral mitochondrial or archaeal origin and possible lateral gene transfer.

    PubMed

    Uhrig, R Glen; Kerk, David; Moorhead, Greg B

    2013-12-01

    Protein phosphorylation is a reversible regulatory process catalyzed by the opposing reactions of protein kinases and phosphatases, which are central to the proper functioning of the cell. Dysfunction of members in either the protein kinase or phosphatase family can have wide-ranging deleterious effects in both metazoans and plants alike. Previously, three bacterial-like phosphoprotein phosphatase classes were uncovered in eukaryotes and named according to the bacterial sequences with which they have the greatest similarity: Shewanella-like (SLP), Rhizobiales-like (RLPH), and ApaH-like (ALPH) phosphatases. Utilizing the wealth of data resulting from recently sequenced complete eukaryotic genomes, we conducted database searching by hidden Markov models, multiple sequence alignment, and phylogenetic tree inference with Bayesian and maximum likelihood methods to elucidate the pattern of evolution of eukaryotic bacterial-like phosphoprotein phosphatase sequences, which are predominantly distributed in photosynthetic eukaryotes. We uncovered a pattern of ancestral mitochondrial (SLP and RLPH) or archaeal (ALPH) gene entry into eukaryotes, supplemented by possible instances of lateral gene transfer between bacteria and eukaryotes. In addition to the previously known green algal and plant SLP1 and SLP2 protein forms, a more ancestral third form (SLP3) was found in green algae. Data from in silico subcellular localization predictions revealed class-specific differences in plants likely to result in distinct functions, and for SLP sequences, distinctive and possibly functionally significant differences between plants and nonphotosynthetic eukaryotes. Conserved carboxyl-terminal sequence motifs with class-specific patterns of residue substitutions, most prominent in photosynthetic organisms, raise the possibility of complex interactions with regulatory proteins.

  14. Comprehensively Surveying Structure and Function of RING Domains from Drosophila melanogaster

    PubMed Central

    Wu, Yuehao; Wan, Fusheng; Huang, Chunhong; Jie, Kemin

    2011-01-01

    Using a complete set of RING domains from Drosophila melanogaster, all the solved RING domains and cocrystal structures of RING-containing ubiquitin ligase (RING-E3) and ubiquitin-conjugating enzyme (E2) pairs, we analyzed RING domain structures from their primary to quaternary structures. The results showed that: i) putative human orthologs exist for most Drosophila melanogaster RING domains (118/139, 84.9%); ii) of the 118 orthologous pairs from Drosophila melanogaster and human, 117 pairs (117/118, 99.2%) were found to retain entirely uniform domain architectures, and only Iap2/Diap2 experienced evolutionary expansion of domain architecture; iii) 4 evolutionarily structurally conserved regions (SCRs) are responsible for homologous folding of RING domains at the superfamily level; iv) besides the conserved Cys/His residues chelating zinc ions, 6 equivalent residues (4 hydrophobic and 2 polar residues) in the SCRs show good consensus and conservation, and these 4 SCRs function in the structural positioning of the 6 equivalent residues as determinants for RING-E3 catalysis; v) the numbers of these RING proteins localized to the nucleus, to multiple subcellular compartments, to membranes and to mitochondria are 42 (42/139, 30.2%), 71 (71/139, 51.1%), 22 (22/139, 15.8%) and 4 (4/139, 2.9%), respectively; vi) CG15104 (Topors) and CG1134 (Mul1) in C3HC4, and CG3929 (Deltex) in C3H2C3 seem to display broader E2-binding profiles than other RING-E3s; vii) analysis of the intermolecular interfaces of E2/RING-E3 complexes indicates that the residues directly interacting with E2s all come from the SCRs in RING domains. Of the 6 residues, 2 hydrophobic ones contribute to constructing the conserved hydrophobic core, while the other 2 hydrophobic and 2 polar residues directly participate in E2/RING-E3 interactions. Based on sequence and structural data, SCRs, conserved equivalent residues and features of intermolecular interfaces were extracted, highlighting the presence of a nucleus for the RING domain fold and the formation of a catalytic core in which related residues and regions exhibit preferential evolutionary conservation. PMID:21912646

  15. Localization and characterization of an alpha-thrombin-binding site on platelet glycoprotein Ib alpha.

    PubMed

    De Marco, L; Mazzucato, M; Masotti, A; Ruggeri, Z M

    1994-03-04

    Glycoprotein (GP) Ib alpha is required for expression of the highest affinity alpha-thrombin-binding site on platelets, possibly contributing to platelet activation through a pathway involving cleavage of a specific receptor. This function may be important for the initiation of hemostasis and may also play a role in the development of pathological vascular occlusion. We have now identified a discrete sequence in the extracytoplasmic domain of GP Ib alpha, including residues 271-284 of the mature protein, which appears to be part of the high affinity alpha-thrombin-binding site. Synthetic peptidyl mimetics of this sequence inhibit alpha-thrombin binding to GP Ib as well as platelet activation and aggregation induced by subnanomolar concentrations of the agonist; they also inhibit alpha-thrombin binding to purified glycocalicin, the isolated extracytoplasmic portion of GP Ib alpha. The inhibitory peptides interfere with the clotting of fibrinogen by alpha-thrombin but not with the amidolytic activity of the enzyme on a small synthetic substrate, a finding compatible with the concept that the identified GP Ib alpha sequence interacts with the anion-binding exosite of alpha-thrombin but not with its active proteolytic site. The crucial structural elements of this sequence necessary for thrombin binding appear to be a cluster of negatively charged residues as well as three tyrosine residues that, in the native protein, may be sulfated. GP Ib alpha has no significant overall sequence homology with the thrombin inhibitor, hirudin, nor with the specific thrombin receptor on platelets; all three molecules, however, possess a distinct region rich in negatively charged residues that appear to be involved in thrombin binding. This may represent a case of convergent evolution of unrelated proteins for high affinity interaction with the same ligand.

  16. Structural protein descriptors in 1-dimension and their sequence-based predictions.

    PubMed

    Kurgan, Lukasz; Disfani, Fatemeh Miri

    2011-09-01

    The last few decades have seen increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects, including backbone conformation, burial depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors, and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from protein sequences. We show that the majority of recent sequence-based predictors use machine learning models, the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regression. These methods provide high-throughput predictions, and most of them are accessible to non-expert users via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder and solvent accessibility using a benchmark set based on CASP8 targets. Our analysis shows that secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews correlation coefficient (MCC) and 75% SOV, and relative solvent accessibility with a Pearson correlation coefficient (PCC) of 0.7 and an MCC of 0.6 (0.86 when homology is used). We demonstrate that secondary structure predicted from sequence without homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
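    Several of the figures above are accuracies and Matthews correlation coefficients computed on per-residue binary labels (e.g. disordered vs ordered). As a reminder of how these two metrics are obtained, here is a minimal Python sketch; the label vectors are toy values, not data from the review, and SOV, AUC and PCC are not covered.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of residues whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy per-residue disorder labels (1 = disordered); not benchmark data.
true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
pred_labels = [1, 1, 0, 0, 0, 1, 0, 1]
print(accuracy(true_labels, pred_labels), round(mcc(true_labels, pred_labels), 3))
```

    Here 6 of 8 residues are correct (accuracy 0.75), while MCC, which also penalizes the single false positive and false negative relative to class sizes, is 0.5.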

  17. Two different groups of signal sequence in M-superfamily conotoxins.

    PubMed

    Wang, Qi; Jiang, Hui; Han, Yu-Hong; Yuan, Duo-Duo; Chi, Cheng-Wu

    2008-04-01

    M-superfamily conotoxins can be divided into four branches (M-1, M-2, M-3 and M-4) according to the number of amino acid residues in the third Cys loop. It is widely accepted that the conotoxin signal peptides within each superfamily are strictly conserved. Recently, we cloned six cDNAs of novel M-superfamily conotoxins from Conus leopardus, Conus marmoreus and Conus quercinus, belonging to either the M-1 or the M-3 branch. Judging from the putative peptide sequences deduced from the cDNAs, these conotoxins are rich in acidic residues and share highly conserved signal and pro-peptide regions. However, they differ from the reported conotoxins of the M-2 and M-4 branches even in their signal peptides, which are generally considered highly conserved within each conotoxin superfamily. The signal sequences of M-1 and M-3 conotoxins, which are 24 residues long, start with MLKMGVVL-, while those of M-2 and M-4 conotoxins, which are 25 residues long, start with MMSKLGVL-. Besides the I-conotoxin superfamily, this is another example of different types of signal peptides existing within one superfamily. In addition to the difference in disulfide connectivity between M-1 conotoxins and M-4 or M-2 conotoxins, sequence alignment, preferential Cys codon usage and phylogenetic tree analysis suggest that M-1 and M-3 conotoxins are much more closely related to each other than to the conotoxins of the other two branches (M-4 and M-2) of the M-superfamily.

  18. Analysis of correlated mutations in HIV-1 protease using spectral clustering.

    PubMed

    Liu, Ying; Eyal, Eran; Bahar, Ivet

    2008-05-15

    The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated amino acid substitutions may be obscured by evolutionary noise. HIV-1 protease sequences from patients receiving different specific treatments (set 1) and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; the second included residues that differ among the various HIV-1 protease subtypes and is referred to for short as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between correlated substitutions resulting from neutral mutations and those induced by MDR through appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids.
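    The covariance step can be illustrated with a minimal Python sketch of mutual information between two alignment columns, MI(i, j) = Σ p(a, b) log2[p(a, b) / (p(a) p(b))], estimated from raw column frequencies. The toy alignment is hypothetical, gaps would be treated as ordinary symbols, no small-sample correction is applied, and the spectral clustering the authors performed on the resulting matrix is not reproduced.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """MI in bits between alignment columns i and j, from raw frequencies."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    p_i, p_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi

def mi_matrix(msa):
    """Symmetric L x L matrix of pairwise MI; the diagonal is left at 0."""
    length = len(msa[0])
    return [[0.0 if i == j else mutual_information(msa, i, j) for j in range(length)]
            for i in range(length)]

# Hypothetical 4-sequence alignment: columns 0 and 2 co-vary, column 1 is invariant.
toy_msa = ["AKV", "AKV", "GKL", "GKL"]
for row in mi_matrix(toy_msa):
    print([round(x, 3) for x in row])
```

    Perfectly co-varying columns 0 and 2 score 1 bit, while the invariant column 1 carries no mutual information with anything; a matrix like this is the input a clustering step would operate on.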

  19. Immunological and biochemical characterization of processing products from the neurotensin/neuromedin N precursor in the rat medullary thyroid carcinoma 6-23 cell line.

    PubMed Central

    Bidard, J N; de Nadai, F; Rovere, C; Moinier, D; Laur, J; Martinez, J; Cuber, J C; Kitabgi, P

    1993-01-01

    Neurotensin (NT) and neuromedin N (NN) are two related biologically active peptides that are encoded in the same precursor molecule. In the rat, the precursor consists of a 169-residue polypeptide starting with an N-terminal signal peptide and containing in its C-terminal region one copy each of NT and NN. NN precedes NT and is separated from it by a Lys-Arg sequence. Two other Lys-Arg sequences flank the N-terminus of NN and the C-terminus of NT. A fourth Lys-Arg sequence occurs near the middle of the precursor and is followed by an NN-like sequence. Finally, an Arg-Arg pair is present within the NT moiety. The four Lys-Arg doublets represent putative processing sites in the precursor molecule. The present study was designed to investigate the post-translational processing of the NT/NN precursor in the rat medullary thyroid carcinoma (rMTC) 6-23 cell line, which synthesizes large amounts of NT upon dexamethasone treatment. Five region-specific antisera recognizing the free N- or C-termini of sequences adjacent to the basic doublets were produced, characterized and used for immunoblotting and radioimmunoassay studies in combination with gel filtration, reverse-phase h.p.l.c. and trypsin digestion of rMTC 6-23 cell extracts. Because two of the antigenic sequences, i.e. NN and the NN-like sequence, start with a lysine residue that is essential for recognition by their respective antisera, a micromethod by which trypsin specifically cleaves at arginine residues was developed. The results show that dexamethasone-treated rMTC 6-23 cells produced comparable amounts of NT, NN and a peptide corresponding to a large N-terminal precursor fragment lacking the NN and NT moieties. This large fragment was purified. N-Terminal sequencing revealed that it started at residue Ser23 of the prepro-NT/NN sequence, and thus established the Cys22-Ser23 bond as the cleavage site of the signal peptide. Two other large N-terminal fragments bearing respectively the NN and NT sequences at their C-termini were present in lower amounts. The NN-like sequence was internal to all the large fragments. There was no evidence for the presence of peptides with the NN-like sequence at their N-termini. This shows that, in rMTC 6-23 cells, the precursor is readily processed at the three Lys-Arg doublets that flank and separate the NT and NN sequences. In contrast, the Lys-Arg doublet that precedes the NN-like sequence is not processed in this system.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:8471039
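    Why arginine-specific cleavage matters for lysine-initiated epitopes can be seen with a toy in silico digestion. This Python sketch uses the common simplified trypsin rule (cleave C-terminal to Lys or Arg, but not before Pro) and compares it with cleavage after Arg only. The peptide is a hypothetical sequence containing a KIPYIL stretch (the neuromedin N sequence) flanked by basic doublets, not the actual rat precursor, and the chemistry the authors used to restrict cleavage to arginine is not modeled.

```python
def trypsin_digest(seq, cleave_after=("K", "R")):
    """Cut C-terminal to any residue in cleave_after unless the next residue is Pro."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):  # never cut after the last residue
        if seq[i] in cleave_after and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments

toy = "AGKRKIPYILKRQLY"  # hypothetical: KIPYIL flanked by Lys-Arg doublets
print(trypsin_digest(toy))                       # standard K/R specificity
print(trypsin_digest(toy, cleave_after=("R",)))  # arginine-only cleavage
```

    With K/R specificity the N-terminal Lys of the KIPYIL stretch is split off as a free residue, so a fragment beginning with that Lys never appears; with Arg-only cleavage the stretch is released as KIPYILKR, keeping the N-terminal Lys an antiserum would need.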

  20. High accuracy prediction of beta-turns and their types using propensities and multiple alignments.

    PubMed

    Fuchs, Patrick F J; Alix, Alain J P

    2005-06-01

    We have developed a method that predicts both the presence and the type of beta-turns, using a straightforward approach based on propensities and multiple alignments. The propensities were calculated classically, but the way they are used for prediction is completely new: starting from a tetrapeptide sequence on which one wants to evaluate the presence of a beta-turn, the propensity for a given residue is modified by taking into account all the residues present in the multiple alignment at that position. A score is then evaluated by weighting these propensities with position-specific scoring matrices generated by PSI-BLAST. Introducing secondary structure information predicted by PSIPRED or SSPRO2, and taking into account the residues flanking the tetrapeptide, improved the accuracy greatly. Evaluated by sevenfold cross-validation on a database of 426 reference proteins (previously used in other studies), this final version gave very good results, with a Matthews correlation coefficient (MCC) of 0.42 and an overall prediction accuracy of 74.8%; this places our method among the best ones. A jackknife test was also done and gave results within the same range. This shows that it is possible to reach the accuracy of neural networks with considerably less computational cost and complexity. Furthermore, propensities remain excellent descriptors of amino acid tendencies to belong to beta-turns, which can be useful for peptide or protein engineering and design. For beta-turn type prediction, we reached the best accuracy ever published in terms of MCC (except for the irregular type IV), in the range of 0.25-0.30 for types I, II and I' and 0.13-0.15 for types VIII, II' and IV. To our knowledge, our method is the only one available on the Web that predicts types I' and II'. The accuracy evaluated on two larger databases of 547 and 823 proteins was not improved significantly. All of this was implemented in a Web server called COUDES (French acronym for: Chercher Ou Une Deviation Existe Surement), which is available at http://bioserv.rpbs.jussieu.fr/Coudes/index.html within the new bioinformatics platform RPBS.
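    Classical turn propensities are frequency ratios: the frequency of residue a at position k of known beta-turns divided by its frequency among all residues. The Python sketch below computes such positional propensities from toy data, then scores a tetrapeptide window by averaging each position's propensity over the residues in the corresponding alignment column and multiplying across the four positions. That column averaging is a hypothetical simplification; COUDES instead weights propensities with PSI-BLAST PSSMs and adds predicted secondary structure and flanking residues, none of which is reproduced here.

```python
from collections import Counter

def positional_propensities(turns, all_residues):
    """props[k][a] = (freq of a at turn position k) / (background freq of a)."""
    background = Counter(all_residues)
    total = len(all_residues)
    props = []
    for k in range(4):
        at_k = Counter(turn[k] for turn in turns)
        props.append({a: (at_k[a] / len(turns)) / (background[a] / total)
                      for a in background})
    return props

def turn_score(columns, props):
    """Product over the 4 positions of the propensity averaged over an MSA column."""
    score = 1.0
    for k, col in enumerate(columns):
        score *= sum(props[k].get(a, 0.0) for a in col) / len(col)
    return score

# Toy training data (hypothetical): two turns and a short background sequence.
turns = ["NPDG", "APDG"]
background = "NPDGAPDGLLVVLLVV"
props = positional_propensities(turns, background)
# Each inner list holds the residues aligned at one tetrapeptide position.
window = [["N", "A"], ["P", "P"], ["D", "G"], ["G", "G"]]
print(turn_score(window, props))
```

    A propensity above 1 means the residue is enriched at that turn position relative to background; residues never seen at a position (such as G at position 3 here) pull the column average down.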

  1. MANGO: a new approach to multiple sequence alignment.

    PubMed

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task in biological sequence analysis. The problem is NP-hard, and full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes the information from all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because optimized spaced seeds are provably significantly more sensitive than consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments on large 16S RNA benchmarks show that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, Prob-ConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
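    The idea behind a spaced seed fits in a few lines of Python: a seed pattern such as 1101 requires matches only at the '1' positions, so a hit survives a mismatch at the '0' position, where a contiguous k-mer would fail. The sketch indexes one sequence by its seed keys and looks up the other; the pattern, the sequences and the single-seed setup are illustrative assumptions, not MANGO's optimized seed set or its alignment algorithm.

```python
from collections import defaultdict

def seed_keys(seq, seed):
    """Yield (key, pos): key keeps only the characters at the seed's '1' positions."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    for pos in range(len(seq) - len(seed) + 1):
        yield "".join(seq[pos + i] for i in care), pos

def seed_hits(a, b, seed):
    """Return (pos_in_a, pos_in_b) pairs where a and b agree on every '1' of the seed."""
    index = defaultdict(list)
    for key, pos in seed_keys(a, seed):
        index[key].append(pos)
    return [(pa, pb) for key, pb in seed_keys(b, seed) for pa in index.get(key, [])]

a, b = "ACGTACGT", "ACCTACGT"  # toy sequences differing at index 2
print(seed_hits(a, b, "1101"))  # spaced seed: tolerates the mismatch at offset 2
print(seed_hits(a, b, "1111"))  # contiguous 4-mer: misses the hit at (0, 0)
```

    Only the spaced seed reports the diagonal hit at (0, 0); using several complementary seeds, as the abstract describes, raises sensitivity further.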

  2. Conserved interdomain linker promotes phase separation of the multivalent adaptor protein Nck

    PubMed Central

    Banjade, Sudeep; Wu, Qiong; Mittal, Anuradha; Peeples, William B.; Pappu, Rohit V.; Rosen, Michael K.

    2015-01-01

    The organization of membranes, the cytosol, and the nucleus of eukaryotic cells can be controlled through phase separation of lipids, proteins, and nucleic acids. Collective interactions of multivalent molecules mediated by modular binding domains can induce gelation and phase separation in several cytosolic and membrane-associated systems. The adaptor protein Nck has three SRC-homology 3 (SH3) domains that bind multiple proline-rich segments in the actin regulatory protein neuronal Wiskott-Aldrich syndrome protein (N-WASP) and an SH2 domain that binds to multiple phosphotyrosine sites in the adhesion protein nephrin, leading to phase separation. Here, we show that the 50-residue linker between the first two SH3 domains of Nck enhances phase separation of Nck/N-WASP/nephrin assemblies. Two linear motifs within this element, as well as its overall positively charged character, are important for this effect. The linker increases the driving force for self-assembly of Nck, likely through weak interactions with the second SH3 domain, and this effect appears to promote phase separation. The linker sequence is highly conserved, suggesting that the sequence determinants of the driving forces for phase separation may be generally important to Nck functions. Our studies demonstrate that linker regions between modular domains can contribute to the driving forces for self-assembly and phase separation of multivalent proteins. PMID:26553976

  3. Identification and characterization of novel multiple bacteriocins produced by Leuconostoc pseudomesenteroides QU 15.

    PubMed

    Sawa, N; Okamura, K; Zendo, T; Himeno, K; Nakayama, J; Sonomoto, K

    2010-07-01

    This study aimed to characterize novel multiple bacteriocins produced by Leuconostoc pseudomesenteroides QU 15. Leuconostoc pseudomesenteroides QU 15, isolated from Nukadoko (rice bran bed), produced novel bacteriocins. Using three purification steps, four antimicrobial peptides, termed leucocin A (ΔC7), leucocin A-QU 15, leucocin Q and leucocin N, were purified from the culture supernatant. The amino acid sequences of leucocin A (ΔC7) and leucocin A-QU 15 were identical to that of leucocin A-UAL 187, a class IIa bacteriocin, but leucocin A (ΔC7) lacked the seven C-terminal residues. Leucocin Q and leucocin N are novel class IId bacteriocins. Moreover, the DNA sequences encoding three of the bacteriocins, leucocin A-QU 15, leucocin Q and leucocin N, were obtained. These bacteriocins, including the two novel ones, were identified from Leuc. pseudomesenteroides QU 15. They showed similar antimicrobial spectra, but their activity levels differed. The C-terminal region of leucocin A-QU 15 was important for its antimicrobial activity. Leucocins Q and N were encoded by adjacent open reading frames (ORFs) in the same operon, but leucocin A-QU 15 was not. These leucocins were produced concomitantly by the same strain. Although the two novel bacteriocins were encoded by adjacent ORFs, a characteristic of class IIb bacteriocins, they did not show synergistic activity. © 2010 The Authors. Journal compilation © 2010 The Society for Applied Microbiology.

  4. Computational prediction of protein hot spot residues.

    PubMed

    Morrow, John Kenneth; Zhang, Shuxing

    2012-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.

  5. Computational Prediction of Hot Spot Residues

    PubMed Central

    Morrow, John Kenneth; Zhang, Shuxing

    2013-01-01

    Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues. PMID:22316154

  6. Protein interface classification by evolutionary analysis

    PubMed Central

    2012-01-01

    Background: Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results: We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, the number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between the interface core and the rim or the rest of the surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character, and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctly better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly, we demonstrate that a more conservative selection of homolog sequences, with relatively high sequence identities to the query, produces a clearer signal than previous attempts. Conclusions: An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets, and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833
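    The two ingredients named above, the sequence entropy of homolog columns and a burial cutoff that defines core residues, can be sketched in Python as follows. The alignment and per-residue burial fractions are hypothetical inputs, the rim is taken here to be interface residues with 0 < burial ≤ 0.95, and comparing mean core and rim entropy is a simplified illustration, not the published classifier's exact scoring.

```python
import math
from collections import Counter

def column_entropy(col):
    """Shannon entropy in bits of one alignment column, ignoring gaps ('-')."""
    residues = [r for r in col if r != "-"]
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())

def mean_entropy(msa, positions):
    """Average column entropy over the given alignment positions."""
    return sum(column_entropy([seq[i] for seq in msa]) for i in positions) / len(positions)

def core_and_rim(burial, core_cutoff=0.95):
    """Split interface residues by buried fraction: core > cutoff, rim otherwise."""
    core = [i for i, b in enumerate(burial) if b > core_cutoff]
    rim = [i for i, b in enumerate(burial) if 0.0 < b <= core_cutoff]
    return core, rim

# Hypothetical query (first row) plus homologs, and burial fraction per position.
msa = ["LKVE", "LRVD", "LKVN", "LEVQ"]
burial = [0.99, 0.50, 0.97, 0.30]
core, rim = core_and_rim(burial)
print(core, rim, mean_entropy(msa, core), mean_entropy(msa, rim))
```

    In this toy case the core columns are invariant (entropy 0) while the rim columns vary, the kind of core/rim contrast that would point to selection pressure on a biological interface.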

  7. Sequence characterization of cDNA sequence of encoding of an antimicrobial Peptide with no disulfide bridge from the Iranian mesobuthus eupeus venomous glands.

    PubMed

    Farajzadeh-Sheikh, Ahmad; Jolodar, Abbas; Ghaemmaghami, Shamsedin

    2013-01-01

    Scorpion venom glands produce antimicrobial peptides (AMPs) that can rapidly kill a broad range of microbes and have additional activities that affect the quality and effectiveness of innate responses and inflammation. In this study, we report the identification of a cDNA sequence encoding a cysteine-free antimicrobial peptide from the venom glands of the Iranian scorpion Mesobuthus eupeus. Total RNA was extracted from the venom glands, and cDNA was synthesized using a modified oligo(dT) primer. The cDNA was used as the template for semi-nested RT-PCR. PCR products were sequenced directly, and the results were compared with the GenBank database. A 213 bp cDNA fragment encoding the entire coding region of an antimicrobial toxin from the venom glands of the Iranian scorpion M. eupeus was isolated. The full-length coding region was 210 bp and contained an open reading frame of 70 amino acids, with a predicted molecular mass of 7970.48 Da and a theoretical pI of 9.10. The 210 bp open reading frame encodes a precursor of 70 amino acid residues, including a signal peptide of 23 residues, a propeptide of 7 residues and a mature peptide of 34 residues with no disulfide bridge. The peptide shows detectable sequence identity to the Lesser Asian Mesobuthus eupeus peptides MeVAMP-2 (98%) and MeVAMP-9 (60%) and to several previously described AMPs from other scorpion venoms, including those of Mesobuthus martensii (94%) and Buthus occitanus israelis (82%). The secondary structure of the peptide is mainly α-helical, a feature generally conserved among previously reported scorpion counterparts. The phylogenetic analysis showed that the Iranian MeAMP-like toxin is similar, but not identical, to the venom antimicrobial peptides of the Lesser Asian scorpion Mesobuthus eupeus.

  8. Thermal deformations and stresses in composite materials

    NASA Technical Reports Server (NTRS)

    Daniel, I. M.

    1980-01-01

    Residual stresses are induced during curing in angle-ply laminates as a result of anisotropic thermal deformations of the variously oriented plies. Residual strains are measured experimentally using embedded strain gage techniques, and residual stresses are computed using orthotropic stress-strain relations. The results show that, for graphite and Kevlar laminates, residual stresses at room temperature are high enough to cause damage in the plies in the direction transverse to the fibers. It is also shown that residual stresses do not relax appreciably. The ply stacking sequence is found to have no effect on the magnitude of the average residual stresses. Residual stresses and susceptibility to cracking during curing depend to a marked extent on ply layup.
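    The step from measured strains to residual stresses uses the orthotropic plane-stress relations of a single ply. A minimal Python sketch follows, assuming the usual reduced-stiffness form of classical lamination theory in the ply's material axes (1 = fiber, 2 = transverse) and assuming that the stress-producing strain is the measured strain minus the ply's free (unconstrained) thermal strain. The material constants and strains are illustrative placeholders, not values from the report.

```python
def reduced_stiffness(E1, E2, nu12, G12):
    """Plane-stress reduced stiffness matrix Q of an orthotropic ply (material axes)."""
    nu21 = nu12 * E2 / E1  # reciprocity: nu21 / E2 = nu12 / E1
    d = 1.0 - nu12 * nu21
    return [[E1 / d, nu12 * E2 / d, 0.0],
            [nu12 * E2 / d, E2 / d, 0.0],
            [0.0, 0.0, G12]]

def residual_stress(Q, measured_strain, free_thermal_strain):
    """sigma = Q (eps_measured - eps_free); strains are (eps1, eps2, gamma12)."""
    mech = [m - f for m, f in zip(measured_strain, free_thermal_strain)]
    return [sum(Q[i][j] * mech[j] for j in range(3)) for i in range(3)]

# Illustrative graphite/epoxy-like constants in GPa (placeholders).
Q = reduced_stiffness(E1=140.0, E2=10.0, nu12=0.3, G12=5.0)
measured = [-0.0002, -0.0030, 0.0]  # hypothetical gage readings after cure
free = [0.0000, -0.0045, 0.0]       # hypothetical unconstrained ply thermal strain
sigma = residual_stress(Q, measured, free)
print([round(s * 1000, 1) for s in sigma])  # GPa converted to MPa
```

    With these placeholder numbers the ply is prevented from shrinking transversely as much as it would on its own, so the sketch yields a tensile stress in direction 2, the transverse-to-fiber direction in which the abstract reports damage.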

  9. Using msa-2b as a molecular marker for genotyping Mexican isolates of Babesia bovis.

    PubMed

    Genis, Alma D; Perez, Jocelin; Mosqueda, Juan J; Alvarez, Antonio; Camacho, Minerva; Muñoz, Maria de Lourdes; Rojas, Carmen; Figueroa, Julio V

    2009-12-01

    Variable merozoite surface antigens of Babesia bovis are exposed glycoproteins that play a role in erythrocyte invasion. Members of this gene family include msa-1 and msa-2 (msa-2c, msa-2a(1), msa-2a(2) and msa-2b). To determine the sequence variation among Mexican B. bovis isolates using msa-2b as a genetic marker, PCR amplicons corresponding to msa-2b were cloned, and plasmids carrying the inserts were purified and sequenced. Comparative analysis of the nucleotide and deduced amino acid sequences revealed distinct degrees of variability and identity among the coding sequences obtained from 16 geographically different Mexican B. bovis isolates and a reference strain. ClustalW multiple alignments of the deduced MSA-2b amino acid sequences from the 17 Mexican B. bovis isolates identified three genotypes, each with a distinct set of amino acid residues in the variable region: Genotype I, represented by the MO7 strain (derived by in vitro culture from the Mexico isolate) as well as the RAD, Chiapas-1, Tabasco and Veracruz-3 isolates; Genotype II, represented by the Jalisco, Mexico and Veracruz-2 isolates; and Genotype III, comprising the sequences from most of the isolates studied: Tamaulipas-1, Chiapas-2, Guerrero-1, Nayarit, Quintana Roo, Nuevo Leon, Tamaulipas-2, Yucatan and Guerrero-2. Moreover, the three genotypes could be discriminated from one another using a PCR-RFLP approach. The results suggest that indels within the variable region of msa-2b can serve as useful markers for identifying the genotype present in field populations of B. bovis isolated from infected cattle in Mexico.
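    A PCR-RFLP assay separates genotypes by the fragment lengths a restriction enzyme leaves. The Python sketch below computes top-strand fragment lengths for an exact-match recognition site and groups amplicons by their pattern. EcoRI (G^AATTC) and the toy amplicons are illustrative assumptions, since the enzyme used in the study is not named here, and degenerate recognition sequences are not handled.

```python
import re
from collections import defaultdict

def fragment_lengths(seq, site, cut_offset):
    """Top-strand fragment lengths after cutting at every (possibly overlapping) site.

    cut_offset is the number of site bases left of the cut, e.g. 1 for EcoRI (G^AATTC)."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def group_by_pattern(amplicons, site, cut_offset):
    """Map each fragment-length pattern to the isolates that share it."""
    groups = defaultdict(list)
    for name, seq in amplicons.items():
        groups[tuple(fragment_lengths(seq, site, cut_offset))].append(name)
    return dict(groups)

amplicons = {"isolate_1": "AAGAATTCTTTT",  # hypothetical amplicons
             "isolate_2": "AAGCATTCTTTT",
             "isolate_3": "TTGAATTCAAAA"}
print(group_by_pattern(amplicons, "GAATTC", 1))
```

    Isolates that share a fragment pattern fall into the same group; in a real assay the lengths would be read from a gel, and a length-changing indel would shift the pattern even without creating or destroying a site.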

  10. Induction and maintenance of DNA methylation in plant promoter sequences by apple latent spherical virus-induced transcriptional gene silencing

    PubMed Central

    Kon, Tatsuya; Yoshikawa, Nobuyuki

    2014-01-01

    Apple latent spherical virus (ALSV) is an efficient virus-induced gene silencing vector in functional genomics analyses of a broad range of plant species. Here, an Agrobacterium-mediated inoculation (agroinoculation) system was developed for the ALSV vector, and virus-induced transcriptional gene silencing (VITGS) is described in plants infected with the ALSV vector. The cDNAs of ALSV RNA1 and RNA2 were inserted between the cauliflower mosaic virus 35S promoter and the NOS-T sequences in a binary vector pCAMBIA1300 to produce pCALSR1 and pCALSR2-XSB or pCALSR2-XSB/MN. When these vector constructs were agroinoculated into Nicotiana benthamiana plants with a construct expressing a viral silencing suppressor, the infection efficiency of the vectors was 100%. A recombinant ALSV vector carrying part of the 35S promoter sequence induced transcriptional gene silencing of the green fluorescent protein gene in a line of N. benthamiana plants, resulting in the disappearance of green fluorescence of infected plants. Bisulfite sequencing showed that cytosine residues at CG and CHG sites of the 35S promoter sequence were highly methylated in the silenced generation zero plants infected with the ALSV carrying the promoter sequence as well as in progeny. The ALSV-mediated VITGS state was inherited by progeny for multiple generations. In addition, induction of VITGS of an endogenous gene (chalcone synthase-A) was demonstrated in petunia plants infected with an ALSV vector carrying the native promoter sequence. These results suggest that ALSV-based vectors can be applied to study DNA methylation in plant genomes, and provide a useful tool for plant breeding via epigenetic modification. PMID:25426109
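    Bisulfite sequencing reports methylation per cytosine context. As a sketch, the Python below classifies each top-strand cytosine of a reference as CG, CHG or CHH (H = A, C or T) and computes a site's methylation level as the fraction of aligned bisulfite reads that still read C there, since unmethylated C is read as T after conversion. It assumes unambiguous ACGT sequence, gap-free reads already aligned to the reference, and ignores the bottom strand; the sequences are toy examples, not data from the study.

```python
def cytosine_contexts(seq):
    """Map each top-strand C position to 'CG', 'CHG' or 'CHH' (H = A, C or T)."""
    seq = seq.upper()
    contexts = {}
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1:i + 3]
        if nxt[:1] == "G":
            contexts[i] = "CG"
        elif len(nxt) == 2 and nxt[1] == "G":
            contexts[i] = "CHG"
        elif len(nxt) == 2:
            contexts[i] = "CHH"
        # a C too close to the 3' end to classify is skipped
    return contexts

def methylation_level(reads, pos):
    """Fraction of aligned bisulfite reads that keep C at pos (unmethylated C reads as T)."""
    calls = [read[pos] for read in reads if read[pos] in "CT"]
    return calls.count("C") / len(calls) if calls else float("nan")

ref = "ACGTCAGTCTTA"                                      # toy reference
reads = ["ACGTTAGTTTTA", "ATGTTAGTTTTA", "ACGTCAGTTTTA"]  # toy converted reads
for pos, ctx in cytosine_contexts(ref).items():
    print(pos, ctx, round(methylation_level(reads, pos), 2))
```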

  11. A new molecular evolution model for limited insertion independent of substitution.

    PubMed

    Lèbre, Sophie; Michel, Christian J

    2013-10-01

    We recently introduced a new molecular evolution model called the IDIS model, for Insertion Deletion Independent of Substitution [13,14]. In the IDIS model, the three independent processes of substitution, insertion and deletion of residues have constant rates. In order to control genome expansion during evolution, we generalize here the IDIS model by introducing an insertion rate that decreases as the sequence grows and tends to 0 for a maximum sequence length nmax. This new model, called LIIS for Limited Insertion Independent of Substitution, defines a matrix differential equation satisfied by a vector P(t) describing the sequence content in each residue at evolution time t. An analytical solution is obtained for any diagonalizable substitution matrix M. Thus, the LIIS model gives an expression of the sequence content vector P(t) in each residue at evolution time t as a function of the eigenvalues and eigenvectors of the matrix M, the residue insertion rate vector R, the total insertion rate r, the initial and maximum sequence lengths n0 and nmax, respectively, and the sequence content vector P(t0) at initial time t0. The derivation of the analytical solution is much more technical than for the IDIS model, as it involves Gauss hypergeometric functions. Several propositions of the LIIS model are derived: proof that the IDIS model is a particular case of the LIIS model when the maximum sequence length nmax tends to infinity, fixed point, time scale, time step and time inversion. Using a relation between the sequence length l and the evolution time t, an expression of the LIIS model as a function of the sequence length l=n(t) is obtained. Formulas for 'insertion only', i.e. when the substitution rates are all equal to 0, are derived at evolution time t and sequence length l. Analytical solutions of the LIIS model are explicitly derived, as a function of either evolution time t or sequence length l, for two classical substitution matrices: the 3-parameter symmetric substitution matrix [12] (LIIS-SYM3) and the HKY asymmetric substitution matrix [9] (LIIS-HKY). An evaluation of the LIIS model (precisely, LIIS-HKY) based on four statistical analyses of the GC content in complete genomes of four prokaryotic taxonomic groups, namely Chlamydiae, Crenarchaeota, Spirochaetes and Thermotogae, shows the improvement over the IDIS model expected from the theory of the LIIS model. Copyright © 2013 Elsevier Inc. All rights reserved.
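    For the substitution-only part of such models, the role of the eigenvalues and eigenvectors of M can be shown numerically. Assuming the column-vector convention dP/dt = M P, with a diagonalizable M whose columns sum to zero, the solution is P(t) = V exp(Λt) V^-1 P(0), where M = V Λ V^-1. The NumPy sketch below uses an illustrative symmetric nucleotide rate matrix with three rates; it is not necessarily identical to the SYM3 matrix cited above, it leaves out the insertion terms and the hypergeometric solution of the LIIS model, and the rate values are placeholders.

```python
import numpy as np

def symmetric_rate_matrix(a, b, c):
    """Symmetric 4x4 rate matrix over (A, C, G, T): a for A<->G and C<->T,
    b for A<->C and G<->T, c for A<->T and C<->G; each row and column sums to 0."""
    s = -(a + b + c)
    return np.array([[s, b, a, c],
                     [b, s, c, a],
                     [a, c, s, b],
                     [c, a, b, s]])

def evolve_content(M, p0, t):
    """P(t) = V diag(exp(lambda * t)) V^-1 P(0) for dP/dt = M P (M diagonalizable)."""
    eigvals, V = np.linalg.eig(M)
    coeffs = np.linalg.solve(V, p0)  # coordinates of P(0) in the eigenbasis
    return (V @ (np.exp(eigvals * t) * coeffs)).real

M = symmetric_rate_matrix(a=0.2, b=0.1, c=0.05)  # placeholder rates
p0 = np.array([0.4, 0.1, 0.1, 0.4])              # hypothetical initial content
for t in (0.0, 1.0, 100.0):
    print(t, np.round(evolve_content(M, p0, t), 4))
```

    The zero eigenvalue gives the fixed point (uniform content for this symmetric matrix) and the negative eigenvalues set the time scales of convergence, mirroring the fixed-point and time-scale propositions listed in the abstract for the full model.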

  12. Selective excitation for spectral editing and assignment in separated local field experiments of oriented membrane proteins

    NASA Astrophysics Data System (ADS)

    Koroloff, Sophie N.; Nevzorov, Alexander A.

    2017-01-01

    Spectroscopic assignment of NMR spectra for oriented uniformly labeled membrane proteins embedded in their native-like bilayer environment is essential for their structure determination. However, sequence-specific assignment in oriented-sample (OS) NMR is often complicated by insufficient resolution and spectral crowding. Therefore, the assignment process is usually done by a laborious and expensive "shotgun" method involving multiple selective labeling of amino acid residues. Presented here is a strategy to overcome poor spectral resolution in crowded regions of 2D spectra by selecting resolved "seed" residues via soft Gaussian pulses inserted into spin-exchange separated local-field experiments. The Gaussian pulse places the selected polarization along the z-axis while dephasing the other signals before the evolution of the 1H-15N dipolar couplings. The transfer of magnetization is accomplished via mismatched Hartmann-Hahn conditions to the nearest-neighbor peaks via the proton bath. By optimizing the length and amplitude of the Gaussian pulse, one can also achieve a phase inversion of the closest peaks, thus providing an additional phase contrast. From the superposition of the selective spin-exchanged SAMPI4 onto the fully excited SAMPI4 spectrum, the 15N sites that are directly adjacent to the selectively excited residues can be easily identified, thereby providing a straightforward method for initiating the assignment process in oriented membrane proteins.
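    The selective-excitation step can be pictured with a minimal NumPy sketch that builds a truncated Gaussian pulse envelope and scales it so that the on-resonance flip angle, 2π∫ν1(t)dt, equals 90°, assuming a 90° flip is what stores the selected polarization along z. The truncation at ±3σ, the 1000-point sampling and the name `gaussian_pulse` are illustrative assumptions; the authors' pulse lengths, amplitudes and phase-inversion optimization are not reproduced.

```python
import numpy as np


def gaussian_pulse(duration, n_points=1000, truncation=3.0, flip_angle=np.pi / 2):
    """Return time points (s) and a nutation-frequency envelope nu1(t) in Hz.

    The envelope is a Gaussian truncated at +/- `truncation` standard
    deviations and scaled so that 2*pi * integral(nu1 dt) == flip_angle
    (on-resonance approximation, rectangle-rule integration).
    """
    t = np.linspace(-duration / 2, duration / 2, n_points)
    sigma = duration / (2 * truncation)
    shape = np.exp(-t**2 / (2 * sigma**2))   # unit-peak Gaussian envelope
    dt = t[1] - t[0]
    area = shape.sum() * dt                  # integral of the unit-peak envelope (s)
    peak_hz = flip_angle / (2 * np.pi * area)
    return t, peak_hz * shape


if __name__ == "__main__":
    t, nu1 = gaussian_pulse(1e-3)            # hypothetical 1 ms soft pulse
    print(f"peak nutation frequency: {nu1.max():.1f} Hz")
```

    Lengthening the pulse narrows its excitation bandwidth and lowers the required peak amplitude, which is the trade-off behind selecting a single resolved "seed" resonance.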

  13. Functional analysis of CedA based on its structure: residues important in binding of DNA and RNA polymerase and in the cell division regulation

    PubMed Central

    Abe, Yoshito; Fujisaki, Naoki; Miyoshi, Takanori; Watanabe, Noriko; Katayama, Tsutomu; Ueda, Tadashi

    2016-01-01

    DnaAcos, a mutant of the initiator DnaA, causes overinitiation of chromosome replication in Escherichia coli, resulting in inhibition of cell division. CedA was found to be a multi-copy suppressor that represses the dnaAcos inhibition of cell division. However, the functional mechanism of CedA remains elusive, apart from its previously suggested abilities to bind DNA and RNA polymerase. In this study, we searched for the specific sites of CedA involved in binding DNA and RNA polymerase and in repressing the inhibition of cell division. First, the DNA sequence to which CedA preferentially binds was determined. Next, several residues and the β4 region in the CedA C-terminal domain were suggested to interact specifically with the DNA. Moreover, we found that the flexible N-terminal region was required for tight binding to longer DNA as well as for interaction with RNA polymerase. Based on these results, several cedA mutants were examined for their ability to repress the dnaAcos inhibition of cell division. We found that the N-terminal region was dispensable and that Glu32 in the C-terminal domain was required for the repression. These results suggest that CedA has multiple roles and that residues with different functions are positioned in the two regions. PMID:26400504

  14. Structural Basis of the High Affinity Interaction between the Alphavirus Nonstructural Protein-3 (nsP3) and the SH3 Domain of Amphiphysin-2.

    PubMed

    Tossavainen, Helena; Aitio, Olli; Hellman, Maarit; Saksela, Kalle; Permi, Perttu

    2016-07-29

    We show that a peptide from Chikungunya virus nsP3 protein spanning residues 1728-1744 binds the amphiphysin-2 (BIN1) Src homology-3 (SH3) domain with an unusually high affinity (Kd = 24 nM). Our NMR solution structure of the complex, together with isothermal titration calorimetry data on several related viral and cellular peptide ligands, reveals that this exceptional affinity originates from interactions between multiple basic residues in the target peptide and the extensive negatively charged binding surface of amphiphysin-2 SH3. Remarkably, these arginines show no fixed conformation in the complex structure, indicating that a transient or fluctuating polyelectrostatic interaction accounts for this affinity. Thus, via optimization of such dynamic electrostatic forces, viral peptides have evolved a superior binding affinity for amphiphysin-2 SH3 compared with typical cellular ligands, such as dynamin, thereby enabling these viruses to hijack amphiphysin-2 SH3-regulated host cell processes. Moreover, our data show that the previously described consensus sequence PXRPXR for amphiphysin SH3 ligands is inaccurate, and we instead define it as an extended Class II binding motif PXXPXRpXR, where additional positive charges between the two constant arginine residues can give rise to extraordinarily high SH3 binding affinity. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
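    The extended motif can be scanned for with a regular expression. This sketch ASSUMES that both 'X' and the lowercase 'p' may be any residue (the lowercase letter is read here as a non-obligatory position), and it counts the extra basic residues (K or R) between the two constant arginines, the feature the abstract links to very high affinity. The peptide in the usage line is synthetic, not the nsP3 sequence, and the function names are hypothetical.

```python
import re

# P X X P X R p X R, with X and p both treated as "any residue" (assumption).
# The lookahead lets overlapping occurrences be reported.
CLASS_II_EXTENDED = re.compile(r"(?=(P..P.R..R))")


def find_motifs(seq):
    """Return (0-based start, matched 9-mer) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in CLASS_II_EXTENDED.finditer(seq.upper())]


def extra_positive_charges(motif):
    """Count K/R at motif positions 7-8, i.e. between the constant R6 and R9."""
    between = motif[6:8]
    return sum(aa in "KR" for aa in between)


if __name__ == "__main__":
    for start, motif in find_motifs("GSPTLPERPKRG"):   # synthetic example peptide
        print(start, motif, extra_positive_charges(motif))
```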

  15. Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini.

    PubMed

    Wang, Yu; Guo, Yanzhi; Pu, Xuemei; Li, Menglong

    2017-11-01

    Various bacterial pathogens can deliver their secreted substrates, also called effectors, through type IV secretion systems (T4SSs) into host cells and cause disease. Since T4SS secreted effectors (T4SEs) play important roles in pathogen-host interactions, identifying them is crucial to our understanding of the pathogenic mechanisms of T4SSs. A few computational methods using machine learning algorithms have been developed for T4SE prediction using features of C-terminal residues. However, recent studies have shown that targeting information can also be encoded in the N-terminal region of at least some T4SEs. In this study, we present an effective method for T4SE prediction that integrates N-terminal and C-terminal sequence information in a novel way. First, we collected a comprehensive dataset of known T4SEs and non-T4SEs across multiple bacterial species from the literature. Then, three types of distinctive features, namely amino acid composition; composition, transition and distribution (CTD); and position-specific scoring matrices (PSSMs), were calculated for the 50 N-terminal and 100 C-terminal residues. After that, we used information gain to rank the importance of the 150 residue positions for T4SE secretion signaling. Finally, 125 distinctive residue positions were selected for the prediction model to classify T4SEs and non-T4SEs. The support vector machine model yields a high area under the receiver operating characteristic curve (AUC) of 0.916 in fivefold cross-validation and an accuracy of 85.29% on the independent test set.
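    The first of the three feature types, amino acid composition of the two termini, can be sketched as follows. This minimal version computes only the 20 composition values for the first 50 and the last 100 residues and concatenates them; the CTD and PSSM features, the information-gain ranking and the SVM are omitted. Function names are hypothetical, and non-standard letters are simply skipped (an assumption).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in `segment`."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for aa in segment.upper():
        if aa in counts:          # non-standard letters (X, B, ...) are skipped
            counts[aa] += 1
            valid += 1
    return [counts[aa] / valid if valid else 0.0 for aa in AMINO_ACIDS]


def terminal_aac_features(seq, n_len=50, c_len=100):
    """40-dimensional vector: N-terminal AAC followed by C-terminal AAC.

    Short proteins simply contribute shorter (possibly overlapping) windows.
    """
    return composition(seq[:n_len]) + composition(seq[-c_len:])


if __name__ == "__main__":
    features = terminal_aac_features("M" + "A" * 149)
    print(len(features), features[AMINO_ACIDS.index("A")])
```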

  16. Effective prediction of bacterial type IV secreted effectors by combined features of both C-termini and N-termini

    NASA Astrophysics Data System (ADS)

    Wang, Yu; Guo, Yanzhi; Pu, Xuemei; Li, Menglong

    2017-11-01

    Various bacterial pathogens can deliver their secreted substrates, also called effectors, through type IV secretion systems (T4SSs) into host cells and cause disease. Since T4SS secreted effectors (T4SEs) play important roles in pathogen-host interactions, identifying them is crucial to our understanding of the pathogenic mechanisms of T4SSs. A few computational methods using machine learning algorithms have been developed for T4SE prediction using features of C-terminal residues. However, recent studies have shown that targeting information can also be encoded in the N-terminal region of at least some T4SEs. In this study, we present an effective method for T4SE prediction that integrates N-terminal and C-terminal sequence information in a novel way. First, we collected a comprehensive dataset of known T4SEs and non-T4SEs across multiple bacterial species from the literature. Then, three types of distinctive features, namely amino acid composition; composition, transition and distribution (CTD); and position-specific scoring matrices (PSSMs), were calculated for the 50 N-terminal and 100 C-terminal residues. After that, we used information gain to rank the importance of the 150 residue positions for T4SE secretion signaling. Finally, 125 distinctive residue positions were selected for the prediction model to classify T4SEs and non-T4SEs. The support vector machine model yields a high area under the receiver operating characteristic curve (AUC) of 0.916 in fivefold cross-validation and an accuracy of 85.29% on the independent test set.

  17. CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

    PubMed

    Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

    2007-09-01

    The bisulfite genomic sequencing technique is one of the most widely used techniques for studying sequence-specific DNA methylation because it can unambiguously reveal DNA methylation status at single-nucleotide resolution. One characteristic feature of bisulfite genomic sequencing is that many sample sequence files are produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous; therefore, they must be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and adds extra plasmid bases to the sequences, which causes problems in the final sequence comparison. Determining the methylation status of each CpG in each sample sequence is therefore laborious, and CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain the positions of CpG and non-CpG C residues; (ii) to tailor sample sequence files (delete insertions and mark deletions in the sample sequence files) based on a ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain the bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report that can easily be exported to the Microsoft Office suite. CpG PatternFinder is designed to operate together with BioEdit, a freeware program available on the internet. It can handle up to 100 sample DNA sequence files simultaneously, and the whole CpG pattern analysis can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies that determine differential methylation patterns across a large number of individuals in a population. 
Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
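    Steps (i) and (iii) rest on the standard bisulfite logic: unmethylated C is read as T, methylated CpG C stays C, and conversion of non-CpG C residues to T measures conversion efficiency. The sketch below ASSUMES the read has already been tailored and aligned to the reference with no gaps (step (ii) is not reproduced); it is not CpG PatternFinder's code, and the function names are hypothetical.

```python
def cpg_positions(ref):
    """0-based positions of the C in every CpG of the reference."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i] == "C" and ref[i + 1] == "G"]


def non_cpg_c_positions(ref):
    """0-based positions of C residues that are not part of a CpG."""
    ref = ref.upper()
    cpg = set(cpg_positions(ref))
    return [i for i, base in enumerate(ref) if base == "C" and i not in cpg]


def analyze_read(ref, read):
    """Return (CpG methylation calls, bisulfite conversion efficiency).

    Assumes `read` is already aligned to `ref` position by position.
    """
    if len(ref) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    ref, read = ref.upper(), read.upper()
    calls = {}
    for i in cpg_positions(ref):
        # C retained -> methylated; C converted to T -> unmethylated
        calls[i] = {"C": "methylated", "T": "unmethylated"}.get(read[i], "undetermined")
    non_cpg = non_cpg_c_positions(ref)
    converted = sum(read[i] == "T" for i in non_cpg)
    efficiency = converted / len(non_cpg) if non_cpg else float("nan")
    return calls, efficiency


if __name__ == "__main__":
    print(analyze_read("ACGTCAGCGTTC", "ACGTTAGTGTTT"))
```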

  18. The immediate upstream region of the 5′-UTR from the AUG start codon has a pronounced effect on the translational efficiency in Arabidopsis thaliana

    PubMed Central

    Kim, Younghyun; Lee, Goeun; Jeon, Eunhyun; Sohn, Eun ju; Lee, Yongjik; Kang, Hyangju; Lee, Dong wook; Kim, Dae Heon; Hwang, Inhwan

    2014-01-01

    The nucleotide sequence around the translational initiation site is an important cis-acting element for post-transcriptional regulation. However, it is not fully understood how the sequence context of the 5′-untranslated region (5′-UTR) affects the translational efficiency of individual mRNAs. In this study, we provide evidence that the 5′-UTRs of Arabidopsis genes, which differ greatly in nucleotide sequence, also vary greatly in translational efficiency, by more than 200-fold. Of the four nucleotides, A was the most favourable from positions −1 to −21 of the 5′-UTRs in Arabidopsis genes. In particular, an A residue at positions −1 to −5 of the 5′-UTR was required for high translational efficiency. In contrast, T at positions −1 to −5 was the least favourable nucleotide for translational efficiency. Furthermore, the effect of the sequence context in the −1 to −21 region of the 5′-UTR was conserved in different plant species. Based on these observations, we propose that the sequence context immediately upstream of the AUG initiation codon plays a crucial role in determining the translational efficiency of plant genes. PMID:24084084
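    The position convention used above (−1 is the nucleotide immediately 5′ of the A of AUG) can be made concrete with a short sketch that extracts the upstream window and tallies nucleotides per position across a set of transcripts. The input format, (sequence, 0-based index of the A of AUG), and the function names are assumptions for illustration; U is read as T.

```python
from collections import Counter


def upstream_window(seq, start, length=21):
    """Map positions -1..-length to bases upstream of the AUG starting at `start`."""
    seq = seq.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no AUG/ATG at the given start index")
    return {-k: seq[start - k] for k in range(1, length + 1) if start - k >= 0}


def position_frequencies(seqs_with_starts, positions=range(-5, 0)):
    """Per-position nucleotide counts over many 5'-UTRs (default -5..-1)."""
    table = {p: Counter() for p in positions}
    for seq, start in seqs_with_starts:
        window = upstream_window(seq, start)
        for p in positions:
            if p in window:      # short UTRs may not reach every position
                table[p][window[p]] += 1
    return table


if __name__ == "__main__":
    utrs = [("GCAAAAATGG", 6), ("CTTTTTAUGC", 6)]   # toy sequences
    for pos, counts in sorted(position_frequencies(utrs).items()):
        print(pos, dict(counts))
```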

  19. FASMA: a service to format and analyze sequences in multiple alignments.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-12-01

    Multiple sequence alignments are successfully applied in many studies for understanding the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.

  20. Self-adaptive calibration for staring infrared sensors

    NASA Astrophysics Data System (ADS)

    Kendall, William B.; Stocker, Alan D.

    1993-10-01

    This paper presents a new, self-adaptive technique for the correction of non-uniformities (fixed-pattern noise) in high-density infrared focal-plane detector arrays. We have developed a new approach to non-uniformity correction in which we use multiple image frames of the scene itself and take advantage of the aim-point wander caused by jitter, residual tracking errors, or deliberately induced motion. Such wander causes each detector in the array to view multiple scene elements, and each scene element to be viewed by multiple detectors. It is therefore possible to formulate (and solve) a set of simultaneous equations from which correction parameters can be computed for the detectors. We have tested our approach with actual images collected by the ARPA-sponsored MUSIC infrared sensor. For these tests we employed a 60-frame (0.75-second) sequence of terrain images for which an out-of-date calibration was deliberately used. The sensor was aimed at a point on the ground via an operator-assisted tracking system with a maximum aim-point wander on the order of ten pixels. With these data, we were able to improve the calibration accuracy by a factor of approximately 100.
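    The "set of simultaneous equations" idea can be sketched in a deliberately simplified setting: a 1-D detector row, known integer shifts between frames, and an offset-only detector model y_k(d) = scene(d + shift_k) + offset(d). These simplifications (no gain term, known shifts) are assumptions, not the authors' formulation. Because adding a constant to the scene and subtracting it from every offset leaves the data unchanged, the sketch fixes that ambiguity by requiring the offsets to sum to zero, then solves everything jointly by least squares.

```python
import numpy as np


def estimate_scene_and_offsets(frames, shifts, scene_len):
    """Jointly estimate scene values and per-detector offsets.

    frames: list of 1-D arrays (one value per detector) for each frame.
    shifts: integer aim-point shift of each frame, in scene pixels.
    Assumed model: frames[k][d] = scene[d + shifts[k]] + offset[d].
    """
    n_det = len(frames[0])
    n_unknowns = scene_len + n_det
    rows, rhs = [], []
    for frame, s in zip(frames, shifts):
        for d in range(n_det):
            row = np.zeros(n_unknowns)
            row[d + s] = 1.0              # scene element viewed by detector d
            row[scene_len + d] = 1.0      # that detector's fixed-pattern offset
            rows.append(row)
            rhs.append(frame[d])
    gauge = np.zeros(n_unknowns)          # constraint: offsets sum to zero
    gauge[scene_len:] = 1.0
    rows.append(gauge)
    rhs.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution[:scene_len], solution[scene_len:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=20)
    offsets = rng.normal(size=8)
    offsets -= offsets.mean()
    shifts = [0, 1, 3, 5, 7, 12]          # simulated aim-point wander
    frames = [scene[s:s + 8] + offsets for s in shifts]
    _, est_offsets = estimate_scene_and_offsets(frames, shifts, 20)
    print(np.max(np.abs(est_offsets - offsets)))
```

    The wander matters: with a single shift, every scene element is seen by only one detector and the offsets cannot be separated from the scene.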

  1. Selecting forest residue treatment alternatives using goal programming.

    Treesearch

    Bruce B. Bare; Brian F. Anholt

    1976-01-01

    The use of goal programming for selecting forest residue treatment alternatives within a multiple goal framework is described. The basic features of goal programming are reviewed and illustrated with a hypothetical problem involving the selection of residue treatments for 10 cutting units. Twelve residue-regeneration treatment combinations are evaluated by using physical...

  2. Selection of functional 2A sequences within foot-and-mouth disease virus; requirements for the NPGP motif with a distinct codon bias.

    PubMed

    Kjær, Jonas; Belsham, Graham J

    2018-01-01

    Foot-and-mouth disease virus (FMDV) has a positive-sense ssRNA genome including a single, large, open reading frame. Splitting of the encoded polyprotein at the 2A/2B junction is mediated by the 2A peptide (18 residues long), which induces a nonproteolytic, cotranslational "cleavage" at its own C terminus. A conserved feature among variants of 2A is the C-terminal motif N16P17G18/P19, where P19 is the first residue of 2B. It has been shown previously that certain amino acid substitutions can be tolerated at residues E14, S15, and N16 within the 2A sequence of infectious FMDVs, but no variants at residues P17, G18, or P19 have been identified. In this study, using highly degenerate primers, we analyzed whether any other residues can be present at each position of the NPG/P motif within infectious FMDV. No alternative forms of this motif were found to be encoded by rescued FMDVs after two, three, or four passages. However, surprisingly, a clear codon preference for the wt nucleotide sequence encoding the NPGP motif within these viruses was observed. Indeed, the codons selected to code for P17 and P19 within this motif were distinct; thus the synonymous codons are not equivalent. © 2018 Kjær and Belsham; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
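    Checking which synonymous codons encode the NPGP motif amounts to translating the reading frame and reading back the four underlying codons. The sketch below uses the standard genetic code (written compactly in TCAG order) and a short synthetic sequence; it does not reproduce any FMDV sequence, and the function names are hypothetical.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(nt):
    """Translate frame 0 of a DNA/RNA string; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(GENETIC_CODE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def npgp_codons(nt):
    """Return the four codons encoding the first NPGP motif, or None."""
    nt = nt.upper().replace("U", "T")
    k = translate(nt).find("NPGP")
    if k < 0:
        return None
    return [nt[3 * (k + j):3 * (k + j) + 3] for j in range(4)]


if __name__ == "__main__":
    codons = npgp_codons("GCTAACCCTGGACCCTAA")   # synthetic: A N P G P stop
    print(codons, "P codons identical:", codons[1] == codons[3])
```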

  3. Using noble gas tracers to estimate residual CO2 saturation in the field: results from the CO2CRC Otway residual saturation and dissolution test

    NASA Astrophysics Data System (ADS)

    LaForce, T.; Ennis-King, J.; Paterson, L.

    2013-12-01

    Residual CO2 saturation is a critically important parameter in CO2 storage, as it can have a large impact on the available secure storage volume and on post-injection CO2 migration. A suite of single-well tests to measure residual trapping was conducted at the Otway test site in Victoria, Australia, during 2011. One or more of these tests could be conducted at a prospective CO2 storage site before large-scale injection. The test involved injection of 150 tonnes of pure carbon dioxide followed by 454 tonnes of CO2-saturated formation water to drive the carbon dioxide to residual saturation. This work presents a brief overview of the full test sequence, followed by the analysis and interpretation of the tests using noble gas tracers. Prior to CO2 injection, krypton (Kr) and xenon (Xe) tracers were injected and back-produced to characterise the aquifer under single-phase conditions. After CO2 had been driven to residual saturation, the two tracers were injected and produced again. The noble gases act as non-partitioning aqueous-phase tracers in the undisturbed aquifer and as partitioning tracers in the presence of residual CO2. To estimate residual saturation from the tracer test data, a one-dimensional radial model of the near-well region is used. The model has only two independent parameters: the apparent dispersivity of each tracer and the residual CO2 saturation. Independent analyses of the Kr and Xe tracer production curves give the same estimate of residual saturation to within the accuracy of the method. Furthermore, the residual saturation from the noble gas tracer tests is consistent with other measurements in the sequence of tests.

  4. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and Mr estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  5. Relook TURBT in superficial bladder cancer: its importance and its correlation with the tumor ploidy.

    PubMed

    Dwivedi, Udai S; Kumar, Abhay; Das, Suren K; Trivedi, Sameer; Kumar, Mohan; Sunder, Shyam; Singh, Pratap B

    2009-01-01

    To evaluate various prognostic predictors of residual growth at Relook transurethral resection of bladder tumor (TURBT) in superficial bladder cancer, and to evaluate the role of Relook TURBT together with tumor ploidy in predicting recurrence and stage progression in these patients. Fifty patients with superficial bladder cancer underwent TURBT after complete evaluation. Ploidy of the tumor specimen was evaluated by flow cytometry. Four to 6 weeks after the initial TURBT, these patients underwent Relook TURBT. Final treatment was given after histological evaluation of these specimens. Patients who underwent bladder-sparing treatment were followed up. Of the patients, 28.5% had residual tumor at Relook TURBT. Growth was found at the same site in 66.7% and at a different site in 33.3%; 75% had single and 25% had multiple residual growths. Residual malignant tissue had a statistically significant correlation with tumor size (>3 cm), appearance (solid tumor), number (>3), grade (high), and multiple previous resections. Overall, upward migration of stage and grade led to a change in treatment in 41.6%: 5 patients underwent radical cystectomy and 1 opted for radiotherapy; 2 patients received intravesical BCG. During a mean follow-up of 11.5 months, 16.6% had recurrence. The presence of residual growth at Relook TURBT, along with tumor number, size, morphology, and multiple previous resections, was significantly correlated with recurrence in these patients. Ploidy and grade of the tumor were not found to correlate with recurrence. Multiple, large (>3 cm), solid, high-grade tumors with >3 previous resections were predictors of residual tumor at Relook TURBT. The presence of residual growth is a significant risk factor for recurrence. Ploidy was not found to be significantly correlated with recurrence.

  6. UniDrug-target: a computational tool to identify unique drug targets in pathogenic bacteria.

    PubMed

    Chanumolu, Sree Krishna; Rout, Chittaranjan; Chauhan, Rajinder S

    2012-01-01

    Targeting conserved proteins of bacteria with antibacterial medications has resulted both in the development of resistant strains and in changes to human health through the destruction of beneficial microbes, which eventually become breeding grounds for the evolution of resistance. Despite the availability of more than 800 genome sequences, 430 pathways, 4743 enzymes, 9257 metabolic reactions and three-dimensional (3D) protein structures in bacteria, no pathogen-specific computational drug target identification tool has been developed. A web server, UniDrug-Target, which combines bacterial biological information and computational methods to stringently identify pathogen-specific proteins as drug targets, has been designed. Besides predicting properties of pathogen-specific proteins such as essentiality and the chokepoint property, three new algorithms were developed and implemented using protein sequences, domains, structures, and metabolic reactions: construction of partial metabolic networks (PMNs), determination of conservation in critical residues, and variation analysis of residues forming similar cavities in protein sequences. First, PMNs are constructed to determine the extent of disturbance in metabolite production when a protein is targeted as a drug target. Second, the conservation of a pathogen-specific protein's critical residues involved in cavity formation and biological function is determined at the domain level using low-matching sequences. Last, variation analysis of residues forming similar cavities in protein sequences from pathogenic versus non-pathogenic bacteria and humans is performed. The server can predict drug targets for any sequenced pathogenic bacterium with FASTA sequences and annotation information. The utility of the UniDrug-Target server was demonstrated for Mycobacterium tuberculosis (H37Rv). UniDrug-Target identified 265 mycobacterial pathogen-specific proteins, including 17 essential proteins that can be potential drug targets. 
UniDrug-Target is expected to accelerate the identification of pathogen-specific drug targets, which should increase their success and durability, because drugs developed against such targets are less likely to lead to resistance or to have an adverse impact on the environment. The server is freely available at http://117.211.115.67/UDT/main.html. The standalone application (source codes) is available at http://www.bioinformatics.org/ftp/pub/bioinfojuit/UDT.rar.

  7. Tn5401, a new class II transposable element from Bacillus thuringiensis.

    PubMed Central

    Baum, J A

    1994-01-01

    A new class II (Tn3-like) transposable element, designated Tn5401, was recovered from a sporulation-deficient variant of Bacillus thuringiensis subsp. morrisoni EG2158 following its insertion into a recombinant plasmid. Sequence analysis of the insert revealed a 4,837-bp transposon with two large open reading frames, in the same orientation, encoding proteins of 36 kDa (306 residues) and 116 kDa (1,005 residues) and 53-bp terminal inverted repeats. The deduced amino acid sequence for the 36-kDa protein shows 24% sequence identity with the TnpI recombinase of the B. thuringiensis transposon Tn4430, a member of the phage integrase family of site-specific recombinases. The deduced amino acid sequence for the 116-kDa protein shows 42% sequence identity with the transposase of Tn3 but only 28% identity with the TnpA transposase of Tn4430. Two small open reading frames of unknown function, designated orf1 (85 residues) and orf2 (74 residues), were also identified. Southern blot analysis indicated that Tn5401, in contrast to Tn4430, is not commonly found among different subspecies of B. thuringiensis and is not typically associated with known insecticidal crystal protein genes. Transposition was studied with B. thuringiensis by using plasmid pEG922, a temperature-sensitive shuttle vector containing Tn5401. Tn5401 transposed to both chromosomal and plasmid target sites but displayed an apparent preference for plasmid sites. Transposition was replicative and resulted in the generation of a 5-bp duplication at the target site. Transcriptional start sites within Tn5401 were mapped by primer extension analysis. Two promoters, designated PL and PR, direct the transcription of orf1-orf2 and tnpI-tnpA, respectively, and are negatively regulated by TnpI. Sequence comparison of the promoter regions of Tn5401 and Tn4430 suggests that the conserved sequence element ATGTCCRCTAAY mediates TnpI binding and cointegrate resolution. 
The same element is contained within the 53-bp terminal inverted repeats, thus accounting for their unusual lengths and suggesting an additional role for TnpI in regulating Tn5401 transposition. PMID:7514590
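    The conserved element ATGTCCRCTAAY is written with IUPAC ambiguity codes (R = A or G, Y = C or T), so finding its copies in promoter regions or terminal repeats means converting it to a regular expression and scanning both strands. This is a generic motif-search sketch with hypothetical function names and synthetic test sequences, not the comparison method used in the study.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def iupac_regex(motif):
    """Compile an IUPAC motif into an overlapping-match regular expression."""
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif.upper()) + "))")


def find_element(seq, motif="ATGTCCRCTAAY"):
    """Return sorted (0-based forward-strand start, strand) for every hit."""
    seq = seq.upper()
    pattern = iupac_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for m in pattern.finditer(reverse_complement):
        # a hit at p on the reverse complement covers seq[n-p-L : n-p]
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


if __name__ == "__main__":
    print(find_element("GGATGTCCACTAACGG"))   # synthetic forward-strand copy
    print(find_element("TTATTAGCGGACATTT"))   # synthetic reverse-strand copy
```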

  8. Complementary DNA sequences encoding the multimammate rat MHC class II DQ alpha and beta chains and cross-species sequence comparison in rodents.

    PubMed

    de Bellocq, J Goüy; Leirs, H

    2009-09-01

    Sequences of the complete open reading frame (ORF) of rodent major histocompatibility complex (MHC) class II genes are rare. Multimammate rat (Mastomys natalensis) complementary DNA (cDNA) encoding the alpha and beta chains of the MHC class II DQ gene was cloned from a rapid amplification of cDNA ends (RACE) cDNA library. The ORFs consist of 801 and 771 bp encoding 266 and 256 amino acid residues for DQB and DQA, respectively. The genomic structure of Mana-DQ genes is broadly similar to that described for other rodents, except for the insertion of a serine residue in the signal peptide of Mana-DQB, which is unique among known rodents.
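    The reported ORF lengths and residue counts are mutually consistent if, as assumed here, each ORF length includes the stop codon: three nucleotides per codon, minus one codon that encodes no residue.

```latex
\text{residues} = \frac{\text{ORF length (bp)}}{3} - 1
\qquad\Longrightarrow\qquad
\frac{801}{3} - 1 = 266 \;(\text{DQB}),
\qquad
\frac{771}{3} - 1 = 256 \;(\text{DQA})
```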

  9. Effects of stacking sequence on impact damage resistance and residual strength for quasi-isotropic laminates

    NASA Technical Reports Server (NTRS)

    Dost, Ernest F.; Ilcewicz, Larry B.; Avery, William B.; Coxon, Brian R.

    1991-01-01

    Residual strength of an impacted composite laminate is dependent on details of the damage state. Stacking sequence was varied to judge its effect on damage caused by low-velocity impact. This was done for quasi-isotropic layups of a toughened composite material. Experimental observations on changes in the impact damage state and postimpact compressive performance were presented for seven different laminate stacking sequences. The applicability and limitations of analysis compared to experimental results were also discussed. Postimpact compressive behavior was found to be a strong function of the laminate stacking sequence. This relationship was found to depend on thickness, stacking sequence, size, and location of sublaminates that comprise the impact damage state. The postimpact strength for specimens with a relatively symmetric distribution of damage through the laminate thickness was accurately predicted by models that accounted for sublaminate stability and in-plane stress redistribution. An asymmetric distribution of damage in some laminate stacking sequences tended to alter specimen stability. Geometrically nonlinear finite element analysis was used to predict this behavior.

  10. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the human serum albumin-binding GA domain and the IgG-binding GB domain have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Ordinary analyses such as BLAST and conservation analysis alone could hardly distinguish which structure a given sequence would adopt. However, by adding the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which parts play an important role in structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  11. Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

    PubMed

    Zhu, X; Naz, R K

    1999-03-01

    The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species, namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog, were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences from the 13 vertebrate species indicated overall evolutionary conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. The carp and frog ZP3 sequences showed more variation relative to the 11 mammalian species, making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all ZP3 polypeptide sequences except those of carp and frog. In the central region, the deduced ZP3 aa sequences of all 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species, including carp and frog, indicating that these residues have a longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between members of the same order within the class Mammalia, and also (95.4%) between pig (Ungulata) and rabbit (Lagomorpha). The deduced ZP3 aa sequences per se may not be sufficient to build a phylogenetic tree.
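    Two of the comparisons described here, finding alignment columns where every species has a cysteine and scoring pairwise sequence agreement, can be sketched on an already computed alignment. The percent identity below is a plain identity over gap-free columns, a simpler measure than the similarity reported by the GCG GAP program; the toy alignment and function names are hypothetical.

```python
def conserved_columns(alignment, residue="C"):
    """0-based columns where every aligned sequence has `residue`."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(length) if all(s[i] == residue for s in alignment)]


def percent_identity(a, b):
    """Identity over columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["AC-CG", "ACTCG", "GCTCA"]
    print(conserved_columns(toy_alignment))
    print(percent_identity(toy_alignment[1], toy_alignment[2]))
```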

  12. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds.

    PubMed

    Simkovic, Felix; Thomas, Jens M H; Keegan, Ronan M; Winn, Martyn D; Mayans, Olga; Rigden, Daniel J

    2016-07-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allows the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were overcome in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
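
    Predicted residue-residue contacts are commonly assessed by precision over the top-ranked pairs, for example the top L pairs for a protein of length L. A minimal sketch, assuming predictions arrive as (i, j, score) tuples with 1-based residue numbers, that pairs closer than 6 residues in sequence are ignored (a common but not universal cutoff), and that the true contacts come from a reference structure. The function `top_l_precision` and the toy data are hypothetical and are not part of AMPLE or its contact-filtering protocol.

```python
from typing import Iterable, Set, Tuple


def top_l_precision(predictions: Iterable[Tuple[int, int, float]],
                    true_contacts: Set[Tuple[int, int]],
                    seq_len: int,
                    factor: float = 1.0,
                    min_sep: int = 6) -> float:
    """Fraction of the top (factor * seq_len) predicted contacts that are true contacts."""
    # Put each pair in (low, high) order and drop pairs too close in sequence
    filtered = [(min(i, j), max(i, j), s) for i, j, s in predictions
                if abs(i - j) >= min_sep]
    # Rank by predicted score, highest first
    filtered.sort(key=lambda t: t[2], reverse=True)
    top = filtered[:max(1, int(factor * seq_len))]
    if not top:
        return 0.0
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    correct = sum(1 for i, j, _ in top if (i, j) in truth)
    return correct / len(top)


# Hypothetical toy data; seq_len is artificially small so "top L" is only 4 pairs
preds = [(1, 10, 0.9), (2, 20, 0.8), (5, 7, 0.95), (3, 15, 0.7), (4, 30, 0.1)]
truth = {(1, 10), (15, 3)}
print(top_l_precision(preds, truth, seq_len=4))  # 0.5
```

    Raising `min_sep` to 24 restricts the score to what are often called long-range contacts.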

  13. Collagenolytic Matrix Metalloproteinase Activities toward Peptomeric Triple-Helical Substrates.

    PubMed

    Stawikowski, Maciej J; Stawikowska, Roma; Fields, Gregg B

    2015-05-19

    Although collagenolytic matrix metalloproteinases (MMPs) possess common domain organizations, there are subtle differences in their processing of collagenous triple-helical substrates. In this study, we have incorporated peptoid residues into collagen model triple-helical peptides and examined MMP activities toward these peptomeric chimeras. Several different peptoid residues were incorporated into triple-helical substrates at subsites P3, P1, P1', and P10' individually or in combination, and the effects of the peptoid residues were evaluated on the activities of full-length MMP-1, MMP-8, MMP-13, and MMP-14/MT1-MMP. Most peptomers showed little discrimination between MMPs. However, a peptomer containing N-methyl Gly (sarcosine) in the P1' subsite and N-isobutyl Gly (NLeu) in the P10' subsite was hydrolyzed efficiently only by MMP-13 [nomenclature relative to the α1(I)772-786 sequence]. Cleavage site analysis showed hydrolysis at the Gly-Gln bond, indicating a shifted binding of the triple helix compared to the parent sequence. Favorable hydrolysis by MMP-13 was not due to sequence specificity or instability of the substrate triple helix but rather was based on the specific interactions of the P7' peptoid residue with the MMP-13 hemopexin-like domain. A fluorescence resonance energy transfer triple-helical peptomer was constructed and found to be readily processed by MMP-13, not cleaved by MMP-1 or MMP-8, and weakly hydrolyzed by MT1-MMP. The influence of the triple-helical structure containing peptoid residues on the interaction between MMP subsites and individual substrate residues may provide additional information about the mechanism of collagenolysis, the understanding of collagen specificity, and the design of selective MMP probes.

  14. The Limits of Template-Directed Synthesis with Nucleoside-5'-Phosphoro(2-Methyl) Imidazolides

    NASA Technical Reports Server (NTRS)

    Hill, Aubrey R., Jr.; Orgel, Leslie E.; Wu, Taifeng

    1993-01-01

    In earlier work we have shown that C-rich templates containing isolated A, T or G residues and short oligo(G) sequences can be copied effectively using nucleoside-5'-phosphoro(2-methyl)imidazolides as substrates. We now show that isolated A or T residues within an oligo(G) sequence are a complete block to copying and that an isolated C residue is copied inefficiently. Replication is possible only if there are two complementary oligonucleotides each of which acts as a template to facilitate the synthesis of the other. We emphasize the severity of the problems that need to be overcome to make possible non-enzymatic replication in homogeneous aqueous solution. We conclude that an efficient catalyst was involved in the origin of polynucleotide replication.

  15. Fatigue of graphite/epoxy (0/90/45/-45)s laminates under dual stress levels

    NASA Technical Reports Server (NTRS)

    Yang, J. N.; Jones, D. L.

    1982-01-01

    A model for the prediction of loading sequence effects on the statistical distribution of fatigue life and residual strength in composite materials is generalized and applied to (0/90/45/-45)s graphite/epoxy laminates. Load sequence effects are found to be caused by both the difference in residual strength when failure occurs (boundary effect) and the effect of previously applied loads (memory effect). The model allows the isolation of these two effects, and the estimation of memory effect magnitudes under dual fatigue loading levels. It is shown that the material memory effect is insignificant, and that correlations between predictions of the number of early failures agree with the verification tests, as do predictions of fatigue life and residual strength degradation under dual stress levels.

  16. Effect of craniosacral therapy on lower urinary tract signs and symptoms in multiple sclerosis.

    PubMed

    Raviv, Gil; Shefi, Shai; Nizani, Dalia; Achiron, Anat

    2009-05-01

    The objective was to examine whether craniosacral therapy improves lower urinary tract symptoms in multiple sclerosis (MS) patients. This prospective cohort study was conducted at the out-patient clinic of a multiple sclerosis center in a referral medical center. The intervention was hands-on craniosacral therapy (CST), and the outcome measures were change in lower urinary tract symptoms, post-voiding residual volume and quality of life. Patients from our multiple sclerosis clinic were assessed before and after craniosacral therapy. Evaluation included neurological examination, disability status determination, ultrasonographic estimation of post-voiding residual volume, and questionnaires regarding lower urinary tract symptoms and quality of life. Twenty-eight patients met the eligibility criteria and were included in this study. Comparison of post-voiding residual volume, lower urinary tract symptoms and quality of life before and after craniosacral therapy revealed a significant improvement (0.001>p>0.0001). CST was found to be an effective means for treating lower urinary tract symptoms and improving quality of life in MS patients.

  17. Adaption of the microbial community to continuous exposures of multiple residual antibiotics in sediments from a salt-water aquacultural farm.

    PubMed

    Xi, Xiuping; Wang, Min; Chen, Yongshan; Yu, Shen; Hong, Youwei; Ma, Jun; Wu, Qian; Lin, Qiaoyin; Xu, Xiangrong

    2015-06-15

    Residual antibiotics from aquacultural farming may alter microbial community structure in aquatic environments in ways that may adversely or positively impact microbially-mediated ecological functions. This study investigated 26 ponds (26 composited samples) used to produce fish, razor clam and shrimp (farming and drying) and 2 channels (10 samples) in a saltwater aquacultural farm in southern China to characterize microbial community structure (represented by phospholipid fatty acids) in surface sediments (0-10 cm) with long-term exposure to residual antibiotics. Eleven of the 14 widely used antibiotics were quantifiable at μg kg⁻¹ levels in sediments, but their concentrations did not statistically differ among ponds and channels, except norfloxacin in drying shrimp ponds and thiamphenicol in razor clam ponds. Concentrations of protozoan PLFAs were significantly increased in sediments from razor clam ponds, while other microbial groups were similar among ponds and channels. Both canonical-correlation and stepwise-multiple-regression analyses on microbial community and residual antibiotics suggested that roxithromycin residues were significantly related to shifts in microbial community structure in sediments. This study provided field evidence that multiple residual antibiotics at low environmental levels from aquacultural farming do not produce fundamental shifts in microbial community structure. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. Accurate Simulation and Detection of Coevolution Signals in Multiple Sequence Alignments

    PubMed Central

    Ackerman, Sharon H.; Tillier, Elisabeth R.; Gatti, Domenico L.

    2012-01-01

    Background: While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. Methodology/Principal Findings: We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. Conclusions/Significance: Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case. PMID:23091608
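
    Local coevolution measures of the kind compared above score one pair of alignment columns at a time. The simplest is the mutual information (MI) between two columns. A minimal sketch, assuming an ungapped toy alignment, natural logarithms, and no sequence weighting, pseudocounts or average-product correction; it is a generic MI calculation and not MSAvolve or any specific method benchmarked in the study.

```python
import math
from collections import Counter
from typing import List


def column_mutual_information(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between 0-based columns i and j of an MSA."""
    n = len(msa)
    col_i = [s[i] for s in msa]
    col_j = [s[j] for s in msa]
    f_i = Counter(col_i)                 # single-column residue counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        p_a = f_i[a] / n
        p_b = f_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


msa = ["AKLE", "AKIE", "GDLE", "GDIE"]
# Columns 0 and 1 covary perfectly (A with K, G with D); columns 0 and 2 do not
print(round(column_mutual_information(msa, 0, 1), 3))  # 0.693
print(round(column_mutual_information(msa, 0, 2), 3))  # 0.0
```

    Global methods instead fit a model to all columns jointly, which helps separate direct couplings from chains of indirect correlations that inflate raw MI.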

  19. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles

    PubMed Central

    2011-01-01

    Background: Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Results: Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. Conclusions: The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics. PMID:21943375

  20. Multiplexed Fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  1. Evolutionary Diversification of Aminopeptidase N in Lepidoptera by Conserved Clade-specific Amino Acid Residues

    PubMed Central

    Hughes, Austin L.

    2015-01-01

    Members of the aminopeptidase N (APN) gene family of the insect order Lepidoptera (moths and butterflies) bind the naturally insecticidal Cry toxins produced by the bacterium Bacillus thuringiensis. Phylogenetic analysis of amino acid sequences of seven lepidopteran APN classes provided strong support for the hypothesis that lepidopteran APN2 class arose by gene duplication prior to the most recent common ancestor of Lepidoptera and Diptera. The Cry toxin-binding region (BR) of lepidopteran and dipteran APNs was subject to stronger purifying selection within APN classes than was the remainder of the molecule, reflecting conservation of catalytic site and adjoining residues within the BR. Of lepidopteran APN classes, APN2, APN6, and APN8 showed the strongest evidence of functional specialization, both in expression patterns and in the occurrence of conserved derived amino acid residues. These three APN classes also shared a convergently evolved conserved residue close to the catalytic site. APN8 showed a particularly strong tendency towards class-specific conserved residues, including one of the catalytic site residues in the BR and ten others in close vicinity to the catalytic site residues. The occurrence of class-specific sequences along with the conservation of enzymatic function is consistent with the hypothesis that the presence of Cry toxins in the environment has been a factor shaping the evolution of this multi-gene family. PMID:24675701

  2. Determination of Multiple Near-Surface Residual Stress Components in Laser Peened Aluminum Alloy via the Contour Method

    NASA Astrophysics Data System (ADS)

    Toparli, M. Burak; Fitzpatrick, Michael E.; Gungor, Salih

    2015-09-01

    In this study, residual stress fields, including the near-surface residual stresses, were determined for an Al7050-T7451 sample after laser peening. The contour method was applied to measure one component of the residual stress, and the relaxed stresses on the cut surfaces were then measured by X-ray diffraction. This allowed calculation of the three orthogonal stress components using the superposition principle. The near-surface results were validated with results from incremental hole drilling and conventional X-ray diffraction. The results demonstrate that multiple residual stress components can be determined using a combination of the contour method and another technique. If the measured stress components are congruent with the principal stress axes in the sample, then this allows for determination of the complete stress tensor.

  3. Characterization of a marsupial sperm protamine gene and its transcripts from the North American opossum (Didelphis marsupialis).

    PubMed

    Winkfein, R J; Nishikawa, S; Connor, W; Dixon, G H

    1993-07-01

    A synthetic oligonucleotide primer, designed from marsupial protamine protein-sequence data [Balhorn, R., Corzett, M., Matrimas, J. A., Cummins, J. & Faden, B. (1989) Analysis of protamines isolated from two marsupials, the ring-tailed wallaby and gray short-tailed opossum, J. Cell Biol. 107] was used to amplify, via the polymerase chain reaction, protamine sequences from a North American opossum (Didelphis marsupialis) cDNA. Using the amplified sequences as probes, several protamine cDNA clones were isolated. The protein sequence, predicted from the cDNA sequences, consisted of 57 amino acids, contained a large number of arginine residues and exhibited the sequence ARYR at its amino terminus, which is conserved in avian and most eutherian mammal protamines. Like the true protamines of trout and chicken, the opossum protamine lacked cysteine residues, distinguishing it from placental mammalian protamine 1 (P1 or stable) protamines. Examination of the protamine gene, isolated by polymerase-chain-reaction amplification of genomic DNA, revealed the presence of an intron dividing the protamine-coding region, a common characteristic of all mammalian P1 genes. In addition, extensive sequence identity in the 5' and 3' flanking regions between mouse and opossum sequences classifies the marsupial protamine as being closely related to placental mammal P1. Protamine transcripts, in both birds and mammals, are present in two size classes, differing by the length of their poly(A) tails (either short or long). Examination of opossum protamine transcripts by Northern hybridization revealed four distinct mRNA species in the total RNA fraction, two of which were enriched in the poly(A)-rich fraction. Northern-blot analysis, using an intron-specific probe, revealed the presence of intron sequences in two of the four protamine transcripts. If expressed, the corresponding protein from intron-containing transcripts would differ from spliced transcripts by length (49 versus 57 amino acids) and would contain a cysteine residue.

  4. The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops.

    PubMed

    Tsuchiya, Yuko; Mizuguchi, Kenji

    2016-04-01

    Of the complementarity-determining regions (CDRs) of antibodies, H3 loops, with varying amino acid sequences and loop lengths, adopt particularly diverse loop conformations. The diversity of H3 conformations produces an array of antigen recognition patterns involving all the CDRs, in which the residue positions actually in contact with the antigen vary considerably. Therefore, for a deeper understanding of antigen recognition, it is necessary to relate the sequence and structural properties of each residue position in each CDR loop to its ability to bind antigens. In this study, we proposed a new method for characterizing the structural features of the CDR loops and obtained the antigen-binding ability of each residue position in each CDR loop. This analysis led to a simple set of rules for identifying probable antigen-binding residues. We also found that the diversity of H3 loop lengths and conformations affects the antigen-binding tendencies of all the CDR loops. © 2016 The Protein Society.

  5. A 31-residue peptide induces aggregation of tau's microtubule-binding region in cells

    NASA Astrophysics Data System (ADS)

    Stöhr, Jan; Wu, Haifan; Nick, Mimi; Wu, Yibing; Bhate, Manasi; Condello, Carlo; Johnson, Noah; Rodgers, Jeffrey; Lemmin, Thomas; Acharya, Srabasti; Becker, Julia; Robinson, Kathleen; Kelly, Mark J. S.; Gai, Feng; Stubbs, Gerald; Prusiner, Stanley B.; Degrado, William F.

    2017-09-01

    The self-propagation of misfolded conformations of tau underlies neurodegenerative diseases, including Alzheimer's. There is considerable interest in discovering the minimal sequence and active conformational nucleus that defines this self-propagating event. The microtubule-binding region, spanning residues 244-372, reproduces much of the aggregation behaviour of tau in cells and animal models. Further dissection of the amyloid-forming region to a hexapeptide from the third microtubule-binding repeat resulted in a peptide that rapidly forms fibrils in vitro. We show that this peptide lacks the ability to seed aggregation of tau244-372 in cells. However, as the hexapeptide is gradually extended to 31 residues, the peptides aggregate more slowly and gain potent activity to induce aggregation of tau244-372 in cells. X-ray fibre diffraction, hydrogen-deuterium exchange and solid-state NMR studies map the beta-forming region to a 25-residue sequence. Thus, the nucleus for self-propagating aggregation of tau244-372 in cells is packaged in a remarkably small peptide.

  6. Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling

    PubMed Central

    Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence

    2008-01-01

    Background: Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation, and in the transmission of pathogens such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low within this family, often below 30%, despite a well conserved tertiary structure. Below the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only reliable way to assign a sequence to this family is by experimental determination. However, these procedures are long and costly and cannot always be applied. One way to circumvent the experimental approach is sequence and structure analysis. To further help with that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions in ten lipocalins having a maximum pairwise identity of 28% and various functions. Results: Analysis of the ten lipocalin structures and sequences showed that two hydrophobic clusters of residues are conserved. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess the assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict its function. FTIR data support the lipocalin fold for this protein. Conclusion: By sequence and structural analyses, two conserved clusters of interacting hydrophobic residues have been identified in lipocalins. Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to the lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694

  7. Prediction of protein tertiary structure from sequences using a very large back-propagation neural network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, X.; Wilcox, G.L.

    1993-12-31

    We have implemented large scale back-propagation neural networks on a 544-node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins from CM-5 time) to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.
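
    The idea of presenting the Fourier transform of a sequence can be illustrated by encoding residues numerically and measuring spectral power at a chosen period, for example 2 residues for β-strand alternation or about 3.6 residues for an α-helix. A minimal sketch, assuming a hypothetical binary encoding (1 for an illustrative hydrophobic set, 0 otherwise) and a plain discrete Fourier sum; the abstract does not state the encoding or transform actually fed to the network.

```python
import cmath
from typing import List

# Hypothetical hydrophobic set, used only for this illustration
HYDROPHOBIC = set("AVILMFWC")


def encode(seq: str) -> List[float]:
    """Map each residue to 1.0 if it is in the illustrative hydrophobic set, else 0.0."""
    return [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq.upper()]


def power_at_period(values: List[float], period: float) -> float:
    """Squared magnitude of the mean-centred discrete Fourier sum at frequency 1/period."""
    mean = sum(values) / len(values)
    omega = 2 * cmath.pi / period
    total = sum((v - mean) * cmath.exp(-1j * omega * k) for k, v in enumerate(values))
    return abs(total) ** 2


x = encode("VKVKVKVKVKVK")  # strict hydrophobic/polar alternation
print(round(power_at_period(x, 2), 6))                 # 36.0
print(power_at_period(x, 3.6) < power_at_period(x, 2))  # True
```

    Strict alternation concentrates all the power at period 2, so a spectrum of this kind exposes periodic patterns that are hard to see in the raw residue string.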

  8. Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily

    PubMed Central

    Akiva, Eyal; Copp, Janine N.; Tokuriki, Nobuhiko; Babbitt, Patricia C.

    2017-01-01

    Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold. PMID:29078300

  9. The GAGA protein of Drosophila is phosphorylated by CK2.

    PubMed

    Bonet, Carles; Fernández, Irene; Aran, Xavier; Bernués, Jordi; Giralt, Ernest; Azorín, Fernando

    2005-08-19

    The GAGA factor of Drosophila is a sequence-specific DNA-binding protein that contributes to multiple processes from the regulation of gene expression to the structural organisation of heterochromatin and chromatin remodelling. GAGA is known to interact with various other proteins (tramtrack, pipsqueak, batman and dSAP18) and protein complexes (PRC1, NURF and FACT). GAGA functions are likely regulated at the level of post-translational modifications. Little is known, however, about its actual pattern of modification. It was proposed that GAGA can be O-glycosylated. Here, we report that the GAGA519 isoform is a phosphoprotein that is phosphorylated by CK2 in the region of the DNA-binding domain. Our results indicate that phosphorylation occurs at S388 and, to a lesser extent, at S378. These two residues are located in a region of the DNA-binding domain that makes no direct contact with DNA, being dispensable for sequence-specific recognition. Phosphorylation at these sites does not abolish DNA binding but reduces the affinity of the interaction. These results are discussed in the context of the various functions and interactions that GAGA supports.

  10. Src regulates sequence-dependent beta-2 adrenergic receptor recycling via cortactin phosphorylation

    PubMed Central

    Vistein, Rachel; Puthenveedu, Manojkumar A.

    2014-01-01

    The recycling of internalized signaling receptors, which has direct functional consequences, is subject to multiple sequence and biochemical requirements. Why signaling receptors recycle via a specialized pathway, unlike many other proteins that recycle by bulk, is a fundamental unanswered question. Here we show that these specialized pathways allow selective control of signaling receptor recycling by heterologous signaling. Using assays to visualize receptor recycling in living cells, we show that the recycling of the beta-2 adrenergic receptor (B2AR), a prototypic signaling receptor, is regulated by Src family kinases. The target of Src is cortactin, an essential factor for B2AR sorting into specialized recycling microdomains on the endosome. Phosphorylation of a single cortactin residue, Y466, regulates the rate of fission of B2AR recycling vesicles from these microdomains, and, therefore, the rate of delivery of B2AR to the cell surface. Together, our results indicate that actin-stabilized microdomains that mediate signaling receptor recycling can serve as a functional point of convergence for crosstalk between signaling pathways. PMID:25077552

  11. Evolution-Based Functional Decomposition of Proteins

    PubMed Central

    Rivoire, Olivier; Reynolds, Kimberly A.; Ranganathan, Rama

    2016-01-01

    The essential biological properties of proteins—folding, biochemical activities, and the capacity to adapt—arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment—a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668
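
    SCA begins by asking how conserved each alignment position is relative to a background amino-acid distribution. A simplified sketch of that first step, assuming a uniform background over the 20 standard amino acids, ignoring gaps, and applying no sequence weighting; pySCA uses its own background frequencies, weighting and coupling calculations, so this is not its implementation and it does not compute sectors.

```python
import math
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def positional_conservation(msa: List[str], col: int) -> float:
    """Relative entropy (nats) of a column's amino-acid frequencies vs a uniform background."""
    # Gaps and non-standard symbols are simply skipped in this sketch
    residues = [s[col] for s in msa if s[col] in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    q = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    counts = Counter(residues)
    return sum((c / n) * math.log((c / n) / q) for c in counts.values())


msa = ["MKV", "MRV", "MKL", "MEI"]
scores = [positional_conservation(msa, i) for i in range(3)]
print([round(s, 3) for s in scores])  # [2.996, 1.956, 1.956]
```

    A fully conserved column scores ln 20 (about 3.0 nats), the maximum under this background, while a column whose residues are spread evenly over all 20 amino acids scores 0.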

  12. Early Antibody Lineage Diversification and Independent Limb Maturation Lead to Broad HIV-1 Neutralization Targeting the Env High-Mannose Patch.

    PubMed

    MacLeod, Daniel T; Choi, Nancy M; Briney, Bryan; Garces, Fernando; Ver, Lorena S; Landais, Elise; Murrell, Ben; Wrin, Terri; Kilembe, William; Liang, Chi-Hui; Ramos, Alejandra; Bian, Chaoran B; Wickramasinghe, Lalinda; Kong, Leopold; Eren, Kemal; Wu, Chung-Yi; Wong, Chi-Huey; Kosakovsky Pond, Sergei L; Wilson, Ian A; Burton, Dennis R; Poignard, Pascal

    2016-05-17

    The high-mannose patch on HIV Env is a preferred target for broadly neutralizing antibodies (bnAbs), but to date, no vaccination regimen has elicited bnAbs against this region. Here, we present the development of a bnAb lineage targeting the high-mannose patch in an HIV-1 subtype-C-infected donor from sub-Saharan Africa. The Abs first acquired autologous neutralization, then gradually matured to achieve breadth. One Ab neutralized >47% of HIV-1 strains with only ∼11% somatic hypermutation and no insertions or deletions. By sequencing autologous env, we determined key residues that triggered the lineage and participated in Ab-Env coevolution. Next-generation sequencing of the Ab repertoire showed an early expansive diversification of the lineage followed by independent maturation of individual limbs, several of them developing notable breadth and potency. Overall, the findings are encouraging from a vaccine standpoint and suggest immunization strategies mimicking the evolution of the entire high-mannose patch and promoting maturation of multiple diverse Ab pathways. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Relationship between the dimerization of thyroglobulin and its ability to form triiodothyronine.

    PubMed

    Citterio, Cintia E; Morishita, Yoshiaki; Dakka, Nada; Veluswamy, Balaji; Arvan, Peter

    2018-03-30

    Thyroglobulin (TG) is the most abundant thyroid gland protein, a dimeric iodoglycoprotein (660 kDa). TG serves as the protein precursor in the synthesis of thyroid hormones tetraiodothyronine (T4) and triiodothyronine (T3). The primary site for T3 synthesis in TG involves an iodotyrosine acceptor at the antepenultimate Tyr residue (at the extreme carboxyl terminus of the protein). The carboxyl-terminal region of TG comprises a cholinesterase-like (ChEL) domain followed by a short unique tail sequence. Despite many studies, the monoiodotyrosine donor residue needed for the coupling reaction to create T3 at this evolutionarily conserved site remains unidentified. In this report, we have utilized a novel, convenient immunoblotting assay to detect T3 formation after protein iodination in vitro, enabling the study of T3 formation in recombinant TG secreted from thyrocytes or heterologous cells. With this assay, we confirm the antepenultimate residue of TG as a major T3-forming site, but also demonstrate that the side chain of this residue intimately interacts with the same residue in the apposed monomer of the TG dimer. T3 formation in TG, or the isolated carboxyl-terminal region, is inhibited by mutation of this antepenultimate residue, but we describe the first substitution mutation that actually increases T3 hormonogenesis by engineering a novel cysteine, 10 residues upstream of the antepenultimate residue, allowing for covalent association of the unique tail sequences, and that helps to bring residues Tyr2744 from apposed monomers into closer proximity. © 2018 Citterio et al.

  14. On the structural context and identification of enzyme catalytic residues.

    PubMed

    Chien, Yu-Tung; Huang, Shao-Wei

    2013-01-01

    Enzymes play important roles in most biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts of enzymes. Studying the fundamental and unique features of catalytic residues aids the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within a specific range, are usually structurally more rigid than those of noncatalytic residues. This structural context feature is combined with a support vector machine to identify catalytic residues from enzyme structure. The prediction results are better than or comparable to those of recent structure-based prediction methods.

  15. Structure and stability of the ankyrin domain of the Drosophila Notch receptor.

    PubMed

    Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug

    2003-11-01

    The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal effect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.

  16. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.

    PubMed

    Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G

    2007-11-01

    The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet, composed of 16 representative structural motifs, each 5 residues long, called Protein Blocks (PBs). This alphabet makes it possible to translate a 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., cysteine, histidine, threonine and serine for the α-helix N-cap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also identified interesting alternative associations, which highlight the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to the environment) while mainly preserving the 3D fold.
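    As a rough illustration of what a reduced alphabet does, the following Python sketch collapses only the two associations named in the abstract (Ile with Val, and the aromatics Trp, Phe and Tyr) and leaves every other residue unchanged. This toy grouping is an assumption for illustration, not one of the reduced sets analysed in the paper; the function names are hypothetical.

```python
# Toy reduced alphabet (assumption): only Ile/Val and Trp/Phe/Tyr are merged,
# as named in the abstract; all other residues keep their own letter.
REDUCED_GROUPS = {
    "I": "V", "V": "V",            # Ile and Val share one symbol
    "W": "F", "F": "F", "Y": "F",  # hydrophobic aromatics share one symbol
}


def reduce_sequence(seq: str) -> str:
    """Map a one-letter amino acid sequence onto the toy reduced alphabet."""
    return "".join(REDUCED_GROUPS.get(aa, aa) for aa in seq.upper())


def reduced_identity(a: str, b: str) -> float:
    """Fraction of identical positions after reduction (pre-aligned, ungapped input)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    ra, rb = reduce_sequence(a), reduce_sequence(b)
    return sum(x == y for x, y in zip(ra, rb)) / len(ra)


# Raw identity of these two 4-mers is 0.5; in the reduced alphabet it is 1.0.
print(reduced_identity("AIWK", "AVYK"))
```

    Comparing sequences in the reduced alphabet scores conservative swaps such as Ile to Val as matches, which is the sense in which such an alphabet behaves like a coarse substitution matrix.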

  17. Large‐scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe

    PubMed Central

    Necci, Marco; Piovesan, Damiano

    2016-01-01

    Intrinsic disorder (ID) in proteins has been extensively described over the last decade, yet a large‐scale classification of ID in proteins is still mostly missing. Here, we provide an extensive analysis of ID in the protein universe across the UniProt database, derived from sequence‐based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered, mostly found in viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ID regions overrepresented in the center. The charged-residue composition scheme of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined to the nucleoid, and in Viruses they provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and, in Viruses, to mediate transport and movement between and within organisms. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. PMID:27636733
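    The length cut-offs quoted above (an ID region of at least five residues, a long ID region of over 20 residues, and full disorder) can be sketched as a scan over per-residue predictions. The Python below assumes a hypothetical input: a list of booleans, one per residue, marking predicted disorder. It does not reproduce MobiDB's prediction pipeline. Treating "fully disordered" as every residue flagged is also an assumption.

```python
from typing import Dict, List, Optional, Tuple


def disordered_regions(flags: List[bool], min_len: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs for runs of disordered
    residues that are at least `min_len` residues long."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None  # where the current disordered run began, if any
    for i, disordered in enumerate(flags):
        if disordered and start is None:
            start = i
        elif not disordered and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(flags) - start >= min_len:
        regions.append((start, len(flags)))
    return regions


def summarize_disorder(flags: List[bool]) -> Dict[str, object]:
    """Apply the length cut-offs quoted in the abstract to one protein."""
    regions = disordered_regions(flags, min_len=5)
    long_regions = [(s, e) for s, e in regions if e - s > 20]
    n = len(flags)
    return {
        "has_id_region": bool(regions),            # run of >= 5 residues
        "has_long_id_region": bool(long_regions),  # run of > 20 residues
        "fully_disordered": n > 0 and all(flags),
        "long_id_coverage": sum(e - s for s, e in long_regions) / n if n else 0.0,
    }


# Hypothetical 100-residue protein with one 25-residue disordered stretch.
mask = [False] * 10 + [True] * 25 + [False] * 65
print(summarize_disorder(mask))
```

    Runs shorter than five residues are ignored entirely. In this sketch, coverage is computed only over long regions, matching the abstract's statement about long ID regions covering less than 20% of the sequence.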

  18. Redesigning the type II' β-turn in green fluorescent protein to type I': implications for folding kinetics and stability.

    PubMed

    Madan, Bharat; Sokalingam, Sriram; Raghunathan, Govindan; Lee, Sun-Gu

    2014-10-01

    Both Type I' and Type II' β-turns have the same sense of β-turn twist, which is compatible with the β-sheet twist. They occur predominantly in two-residue β-hairpins, but Type I' β-turns occur twice as often as Type II' β-turns. This suggests that Type I' β-turns may be more stable than Type II' β-turns, and that Type I' β-turn sequences and structures may be more favorable for protein folding. Here, we redesigned the native Type II' β-turn in GFP into a Type I' β-turn and investigated the effect on protein folding and stability. The Type I' β-turns were designed based on a statistical analysis of residues in natural Type I' β-turns. Substituting the native "GD" sequence at the i + 1 and i + 2 residues with the Type I'-preferred "(N/D)G" sequence motif increased the folding rate by 50% and slightly improved thermodynamic stability. Despite the enhanced in vitro refolding kinetics and stability of the redesigned mutants, they showed poor soluble expression compared with the wild type. To overcome this problem, the i and i + 3 residues of the designed Type I' β-turn were further engineered. Mutating Thr to Lys at i + 3 restored the in vivo soluble expression of the Type I' mutant. This study indicates that Type II' β-turns in natural β-hairpins can be further optimized by converting the sequence to Type I'. © 2014 Wiley Periodicals, Inc.

  19. Differential signatures of bacterial and mammalian IMP dehydrogenase enzymes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, R.; Evans, G.; Rotella, F.

    1999-06-01

    IMP dehydrogenase (IMPDH) is an essential enzyme of de novo guanine nucleotide synthesis. IMPDH inhibitors have clinical utility as antiviral, anticancer or immunosuppressive agents. The essential nature of this enzyme suggests its therapeutic applications may be extended to the development of antimicrobial agents. Bacterial IMPDH enzymes show biochemical and kinetic characteristics that differ from those of the mammalian IMPDH enzymes, suggesting that IMPDH may be an attractive target for the development of antimicrobial agents. We suggest that the biochemical and kinetic differences between bacterial and mammalian enzymes are a consequence of the variance of specific, identifiable amino acid residues. Identification of these residues, or combinations of residues, that impart this mammalian or bacterial enzyme signature is a prerequisite for the rational identification of agents that specifically target the bacterial enzyme. We used sequence alignments of IMPDH proteins to identify sequence signatures associated with bacterial or eukaryotic IMPDH enzymes. These selections were further refined to discern those likely to have a role in catalysis using information derived from the bacterial and mammalian IMPDH crystal structures and site-specific mutagenesis. Candidate bacterial sequence signatures identified by this process include regions involved in subunit interactions, the active site flap and the NAD binding region. Analysis of sequence alignments in these regions indicates a pattern of catalytic residues conserved in all enzymes and a secondary pattern of amino acid conservation associated with the major phylogenetic groups. Elucidation of the basis for this mammalian/bacterial IMPDH signature will provide insight into the catalytic mechanism of this enzyme and the foundation for the development of highly specific inhibitors.

  20. Amino-acid sequence and predicted three-dimensional structure of pea seed (Pisum sativum) ferritin.

    PubMed Central

    Lobreaux, S; Yewdall, S J; Briat, J F; Harrison, P M

    1992-01-01

    The iron storage protein, ferritin, is widely distributed among living organisms. Here the complete cDNA and derived amino-acid sequence of pea seed ferritin are described, together with its predicted secondary structure, namely a four-helix-bundle fold similar to those of mammalian ferritins, with a fifth short helix at the C-terminus. An N-terminal extension of 71 residues contains a transit peptide (first 47 residues) responsible for plastid targeting, as in other plant ferritins, and this is cleaved before assembly. The second part of the extension (24 residues) belongs to the mature subunit; it is cleaved during germination. The amino-acid sequence of pea seed ferritin is aligned with those of other ferritins (49% amino-acid identity with H-chains and 40% with L-chains of human liver ferritin in the aligned region). A three-dimensional model has been constructed by fitting the aligned sequence to the coordinates of human H-chains, with appropriate modifications. A folded conformation with an 11-residue helix is predicted for the N-terminal extension. As in mammalian ferritins, 24 subunits assemble into a hollow shell. In pea seed ferritin, the N-terminal extension is exposed on the outside surface of the shell. Within each pea subunit is a ferroxidase centre resembling those of human ferritin H-chains except for a replacement of Glu-62 by His. The channel at the 4-fold symmetry axes, defined by the E-helices, is predicted to be hydrophilic in plant ferritins, whereas it is hydrophobic in mammalian ferritins. Images Fig. 3. Fig. 5. Fig. 6. PMID:1472006

  1. Homooligomeric β3 (R)-valine peptides: Transformation between C14 and C12 helical structures induced by a guest Aib residue.

    PubMed

    Vasantha, Basavalingappa; George, Gijo; Raghothama, Srinivasarao; Balaram, Padmanabhan

    2017-01-01

    Novel helical structures, unprecedented in the chemistry of α-polypeptides, may be found in polypeptides containing β and γ amino acids. The structural characterization of C12 and C14 helices in oligo-β-peptides was originally achieved using conformationally constrained cyclic β-residues. This study explores the conformational characteristics of proteinogenic β3 residues in homooligomeric sequences and addresses the issue of inducing a transition between C14 and C12 helices by introducing a guest α-residue. Folded C14-helical structures are demonstrated for the nonapeptide Boc-[β3(R)Val]9-OMe by NMR methods in CDCl3-DMSO mixtures, while the peptide was found to be aggregated in CDCl3. Inserting a guest Aib residue into an oligo-β-valine sequence, in the octapeptide model Boc-[β3(R)Val]3-Aib-[β3(R)Val]4-OMe, results in a well-dispersed NH region in the NMR spectrum, indicating folded structures in CDCl3. Structure calculations for both peptides using NOE distance constraints support a C14 helical structure in the homooligomer, which transforms into a C12 helix on introduction of the guest Aib residue. © 2016 Wiley Periodicals, Inc.

  2. Characterization of aromatic residue-controlled protein retention in the endoplasmic reticulum of Saccharomyces cerevisiae.

    PubMed

    Mei, Meng; Zhai, Chao; Li, Xinzhi; Zhou, Yu; Peng, Wenfang; Ma, Lixin; Wang, Qinhong; Iverson, Brent L; Zhang, Guimin; Yi, Li

    2017-12-15

    An endoplasmic reticulum (ER) retention sequence (ERS) is a characteristic short sequence that mediates protein retention in the ER of eukaryotic cells. However, little is known about the detailed molecular mechanism involved in ERS-mediated protein ER retention. Using a new surface display-based fluorescence technique that effectively quantifies ERS-promoted protein ER retention within Saccharomyces cerevisiae cells, we performed comprehensive ERS analyses. We found that the length, type of amino acid residue, and additional residues at positions -5 and -6 of the C-terminal HDEL motif all determined the retention of ERS in the yeast ER. Moreover, the biochemical results guided by structure simulation revealed that aromatic residues (Phe-54, Trp-56, and other aromatic residues facing the ER lumen) in both the ERS (at positions -6 and -4) and its receptor, Erd2, jointly determined their interaction with each other. Our studies also revealed that this aromatic residue interaction might lead to the discriminative recognition of HDEL or KDEL as ERS in yeast or human cells, respectively. Our findings expand the understanding of ERS-mediated residence of proteins in the ER and may guide future research into protein folding, modification, and translocation affected by ER retention. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Properties and cDNA cloning of antihemorrhagic factors in sera of Chinese and Japanese mamushi (Gloydius blomhoffi).

    PubMed

    Aoki, Narumi; Tsutsumi, Kadzuyo; Deshimaru, Masanobu; Terada, Shigeyuki

    2008-02-01

    An antihemorrhagic protein has been isolated from the serum of Chinese mamushi (Gloydius blomhoffi brevicaudus) using a combination of ethanol precipitation and reverse-phase high-performance liquid chromatography (HPLC) on a C8 column. This protein, designated Chinese mamushi serum factor (cMSF), suppressed mamushi venom-induced hemorrhage in a dose-dependent manner. It had no effect on trypsin, chymotrypsin, thermolysin, and papain but inhibited the proteinase activities of several snake venom metalloproteinases (SVMPs), including hemorrhagic enzymes isolated from the venoms of mamushi and habu (Trimeresurus flavoviridis). A similar protein with antihemorrhagic activity (Japanese MSF, jMSF) has also been purified from the sera of Japanese mamushi (G. blomhoffi). The N-terminal 70 and 51 residues of intact cMSF and jMSF, respectively, were analyzed directly, revealing that the sequences of the two MSFs are similar to that of the antihemorrhagic protein HSF from habu serum. To obtain the complete amino acid sequences of the MSFs, cDNAs encoding these proteins were cloned from the liver mRNA of Chinese and Japanese vipers based on their N-terminal amino acid sequences. The mature forms of both MSFs consisted of 305 amino acids with a 19-residue signal sequence, and a unique 17-residue deletion was detected in their His-rich domains.

  4. Dissection of a nuclear localization signal.

    PubMed

    Hodel, M R; Corbett, A H; Hodel, A E

    2001-01-12

    The regulated process of protein import into the nucleus of a eukaryotic cell is mediated by specific nuclear localization signals (NLSs) that are recognized by protein import receptors. This study seeks to decipher the energetic details of NLS recognition by the receptor importin alpha through quantitative analysis of variant NLSs. The relative importance of each residue in two monopartite NLS sequences was determined using an alanine scanning approach. These measurements yield an energetic definition of a monopartite NLS sequence in which a required lysine residue is followed by two other basic residues in the sequence K(K/R)X(K/R). In addition, the energetic contribution of the second basic cluster in a bipartite NLS (approximately 3 kcal/mol), as well as the energy of inhibition by the importin beta-binding domain of importin alpha (approximately 3 kcal/mol), were also measured. These data allow the generation of an energetic scale of nuclear localization sequences based on a peptide's affinity for the importin alpha-importin beta complex. On this scale, a functional NLS has a binding constant of approximately 10 nM, whereas a nonfunctional NLS has a 100-fold weaker affinity of 1 µM. Further correlation between the current in vitro data and in vivo function will provide the foundation for a comprehensive quantitative model of protein import.
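    The monopartite consensus K(K/R)X(K/R) can be expressed as a regular expression: Lys, then Lys or Arg, then any residue, then Lys or Arg. The Python sketch below scans a sequence for candidate matches. It uses a lookahead so that overlapping candidates in a basic cluster are all reported. A regex hit is only a sequence match. The abstract's energetic definition rests on measured binding to importin alpha, so a match does not by itself imply a functional NLS. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# K(K/R)X(K/R): Lys, Lys-or-Arg, any residue, Lys-or-Arg.
# The zero-width lookahead lets overlapping candidates be reported.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_candidates(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched 4-mer) for each K(K/R)X(K/R) occurrence."""
    return [(m.start(), m.group(1)) for m in MONOPARTITE_NLS.finditer(seq.upper())]


# The classic SV40 large T-antigen NLS, PKKKRKV, contains two overlapping matches.
print(find_nls_candidates("PKKKRKV"))  # [(1, 'KKKR'), (2, 'KKRK')]
```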

  5. Deciphering the Hidden Informational Content of Protein Sequences

    PubMed Central

    Liu, Ming; Hua, Qing-xin; Hu, Shi-Quan; Jia, Wenhua; Yang, Yanwu; Saith, Sunil Evan; Whittaker, Jonathan; Arvan, Peter; Weiss, Michael A.

    2010-01-01

    Protein sequences encode both structure and foldability. Whereas the interrelationship of sequence and structure has been extensively investigated, the origins of folding efficiency are enigmatic. We demonstrate that the folding of proinsulin requires a flexible N-terminal hydrophobic residue that is dispensable for the structure, activity, and stability of the mature hormone. This residue (PheB1 in placental mammals) is variably positioned within crystal structures and exhibits 1H NMR motional narrowing in solution. Despite such flexibility, its deletion impaired insulin chain combination and led in cell culture to formation of non-native disulfide isomers with impaired secretion of the variant proinsulin. Cellular folding and secretion were maintained by hydrophobic substitutions at B1 but markedly perturbed by polar or charged side chains. We propose that, during folding, a hydrophobic side chain at B1 anchors transient long-range interactions by a flexible N-terminal arm (residues B1–B8) to mediate kinetic or thermodynamic partitioning among disulfide intermediates. Evidence for the overall contribution of the arm to folding was obtained by alanine scanning mutagenesis. Together, our findings demonstrate that efficient folding of proinsulin requires N-terminal sequences that are dispensable in the native state. Such arm-dependent folding can be abrogated by mutations associated with β-cell dysfunction and neonatal diabetes mellitus. PMID:20663888

  6. Multidimensional structure-function relationships in human β-cardiac myosin from population-scale genetic variation

    PubMed Central

    Homburger, Julian R.; Green, Eric M.; Caleshu, Colleen; Sunitha, Margaret S.; Taylor, Rebecca E.; Ruppel, Kathleen M.; Metpally, Raghu Prasad Rao; Colan, Steven D.; Michels, Michelle; Day, Sharlene M.; Olivotto, Iacopo; Bustamante, Carlos D.; Dewey, Frederick E.; Ho, Carolyn Y.; Spudich, James A.; Ashley, Euan A.

    2016-01-01

    Myosin motors are the fundamental force-generating elements of muscle contraction. Variation in the human β-cardiac myosin heavy chain gene (MYH7) can lead to hypertrophic cardiomyopathy (HCM), a heritable disease characterized by cardiac hypertrophy, heart failure, and sudden cardiac death. How specific myosin variants alter motor function or clinical expression of disease remains incompletely understood. Here, we combine structural models of myosin from multiple stages of its chemomechanical cycle, exome sequencing data from two population cohorts of 60,706 and 42,930 individuals, and genetic and phenotypic data from 2,913 patients with HCM to identify regions of disease enrichment within β-cardiac myosin. We first developed computational models of the human β-cardiac myosin protein before and after the myosin power stroke. Then, using a spatial scan statistic modified to analyze genetic variation in protein 3D space, we found significant enrichment of disease-associated variants in the converter, a kinetic domain that transduces force from the catalytic domain to the lever arm to accomplish the power stroke. Focusing our analysis on surface-exposed residues, we identified a larger region significantly enriched for disease-associated variants that contains both the converter domain and residues on a single flat surface on the myosin head described as the myosin mesa. Notably, patients with HCM with variants in the enriched regions have earlier disease onset than patients who have HCM with variants elsewhere. Our study provides a model for integrating protein structure, large-scale genetic sequencing, and detailed phenotypic data to reveal insight into time-shifted protein structures and genetic disease. PMID:27247418

  7. Conformational diversity in contryphans from Conus venom: cis-trans isomerisation and aromatic/proline interactions in the 23-membered ring of a 7-residue peptide disulfide loop.

    PubMed

    Sonti, Rajesh; Gowd, Konkallu Hanumae; Rao, K N Shashanka; Ragothama, Srinivasarao; Rodriguez, Alex; Perez, Juan Jesus; Balaram, Padmanabhan

    2013-11-04

    Conformational diversity or "shapeshifting" in cyclic peptide natural products can, in principle, confer a single molecular entity with the property of binding to multiple receptors. Conformational equilibria have been probed in the contryphans, which are peptides derived from Conus venom possessing a 23-membered cyclic disulfide moiety. The natural sequences derived from Conus inscriptus, GCV(D)LYPWC* (In936) and Conus loroisii, GCP(D)WDPWC* (Lo959) differ in the number of proline residues within the macrocyclic ring. Structural characterisation of distinct conformational states arising from cis-trans equilibria about Xxx-Pro bonds is reported. Isomerisation about the C2-P3 bond is observed in the case of Lo959 and about the Y5-P6 bond in In936. Evidence is presented for as many as four distinct species in the case of the synthetic analogue V3P In936. The Tyr-Pro-Trp segment in In936 is characterised by distinct sidechain orientations as a consequence of aromatic/proline interactions as evidenced by specific sidechain-sidechain nuclear Overhauser effects and ring current shifted proton chemical shifts. Molecular dynamics simulations suggest that Tyr5 and Trp7 sidechain conformations are correlated and depend on the geometry of the Xxx-Pro bond. Thermodynamic parameters are derived for the cis↔trans equilibrium for In936. Studies on synthetic analogues provide insights into the role of sequence effects in modulating isomerisation about Xxx-Pro bonds. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Identification and characterization of two arasin-like peptides in red swamp crayfish Procambarus clarkii.

    PubMed

    Chai, Lian-Qin; Li, Wan-Wan; Wang, Xian-Wei

    2017-11-01

    Antimicrobial peptides (AMPs) are small effectors of host defense that act by directly targeting microorganisms or by indirectly modulating immune responses. In the present study, two arasin-like AMPs, named Pc-arasin1 and Pc-arasin2, were identified in the red swamp crayfish Procambarus clarkii, with sequence similarity to the arasins found in Hyas araneus. Both Pc-arasins consist of a signal peptide, an N-terminal proline-rich region and a C-terminal region containing four conserved cysteine residues. The similarity between the two Pc-arasins was 44.44%, and Pc-arasin2 contained several additional residues in the N-terminus. Multiple alignment of the arasin family suggested conservation of the C-terminus and variation of the N-terminus of the Pc-arasins. Both AMPs were expressed specifically in hemocytes, and their expression was induced by bacterial challenge, especially by the pathogenic bacterium Aeromonas hydrophila. Knockdown of either Pc-arasin by double-stranded RNA suppressed host immunity against A. hydrophila, and commercially synthesized Pc-arasins rescued the knockdown phenotype. Both synthetic peptides showed broad antimicrobial activity against three Gram-positive and three Gram-negative bacteria, with minimal inhibitory concentrations ranging from 6.25 μM to 50 μM. These results provide new data about the sequence, expression and function of the arasin family and emphasize the role of this family in the host immune response against bacterial pathogens. The characterization of the Pc-arasins also suggests potential for developing therapeutic agents based on these two newly identified AMPs for disease control in aquaculture. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Comparative analysis of seven viral nuclear export signals (NESs) reveals the crucial role of nuclear export mediated by the third NES consensus sequence of nucleoprotein (NP) in influenza A virus replication.

    PubMed

    Chutiwitoonchai, Nopporn; Kakisaka, Michinori; Yamada, Kazunori; Aida, Yoko

    2014-01-01

    The assembly of influenza virus progeny virions requires machinery that exports viral genomic ribonucleoproteins from the cell nucleus. Currently, seven nuclear export signal (NES) consensus sequences have been identified in different viral proteins, including NS1, NS2, M1, and NP. The present study examined the roles of viral NES consensus sequences and their significance in terms of viral replication and nuclear export. Mutation of the NP-NES3 consensus sequence resulted in a failure to rescue viruses using a reverse genetics approach, whereas mutation of the NS2-NES1 and NS2-NES2 sequences led to a strong reduction in viral replication kinetics compared with the wild-type sequence. While the viral replication kinetics of the other NES mutant viruses were also lower than those of the wild type, the difference was less marked. Immunofluorescence analysis after transient expression of NP-NES3, NS2-NES1, or NS2-NES2 proteins in host cells showed that they accumulated in the cell nucleus. These results suggest that the NP-NES3 consensus sequence is the most critical for viral replication. Therefore, each of the hydrophobic (Φ) residues within this NES consensus sequence (Φ1, Φ2, Φ3, or Φ4) was mutated, and its viral replication and nuclear export function were analyzed. No viruses harboring NP-NES3 Φ2 or Φ3 mutants could be rescued. Consistent with this, the NP-NES3 Φ2 and Φ3 mutants showed reduced binding affinity with CRM1 in a pull-down assay, and both accumulated in the cell nucleus. Indeed, a nuclear export assay revealed that these mutant proteins showed lower nuclear export activity than the wild-type protein. Moreover, the Φ2 and Φ3 residues (along with other Φ residues) within the NP-NES3 consensus were highly conserved among different influenza A viruses, including human, avian, and swine. Taken together, these results suggest that the Φ2 and Φ3 residues within the NP-NES3 protein are important for its nuclear export function during viral replication.

  10. Single Amino Acid Substitutions at Specific Positions of the Heptad Repeat Sequence of Piscidin-1 Yielded Novel Analogs That Show Low Cytotoxicity and In Vitro and In Vivo Antiendotoxin Activity

    PubMed Central

    Kumar, Amit; Tripathi, Amit Kumar; Kathuria, Manoj; Shree, Sonal; Tripathi, Jitendra Kumar; Purshottam, R. K.; Ramachandran, Ravishankar; Mitra, Kalyan

    2016-01-01

    Piscidin-1 possesses significant antimicrobial and cytotoxic activities. In a search for the primary amino acid sequence(s) of piscidin-1 that could be important for its biological activity, a long heptad repeat sequence was identified in the region spanning amino acids 2 to 19. To understand the possible role of this motif, six analogs of piscidin-1 were designed by selectively replacing a single isoleucine residue at a d (5th) position or at an a (9th or 16th) position with either an alanine or a valine residue. Two more analogs, I5F,F6A-piscidin-1 and V12I-piscidin-1, were designed to investigate, respectively, the effect of interchanging the alanine residue at the d (5th) position of I5A-piscidin-1 with the adjacent phenylalanine residue, and the effect of replacing a valine residue with an isoleucine residue at another d (12th) position of the heptad repeat of piscidin-1. Single alanine-substituted analogs exhibited significantly lower cytotoxicity against mammalian cells than piscidin-1 but largely retained its antibacterial and antiendotoxin activities. All the single valine-substituted piscidin-1 analogs and I5F,F6A-piscidin-1 showed greater cytotoxicity than the corresponding alanine-substituted analogs, marginally greater or similar antibacterial activity, and superior antiendotoxin activity. Among these peptides, V12I-piscidin-1 showed the highest cytotoxicity, antibacterial activity and antiendotoxin activity. In mice treated with lipopolysaccharide (12 mg/kg of body weight), a single dose of I16A-piscidin-1, the piscidin-1 analog with the highest therapeutic index, at 1 or 2 mg/kg of body weight resulted in 80% and 100% survival, respectively. Structural and functional characterization of these peptides revealed the basis of their biological activity and demonstrated that nontoxic piscidin-1 analogs with significant antimicrobial and antiendotoxin activities can be designed by incorporating single alanine substitutions in the piscidin-1 heptad repeat. PMID:27067326
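    The heptad letters used in this abstract follow from counting residues modulo 7. As a minimal sketch, assuming the register implied by residue 5 sitting at position d (which places residue 2 at position a), the hypothetical helper `heptad_position` below maps a residue number to its heptad letter:

```python
HEPTAD = "abcdefg"


def heptad_position(residue: int, a_residue: int = 2) -> str:
    """Return the heptad letter (a-g) of a 1-based residue number.

    `a_residue` is the residue assumed to sit at position 'a'; placing
    residue 5 at 'd' implies that residue 2 is at 'a'.
    """
    return HEPTAD[(residue - a_residue) % 7]


if __name__ == "__main__":
    # Positions substituted in the piscidin-1 analogs described in the abstract.
    for pos in (5, 9, 12, 16):
        print(pos, heptad_position(pos))
```

    With this register, residues 5 and 12 fall at d and residues 9 and 16 at a, matching the positions named in the abstract.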

  11. Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

    PubMed Central

    Chica, Claudia; Diella, Francesca; Gibson, Toby J.

    2009-01-01

    Background: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context; for example, functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results: The evolution of regions flanking 116 functional linear motif instances was studied. The amino acid sequence conservation and order/disorder tendency of those regions were related to the presence or absence of the instance. For the majority of the analysed instances, pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation than pairs of sequences in which one member lacks the linear motif. Furthermore, those instances are more likely to co-evolve with neighbouring residues than with distant ones. These findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion: The results suggest that flanking regions are relevant for linear motif–mediated interactions, at both the structural and the sequence level. Moreover, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can help clarify the role of these predicted instances in determining protein function within the broader context of the cellular network in which they arise. PMID:19584925

  12. Truncated presequences of mitochondrial F1-ATPase beta subunit from Nicotiana plumbaginifolia transport CAT and GUS proteins into mitochondria of transgenic tobacco.

    PubMed

    Chaumont, F; Silva Filho, M de C; Thomas, D; Leterme, S; Boutry, M

    1994-02-01

    The mitochondrial F1-ATPase beta subunit (ATPase-beta) of Nicotiana plumbaginifolia is nucleus-encoded as a precursor containing an NH2-terminal extension. By sequencing the mature N. tabacum ATPase-beta, we determined that the presequence is 54 residues long. To define the essential regions of this presequence, we produced a series of 3' deletions in the sequence coding for the 90 NH2-terminal residues of ATPase-beta. The truncated sequences were fused with the chloramphenicol acetyl transferase (cat) and beta-glucuronidase (gus) genes and introduced into tobacco plants. From the observed distribution of CAT and GUS activity in the plant cells, we conclude that the first 23 amino-acid residues of ATPase-beta are still capable of specifically targeting reporter proteins into mitochondria. Immunodetection in transgenic plants and in vitro import experiments with various CAT fusion proteins show that the precursors are processed at the expected cleavage site but also at a cryptic site located in the linker region between the presequence and the first methionine of native CAT.

  13. Sub-Lethal Effects of Pesticide Residues in Brood Comb on Worker Honey Bee (Apis mellifera) Development and Longevity

    PubMed Central

    Wu, Judy Y.; Anelli, Carol M.; Sheppard, Walter S.

    2011-01-01

    Background: Numerous surveys reveal high levels of pesticide residue contamination in honey bee comb. We conducted studies to examine possible direct and indirect effects of pesticide exposure from contaminated brood comb on developing worker bees and adult worker lifespan. Methodology/Principal Findings: Worker bees were reared in brood comb containing high levels of known pesticide residues (treatment) or in relatively uncontaminated brood comb (control). Delayed development was observed in bees reared in treatment combs containing high levels of pesticides, particularly in the early stages (days 4 and 8) of worker bee development. Adult longevity was reduced by 4 days in bees exposed to pesticide residues in contaminated brood comb during development. Migration of pesticide residues from highly contaminated comb caused contamination of control comb after multiple brood cycles and provided insight into how quickly residues move through wax. Higher brood mortality and delayed adult emergence occurred after multiple brood cycles in the contaminated control combs. In contrast, survival increased in bees reared in treatment comb after multiple brood cycles, once pesticide residues in the treatment combs had been reduced by migration into the uncontaminated control combs, supporting comb replacement efforts. Chemical analysis after the experiment confirmed the migration of pesticide residues from treatment combs into previously uncontaminated control comb. Conclusions/Significance: This study is the first to demonstrate sub-lethal effects on worker honey bees of exposure to pesticide residues in contaminated brood comb. Sub-lethal effects, including delayed larval development and adult emergence or shortened adult longevity, can have indirect effects on the colony, such as premature shifts in hive roles and foraging activity. In addition, longer development time for bees may provide a reproductive advantage for parasitic Varroa destructor mites. The impact of delayed development in bees on Varroa mite fecundity should be examined further. PMID:21373182

  14. Automatic phylogenetic classification of bacterial beta-lactamase sequences including structural and antibiotic substrate preference information.

    PubMed

    Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian

    2013-12-01

    Beta-lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta-lactamases that can degrade these antibiotics. We developed a user-friendly tree-building web server that allows users to assign beta-lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics that the most closely related reference sequences are known to target and confer resistance against. This web server can automatically build three phylogenetic trees: the first with closely related sequences from a Tachyon search against the NCBI nr database, the second with curated reference beta-lactamase sequences, and the third built specifically from substrate binding pocket residues of the curated reference beta-lactamase sequences. We show that the third tree is better suited to recovering antibiotic substrate assignments through nearest-neighbor annotation transfer. Users can also choose to build a structural model for the query sequence and view its binding pocket residues relative to other beta-lactamases, both in the sequence alignment and in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.
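    The idea of nearest-neighbor annotation transfer can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the server's implementation: it ranks references by plain percent identity over already-aligned binding-pocket residues rather than by their position in a phylogenetic tree, and the function names, reference names, pocket strings and labels are all hypothetical.

```python
from typing import Dict, Tuple


def pocket_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned binding-pocket strings.

    Both strings must cover the same alignment columns; columns where either
    string has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("pocket strings must cover the same alignment columns")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def transfer_annotation(
    query: str, references: Dict[str, Tuple[str, str]]
) -> Tuple[str, str, float]:
    """Copy the annotation of the reference whose pocket is most identical to the query.

    `references` maps a reference name to (aligned pocket residues, annotation).
    Returns (reference name, annotation, identity); ties go to the first reference.
    """
    best_name, (best_pocket, best_label) = max(
        references.items(), key=lambda item: pocket_identity(query, item[1][0])
    )
    return best_name, best_label, pocket_identity(query, best_pocket)


if __name__ == "__main__":
    # Toy, made-up pocket strings and labels for illustration only.
    refs = {
        "ref1": ("SKNTGY", "substrate_profile_1"),
        "ref2": ("SKDTAW", "substrate_profile_2"),
    }
    print(transfer_annotation("SKNTAY", refs))
```

    With the toy data, the query matches `ref1` at 5 of 6 pocket positions and `ref2` at 4 of 6, so the label of `ref1` is transferred.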

  15. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families.

    PubMed

    Suplatov, Dmitry; Sharapova, Yana; Timonina, Daria; Kopylov, Kirill; Švedas, Vytas

    2018-04-01

    The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information on homologous proteins is used to predict correlated substitutions with the mutual information-based CMAT approach. The predicted correlations are classified into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; binding sites on the protein surface are then annotated and ranked by the presence of statistically significant co-evolving positions. The visualCMAT results are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or studied further online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5, so neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which can construct large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments aimed at producing novel enzymes with improved properties, and to study the mechanism of selective ligand binding and allosteric communication between topologically independent sites in protein structures. The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and no login is required.
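    The abstract names mutual information as the basis of CMAT. As a minimal sketch of that quantity only (CMAT's own scoring, significance testing and corrections are not described in the abstract and are not reproduced here), mutual information between two alignment columns can be estimated from raw residue frequencies; the function name and toy alignments are hypothetical.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between columns i and j of an alignment.

    MI = sum over residue pairs (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))),
    with probabilities taken as raw frequencies across the aligned sequences
    (no pseudocounts, no sequence weighting; gaps count as ordinary symbols).
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), c in count_ij.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_i[x] / n) * (count_j[y] / n)))
    return mi


if __name__ == "__main__":
    covarying = ["AK", "AK", "GE", "GE"]    # column 1 tracks column 0 exactly
    independent = ["AK", "AE", "GK", "GE"]  # every residue pairing equally frequent
    print(mutual_information(covarying, 0, 1))    # log(2), about 0.693
    print(mutual_information(independent, 0, 1))  # 0.0
```

    Perfectly co-varying columns reach the maximum for two equally frequent residue types, log 2, while independent columns score zero; real pipelines typically add sequence weighting and corrections for background and phylogenetic signal on top of this raw estimate.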

  16. Simultaneous phylogeny reconstruction and multiple sequence alignment

    PubMed Central

    Yue, Feng; Shi, Jian; Tang, Jijun

    2009-01-01

    Background: A phylogeny is the evolutionary history of a group of organisms. Sequence data remain the most widely used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve multiple alignment quality. Results: We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can use a simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method performs well in terms of both phylogeny accuracy and alignment quality. Conclusion: We present an algorithm, built on an efficient algorithm for solving the median problem for three sequences, that aligns multiple sequences and reconstructs the phylogeny minimizing the alignment score. Our extensive experiments suggest that this method is very promising and can produce high-quality phylogenies and alignments. PMID:19208110
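    The three-sequence median problem mentioned above can be illustrated with the textbook cubic-time dynamic program. This is a minimal sketch under stated assumptions, not the authors' efficient algorithm: it uses the simple unit cost model (every mismatch, insertion or deletion costs 1), returns only the optimal cost of connecting the three sequences to a best median sequence, and `median_cost` is a hypothetical name.

```python
from itertools import product

GAP = "-"


def column_cost(chars, alphabet):
    """Cheapest median symbol for one alignment column under unit costs.

    The median symbol may be a residue from `alphabet` or a gap; each of the
    three sequences pays 1 when its symbol differs from the median's.
    """
    return min(sum(c != x for x in chars) for c in alphabet | {GAP})


def median_cost(s1: str, s2: str, s3: str) -> int:
    """Minimum total cost of aligning three sequences to a common median (3-D DP)."""
    alphabet = set(s1) | set(s2) | set(s3)
    n1, n2, n3 = len(s1), len(s2), len(s3)
    inf = float("inf")
    dp = [[[inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    dp[0][0][0] = 0
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if i == j == k == 0:
                    continue
                best = inf
                # Each move consumes a residue (1) or places a gap (0) in each sequence.
                for di, dj, dk in product((0, 1), repeat=3):
                    if di + dj + dk == 0 or di > i or dj > j or dk > k:
                        continue
                    chars = (
                        s1[i - 1] if di else GAP,
                        s2[j - 1] if dj else GAP,
                        s3[k - 1] if dk else GAP,
                    )
                    prev = dp[i - di][j - dj][k - dk]
                    best = min(best, prev + column_cost(chars, alphabet))
                dp[i][j][k] = best
    return dp[n1][n2][n3]
```

    For example, `median_cost("AC", "A", "AC")` is 1: the median `AC` matches two inputs exactly and is one deletion away from `A`. The cubic table is what makes faster median algorithms attractive when this step is repeated many times during a tree search.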

  17. Ocellatin peptides from the skin secretion of the South American frog Leptodactylus labyrinthicus (Leptodactylidae): characterization, antimicrobial activities and membrane interactions.

    PubMed

    Gusmão, Karla A G; Dos Santos, Daniel M; Santos, Virgílio M; Cortés, María Esperanza; Reis, Pablo V M; Santos, Vera L; Piló-Veloso, Dorila; Verly, Rodrigo M; de Lima, Maria Elena; Resende, Jarbas M

    2017-01-01

    The availability of antimicrobial peptides from several different natural sources has opened an avenue for the discovery of new biologically active molecules. To the best of our knowledge, only two peptides isolated from the frog Leptodactylus labyrinthicus, namely pentadactylin and ocellatin-F1, have shown antimicrobial activities. Therefore, to explore the antimicrobial potential of this species, we investigated the biological activities and membrane interactions of three peptides isolated from the anuran skin secretion. The primary structures of the three peptides were determined by automated Edman degradation. The peptides were prepared by solid-phase synthesis and submitted to activity assays against gram-positive and gram-negative bacteria and against two fungal strains. The hemolytic properties of the peptides were also investigated in assays with rabbit erythrocytes. The conformational preferences of the peptides and their membrane interactions were investigated by circular dichroism spectroscopy and liposome dye release assays. The amino acid compositions of the three ocellatins were determined, and the sequences are 100% identical over the first 22 residues (the ocellatin-LB1 sequence). Ocellatin-LB2 carries an extra Asn residue and ocellatin-F1 extra Asn-Lys-Leu residues at the C-terminus. Ocellatin-F1 presents a stronger antibiotic potential and a broader spectrum of activities than the other peptides. The membrane interactions and pore formation capacities of the peptides correlate directly with their antimicrobial activities, i.e., ocellatin-F1 > ocellatin-LB1 > ocellatin-LB2. All peptides acquire high helical content in membrane environments; however, ocellatin-F1 shows, on average, a stronger helical propensity. The results indicate that the three extra amino acid residues at the ocellatin-F1 C-terminus play an important role in promoting stronger peptide-membrane interactions and antimicrobial properties. The extra Asn-23 residue present in the ocellatin-LB2 sequence seems to decrease its antimicrobial potential and the strength of the peptide-membrane interactions.

  18. Sequence Elucidation of an Unknown Cyclic Peptide of High Doping Potential by ETD and CID Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Guan, Fuyu; Uboh, Cornelius E.; Soma, Lawrence R.; Rudy, Jeffrey

    2011-04-01

    Identification of an unknown substance without any prior information remains a daunting challenge despite advances in chemistry and mass spectrometry. Nevertheless, an unknown cyclic peptide in a very small-volume sample seized at a Pennsylvania racetrack has been successfully identified. Accurate mass measurements showed that the sample contained a small unknown peptide as its major component. Collision-induced dissociation (CID) of the unknown peptide revealed the presence of Lys (not Gln, by accurate mass), Phe, and Arg residues, and the absence of any y-type product ion. The absence of y-type ions, together with the tryptic digestion results (unusual deamidation and no tryptic cleavage), suggests a cyclic structure for the peptide. Electron-transfer dissociation (ETD) of the unknown peptide indicated the presence of Gln (not Lys, by the unusual deamidation), Phe, and Arg residues and their connectivity. Taking all the results together, a cyclic tetrapeptide, cyclo[Arg-Lys-N(C6H9)Gln-Phe], is proposed for the unknown peptide. The observation of different amino acid residues in the CID and ETD experiments, as well as the preferential CID loss of a Lys residue from the peptide, was interpreted by a proposed fragmentation pathway. ETD was used for the first time to sequence a cyclic peptide; the ETD product ions of the identified peptide were categorized into two types, named pseudo-b and pseudo-z ions, which are important for sequencing cyclic peptides, and were interpreted by proposed fragmentation pathways. Additionally, multi-stage CID mass spectrometry cannot provide complete sequence information for cyclic peptides containing adjacent Arg and Lys residues. The identified cyclic peptide has not been documented in the literature and its pharmacological effects are unknown, but it might be a "designer" drug with athletic performance-enhancing effects.
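    Accurate mass can separate Lys from Gln because the two residues share a nominal mass of 128 but differ in elemental composition (C6H12N2O versus C5H8N2O2). A short calculation from standard monoisotopic atomic masses shows the size of the gap; the helper names are illustrative only.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

# Elemental compositions of the in-chain residues (amino acid minus water).
LYS_RESIDUE = {"C": 6, "H": 12, "N": 2, "O": 1}  # C6H12N2O
GLN_RESIDUE = {"C": 5, "H": 8, "N": 2, "O": 2}   # C5H8N2O2


def monoisotopic_mass(composition: dict) -> float:
    """Sum of monoisotopic atomic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def lys_gln_gap() -> tuple:
    """Return (Lys mass, Gln mass, difference in u, difference in ppm of the Lys mass)."""
    lys = monoisotopic_mass(LYS_RESIDUE)
    gln = monoisotopic_mass(GLN_RESIDUE)
    diff = lys - gln
    return lys, gln, diff, diff / lys * 1e6


if __name__ == "__main__":
    lys, gln, diff, ppm = lys_gln_gap()
    print(f"Lys {lys:.5f} u, Gln {gln:.5f} u, gap {diff:.5f} u ({ppm:.0f} ppm)")
```

    The computed gap is about 0.036 u, roughly 284 ppm of the residue mass, which is why the abstract can assign Lys rather than Gln "by accurate mass" from the CID data.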

  19. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that are rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541

  20. The cellodextrinase from Pseudomonas fluorescens subsp. cellulosa consists of multiple functional domains.

    PubMed Central

    Ferreira, L M; Hazlewood, G P; Barker, P J; Gilbert, H J

    1991-01-01

    A genomic library of Pseudomonas fluorescens subsp. cellulosa DNA was constructed in pUC18, and Escherichia coli recombinants expressing 4-methylumbelliferyl beta-D-cellobioside-hydrolysing activity (MUCase) were isolated. The enzyme produced by MUCase-positive clones did not hydrolyse either cellobiose or cellotriose but converted cellotetraose into cellobiose and cleaved cellopentaose and cellohexaose, producing a mixture of cellobiose and cellotriose. There was no activity against CM-cellulose, insoluble cellulose or xylan. On this basis, the enzyme is identified as an endo-acting cellodextrinase and is designated cellodextrinase C (CELC). Nucleotide sequencing of the gene (celC) that directs the synthesis of CELC revealed an open reading frame of 2153 bp, encoding a protein of Mr 80,189. The deduced primary sequence of CELC was confirmed by the Mr of purified CELC (77,000) and by the experimentally determined N-terminus of the enzyme, which was identical to residues 38-47 of the translated sequence. The N-terminal region of CELC showed strong homology with an endoglucanase, xylanases and an arabinofuranosidase of Ps. fluorescens subsp. cellulosa; the homologous sequences included highly conserved serine-rich regions. Full-length CELC bound tightly to crystalline cellulose. Truncated forms of celC from which the DNA sequence encoding the conserved domain had been deleted directed the synthesis of a functional cellodextrinase that did not bind to crystalline cellulose. This is consistent with the N-terminal region of CELC comprising a non-catalytic cellulose-binding domain that is distinct from the catalytic domain. The role of the cellulose-binding region is discussed. PMID:1953673
