SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding the protein folding process and guiding protein design. Most previous methods for predicting protein folding rate require the tertiary structure of a protein as input, and most do not distinguish the kinetic nature (two-state or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate with support vector machines, using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from the protein sequence alone. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec⁻¹) on the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification, and its accuracy of folding rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional inputs, such as structure-based geometric contacts. Conclusions Both the web server and the software for predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
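The two evaluation measures named in this abstract, the Pearson correlation coefficient and the mean absolute difference on the base-10 logarithmic scale, can be illustrated with a minimal sketch. The folding rates below are hypothetical values chosen only to make the arithmetic easy to follow; they are not SeqRate data, and the function names are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical folding rates in sec^-1 (illustrative only, not from the paper).
experimental = [1.0e1, 1.0e2, 1.0e3, 1.0e4]
predicted = [1.0e1, 1.0e3, 1.0e3, 1.0e5]

# Both measures are computed on log10(rate), as described in the abstract.
log_exp = [math.log10(k) for k in experimental]
log_pred = [math.log10(k) for k in predicted]

r = pearson_r(log_exp, log_pred)
mad = mean_abs_diff(log_exp, log_pred)
print(f"Pearson r = {r:.3f}, mean absolute difference = {mad:.3f}")
```

Working on the logarithmic scale means that a mean absolute difference of 1.0 corresponds to predictions that are off by a factor of ten on average.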
On the Origin of Protein Superfamilies and Superfolds
NASA Astrophysics Data System (ADS)
Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke
2015-02-01
Distributions of protein families and folds in genomes are highly skewed, with a small number of prevalent superfamilies/superfolds and a large number of small families/folds. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We treat protein sequences and folds as an information theoretic channel and compute the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
Coarse-grained sequences for protein folding and design.
Brown, Scott; Fawzi, Nicolas J; Head-Gordon, Teresa
2003-09-16
We present the results of sequence design on our off-lattice minimalist model in which no specification of native-state tertiary contacts is needed. We start with a sequence that adopts a target topology and build on it through sequence mutation to produce new sequences that comprise distinct members within a target fold class. In this work, we use the α/β ubiquitin fold class and design two new sequences that, when characterized through folding simulations, reproduce the differences in folding mechanism seen experimentally for proteins L and G. The primary implication of this work is that patterning of hydrophobic and hydrophilic residues is the physical origin for the success of relative contact-order descriptions of folding, and that these physics-based potentials provide a predictive connection between free energy landscapes and amino acid sequence (the original protein folding problem). We present results of the sequence mapping from a 20- to the three-letter code for determining a sequence that folds into the WW domain topology to illustrate future extensions to protein design. PMID:12963815
Exploring the sequence-structure protein landscape in the glycosyltransferase family
Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin
2003-01-01
To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887
A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.
Michnick, S W; Shakhnovich, E
1998-01-01
Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. In this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins with a similar fold is identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well-ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of the protein backbone was recently improved by using a set of structural fragments called Structural Alphabets instead of the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe every region of the protein backbone, including the coil. According to de Brevern (2000), Protein Blocks comprises 16 structural fragments, each 5 residues long. Protein Blocks is highly informative among the available Structural Alphabets and has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pairwise alignment. Alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at the Protein Block Export server.
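The local pairwise alignment step in this abstract can be sketched with a Smith-Waterman style score over Protein Block strings. This is only an illustration under stated assumptions: the match, mismatch and gap values are hypothetical placeholders (the abstract does not say which substitution scores the authors use), the two strings are invented, and the z-value and P-value filtering is not shown.

```python
def local_alignment_score(query, template, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two strings.

    The scoring parameters are hypothetical; a real Protein Blocks method
    would likely use a dedicated substitution matrix instead.
    """
    previous = [0] * (len(template) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        current = [0] * (len(template) + 1)
        for j in range(1, len(template) + 1):
            substitution = match if query[i - 1] == template[j - 1] else mismatch
            current[j] = max(
                0,                               # start a new local alignment
                previous[j - 1] + substitution,  # align the two letters
                previous[j] + gap,               # gap in the template
                current[j - 1] + gap,            # gap in the query
            )
            best = max(best, current[j])
        previous = current
    return best


# Invented Protein Block strings (letters a-p), sharing the segment "mmmmno".
query_pb = "mmmmnopac"
template_pb = "ddmmmmnobb"
score = local_alignment_score(query_pb, template_pb)
print(score)  # 6 matched letters * 2 = 12
```

In a fold recognition setting, such a score would be computed against every entry of the fold library and then converted to a z-value before applying the thresholds quoted above.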
Designing pH induced fold switch in proteins
NASA Astrophysics Data System (ADS)
Baruah, Anupaul; Biswas, Parbati
2015-05-01
This work investigates the computational design of a pH-induced protein fold switch based on a self-consistent mean-field approach by identifying the ensemble-averaged characteristics of sequences that encode a fold switch. The primary challenge of balancing the alternative sets of interactions present in both target structures is overcome by simultaneously optimizing two foldability criteria corresponding to the two target structures. The change in pH is modeled by altering the residual charge on the amino acids. The energy landscape of the fold switch protein is found to be double funneled. The fold switch sequences stabilize the interactions of the sites with similar relative surface accessibility in both target structures. Fold switch sequences have low sequence complexity and hence lower sequence entropy. The pH-induced fold switch is mediated by attractive electrostatic interactions rather than hydrophobic-hydrophobic contacts. This study may provide valuable insights into the design of fold switch proteins.
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. Here we present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen-deuterium exchange (HDX) NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Atomic interaction networks in the core of protein domains and their native folds.
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram
2010-02-23
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments and most notably in the ‘twilight’ and ‘midnight’ zones, wherein <30% and <10% target-template sequence identity prevail, respectively (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causing bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. 
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2008-12-01
Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is to find a suitable representation of protein sequences. In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from the protein sequence frequency profiles. The protein sequence frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the number of occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), which is an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. The method based on Top-n-grams significantly outperforms the methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many tasks in computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and the prediction of protein binding sites.
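The conversion from a frequency profile to Top-n-gram counts can be sketched as follows. This is a simplified reading of the abstract, assuming that each profile column contributes one Top-n-gram formed from its n most frequent amino acids in descending order of frequency; the profile values, the tie-breaking rule and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def top_n_grams(profile, n=2):
    """Turn each profile column into a string of its n most frequent residues.

    `profile` is a list of dicts mapping amino acid letters to frequencies.
    Ties are broken alphabetically so the result is deterministic (an
    assumption made for this sketch).
    """
    grams = []
    for column in profile:
        ranked = sorted(column, key=lambda aa: (-column[aa], aa))
        grams.append("".join(ranked[:n]))
    return grams


def occurrence_features(grams):
    """Count how often each Top-n-gram occurs (a sparse feature vector)."""
    return Counter(grams)


# Hypothetical three-column frequency profile (illustrative values only).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]

grams = top_n_grams(profile, n=2)
features = occurrence_features(grams)
print(grams)     # ['AG', 'LI', 'AG']
print(features)  # Counter({'AG': 2, 'LI': 1})
```

With 20 amino acids there are at most 20^n distinct Top-n-grams, which is what makes the resulting vector fixed-dimension; the sparse counter above would be expanded to that length before being passed to a classifier.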
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
Mining sequential patterns for protein fold recognition.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2008-02-01
Protein data contain discriminative patterns that can be used in many beneficial applications if they are defined correctly. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. Protein classification in terms of fold recognition plays an important role in computational protein analysis, since it can contribute to the determination of the function of a protein whose structure is unknown. Specifically, one of the most efficient SPM algorithms, cSPADE, is employed for the analysis of protein sequence. A classifier uses the extracted sequential patterns to classify proteins in the appropriate fold category. For training and evaluating the proposed method we used the protein sequences from the Protein Data Bank and the annotation of the SCOP database. The method exhibited an overall accuracy of 25% in a classification problem with 36 candidate categories. The classification performance reaches up to 56% when the five most probable protein folds are considered.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
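An exact calculation on an HP lattice model amounts to enumerating every self-avoiding conformation of a short chain and scoring each one. The sketch below assumes a 2D square lattice and the common convention of -1 per non-bonded hydrophobic (H-H) contact; the abstract does not state which lattice or energy convention the authors used, so both are assumptions, as are the function names.

```python
def hp_energy(sequence, coords):
    """Energy of one conformation: -1 for each H-H pair that is adjacent on
    the lattice but not consecutive along the chain."""
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    energy -= 1
    return energy


def enumerate_walks(length):
    """Yield all self-avoiding walks of `length` >= 2 sites on a 2D square
    lattice, with the first bond fixed along +x to remove rotations."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(path):
        if len(path) == length:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()

    yield from extend([(0, 0), (1, 0)])


# Exhaustive search for the lowest energy of a toy four-residue sequence.
sequence = "HPPH"
energies = [hp_energy(sequence, walk) for walk in enumerate_walks(len(sequence))]
print(len(energies), min(energies))  # 9 conformations, minimum energy -1
```

Exhaustive enumeration is only feasible for short chains because the number of self-avoiding walks grows exponentially with length.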
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.
Tian, Pengfei; Best, Robert B
2017-10-17
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
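The distinction between absolute and relative sequence capacity is easiest to see on a logarithmic scale, because the total number of possible sequences of length L is 20^L. The sketch below uses a hypothetical chain length and a hypothetical capacity chosen only for illustration; neither number is a result from the paper, and the function names are assumptions.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro constant, mol^-1


def log10_total_sequences(length, alphabet_size=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet_size)


def log10_relative_capacity(log10_capacity, length):
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_total_sequences(length)


# Hypothetical inputs (illustrative only): a 35-residue domain whose
# sequence capacity is assumed to be 10^24, just above the Avogadro constant.
length = 35
log10_capacity = 24.0

total = log10_total_sequences(length)
relative = log10_relative_capacity(log10_capacity, length)
print(f"log10(all sequences)     = {total:.2f}")
print(f"log10(relative capacity) = {relative:.2f}")
print("capacity exceeds Avogadro:", log10_capacity > math.log10(AVOGADRO))
```

Under these assumed numbers the absolute capacity is astronomically large while the relative capacity is a vanishingly small fraction, which is the contrast the abstract draws.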
Xu, Dong; Zhang, Yang
2013-01-01
Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible because of difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction and applied it to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with a TM-score 17% higher than that of traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
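The TM-score thresholds quoted in this abstract refer to a length-normalized similarity measure. A minimal sketch of the standard formula is given below, assuming the distances between aligned residue pairs have already been obtained from a fixed superposition; the full TM-score additionally maximizes over superpositions, which is not shown here. The function names and the handling of very short chains are choices made for this sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        raise ValueError("d0 is not defined for L <= 15 in this sketch")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("chain too short for a positive d0 in this sketch")
    return d0


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the distance in angstroms for each aligned residue
    pair; unaligned target residues simply contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect model of a 100-residue target scores 1.0.
print(tm_score([0.0] * 100, 100))

# If only half of the residues are aligned (perfectly), the score is 0.5.
print(tm_score([0.0] * 50, 100))
```

Because the sum is divided by the target length rather than by the number of aligned pairs, partial models are penalized for the residues they leave unmodeled.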
Insights into the fold organization of TIM barrel from interaction energy based structure networks.
Vijayabaskar, M S; Vishveshwara, Saraswathi
2012-01-01
There are many well-known examples of proteins with low sequence similarity, adopting the same structural fold. This aspect of sequence-structure relationship has been extensively studied both experimentally and theoretically, however with limited success. Most of the studies consider remote homology or "sequence conservation" as the basis for their understanding. Recently "interaction energy" based network formalism (Protein Energy Networks (PENs)) was developed to understand the determinants of protein structures. In this paper we have used these PENs to investigate the common non-covalent interactions and their collective features which stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold, despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high energy electrostatic interactions and low-energy contiguous vdW interactions in certain families. The other interfaces like the helix-sheet or the helix-helix seem to be devoid of any high energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing out their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network based phylogenetic analysis for remote homologues, which can perform better than sequence based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspective. We believe that the information obtained through the "interaction conservation" viewpoint and the subsequently developed method of structure network alignment, can shed new light in the fields of fold organization and de novo computational protein design.
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of 49 physicochemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold-specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences coming under all beta category with 81.43% accuracy and all alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
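As a rough illustration of characterizing a sequence by an average normalized property value, as described in the abstract above, the following minimal sketch uses the Kyte-Doolittle hydropathy scale as a stand-in. The abstract does not list its 49 properties, so the choice of scale, the min-max normalization, and the function names `min_max_normalize` and `mean_property` are assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in property scale;
# the abstract above does not say which 49 properties were actually used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def min_max_normalize(scale):
    """Rescale a property scale to the range [0, 1] (assumed normalization)."""
    lo, hi = min(scale.values()), max(scale.values())
    return {aa: (value - lo) / (hi - lo) for aa, value in scale.items()}


def mean_property(sequence, scale):
    """Average a per-residue property over a sequence, skipping unknown letters."""
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


NORM_KD = min_max_normalize(KD)

# Hypothetical toy sequences, not taken from the study's dataset.
for name, seq in {"toy_hydrophobic": "LIVLLAVI", "toy_polar": "KDERNQSK"}.items():
    print(name, round(mean_property(seq, NORM_KD), 3))
```

A per-fold profile in this spirit would repeat the averaging for every property and every sequence in a fold; how the study then feeds those averages to its classifiers is not specified in the abstract.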
Evolution of the arginase fold and functional diversity
Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.
2009-01-01
The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel eight-stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740
Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi
2017-03-15
Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy has not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing them with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information. http://yanglab.nankai.edu.cn/TA-fold/. yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Computational Modeling of Proteins based on Cellular Automata: A Method of HP Folding Approximation.
Madain, Alia; Abu Dalhoum, Abdel Latif; Sleit, Azzam
2018-06-01
The design of a protein folding approximation algorithm is not straightforward even when a simplified model is used. The folding problem is a combinatorial problem, where approximation and heuristic algorithms are usually used to find near-optimal folds of protein primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here depends on two-dimensional cellular automata (CA) to fold proteins represented in a well-studied simplified model called the hydrophobic-hydrophilic (HP) model. Cellular automata are discrete computational models that rely on local rules to produce some overall global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H-H contacts. Those algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H's at even positions and the other side ignores H's at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of using normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H-H endpoints. This guaranteed lower bound applies to short sequences only. This is proved, as the core and the folds of the protein will have two identical sides for all short sequences.
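To make the objective in the abstract above concrete, the following minimal sketch counts H-H contacts for a given fold on the two-dimensional square lattice, which is the quantity that HP folding algorithms try to maximize. It is not the cellular automaton from the paper; the move-string encoding (`U`, `D`, `L`, `R`) and the function names are assumptions chosen for illustration.

```python
# Lattice moves for a hypothetical move-string encoding of a 2D fold.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves):
    """Turn a move string into lattice coordinates, rejecting overlapping folds."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        if (x, y) in coords:
            raise ValueError("fold is not self-avoiding")
        coords.append((x, y))
    return coords


def hh_contacts(sequence, moves):
    """Count pairs of H residues that are lattice neighbors but not chain neighbors."""
    if len(moves) != len(sequence) - 1:
        raise ValueError("a chain of n residues needs exactly n - 1 moves")
    coords = fold_coordinates(moves)
    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up visits each lattice edge exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return contacts


# A four-residue chain folded into a square brings its two ends into contact.
print(hh_contacts("HPPH", "RUL"))
```

An approximation guarantee such as the one-half bound mentioned above is a statement about how this count compares with the best achievable count for the same sequence.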
Gold, Nicola D; Jackson, Richard M
2006-02-03
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structural and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition, including structure-based ligand design and understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Exploration of the relationship between topology and designability of conformations
NASA Astrophysics Data System (ADS)
Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej
2011-06-01
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable, and many sequences can assume such a fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. The unique conformation giving minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
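The enumerate-thread-count procedure described in the abstract above can be sketched on a toy scale. The example below assumes a 3x3 square lattice and an energy of -1 per H-H contact; neither the lattice size nor the energy function is taken from the study, and conformations are identified here by their contact maps, which is a simplification chosen for this sketch.

```python
from itertools import product

SIZE = 3  # toy lattice; the study's lattice sizes are not given in the abstract
SITES = [(r, c) for r in range(SIZE) for c in range(SIZE)]


def neighbors(site):
    """Yield the lattice neighbors of a site that lie inside the grid."""
    r, c = site
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            yield (nr, nc)


def compact_paths():
    """Enumerate all directed self-avoiding walks that fill the whole grid."""
    paths = []

    def extend(path, visited):
        if len(path) == len(SITES):
            paths.append(tuple(path))
            return
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in SITES:
        extend([start], {start})
    return paths


def contact_map(path):
    """Pairs of chain positions that touch on the lattice without being bonded."""
    index_at = {site: i for i, site in enumerate(path)}
    contacts = set()
    for i, site in enumerate(path):
        for nb in neighbors(site):
            j = index_at[nb]
            if j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def designability():
    """Count, per contact map, the HP sequences with that map as unique minimum."""
    maps = sorted({contact_map(p) for p in compact_paths()}, key=sorted)
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=len(SITES)):
        # Assumed energy: -1 for every H-H contact, 0 otherwise.
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in m)
                    for m in maps]
        best = min(energies)
        if energies.count(best) == 1:
            counts[maps[energies.index(best)]] += 1
    return counts


for cmap, n_seqs in designability().items():
    print(sorted(cmap), n_seqs)
```

Grouping walks by contact map merges rotations and reflections of the same shape, so each structure is counted once. Sequences whose lowest energy is shared by several contact maps are left unassigned, which is why the counts need not sum to the full 512 sequences.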
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
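The remark above that raw one-vs-the-rest scores are not comparable can be illustrated with a small sketch. It only shows how per-class weights can change the predicted class; the weights below are hand-picked rather than learned, the structural hierarchy is not modeled, and the class labels and function names are hypothetical.

```python
def one_vs_all(scores):
    """Standard one-vs-all: pick the class whose binary classifier scores highest."""
    return max(scores, key=scores.get)


def weighted_one_vs_all(scores, weights):
    """Rescale each classifier's score by a per-class weight before comparing."""
    return max(scores, key=lambda label: weights[label] * scores[label])


# Hypothetical raw SVM margins for one query sequence.
scores = {"fold_a": 2.0, "fold_b": 1.5, "fold_c": -0.3}
# Hypothetical weights; in the described system such weights are learned from data.
weights = {"fold_a": 0.5, "fold_b": 1.0, "fold_c": 1.0}

print(one_vs_all(scores))
print(weighted_one_vs_all(scores, weights))
```

Here the classifier for `fold_a` is assumed to produce systematically inflated margins, so down-weighting it changes the prediction from `fold_a` to `fold_b`.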
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins
NASA Astrophysics Data System (ADS)
Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong
2016-04-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
Biophysical and structural considerations for protein sequence evolution
2011-01-01
Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolution do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue types. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550
Muthu Krishnan, S
2018-05-14
The receptor-associated protein (RAP) is an inhibitor of endocytic receptors that belong to the lipoprotein receptor gene family. In this study, a computational approach was used to find folds evolutionarily related to that of the RAP proteins. Through structural and sequence-based analysis, various protein folds that are very close to the RAP fold were found. Remote homolog datasets were used to develop different support vector machine (SVM) methods to recognize the homologous RAP fold. This study helps in understanding the relationship of RAP homolog folds based on structure, function and evolutionary history. Copyright © 2018 Elsevier Ltd. All rights reserved.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. 
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of an evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity, together with single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Automated design evolution of stereochemically randomized protein foldamers
NASA Astrophysics Data System (ADS)
Ranbhor, Ranjit; Kumar, Anil; Patel, Kirti; Ramakrishnan, Vibin; Durani, Susheel
2018-05-01
Diversification of chain stereochemistry opens up the possibilities of an ‘in principle’ increase in the design space of proteins. This huge increase in the sequence and consequent structural variation is aimed at the generation of smart materials. To diversify protein structure stereochemically, we introduced L- and D-α-amino acids as the design alphabet. With a sequence design algorithm, we explored the usage of specific variables such as chirality and the sequence of this alphabet in independent steps. With molecular dynamics, we folded stereochemically diverse homopolypeptides and evaluated their ‘fitness’ for possible design as protein-like foldamers. We propose a fitness function to select the optimal fold among 1000 structures simulated with an automated repetitive simulated annealing molecular dynamics (AR-SAMD) approach. The highly scored poly-leucine folds with sequence lengths of 24 and 30 amino acids were later sequence-optimized using a Dead End Elimination cum Monte Carlo based optimization tool. This paper demonstrates a novel approach for the de novo design of protein-like foldamers.
Kono, H; Saven, J G
2001-02-23
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
Ganesan, K; Parthasarathy, S
2011-12-01
Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences, which have only 15-25% identity, the pairwise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility with that known 3D structure. We have improved the above method by using the predicted secondary structure information and employed it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence with the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in correctly predicting the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web-based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
Protein fold recognition using geometric kernel data fusion.
Zakeri, Pooya; Jeuris, Ben; Vandebril, Raf; Moreau, Yves
2014-07-01
Various approaches based on features extracted from protein sequences and often machine learning methods have been used in the prediction of protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. In addition to the limitation of linear combinations, working with such approaches could cause a loss of potentially useful information. We design several techniques to combine kernel matrices by taking more involved, geometry inspired means of these matrices instead of convex linear combinations. We consider various sequence-based protein features including information extracted directly from position-specific scoring matrices and local sequence alignment. We evaluate our methods for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. The best overall accuracy on the protein fold recognition test set obtained by our methods is ∼ 86.7%. This is an improvement over the results of the best existing approach. Moreover, our computational model has been developed by incorporating the functional domain composition of proteins through a hybridization model. It is observed that by using our proposed hybridization model, the protein fold recognition accuracy is further improved to 89.30%. Furthermore, we investigate the performance of our approach on the protein remote homology detection problem by fusing multiple string kernels. The MATLAB code used for our proposed geometric kernel fusion frameworks is publicly available at http://people.cs.kuleuven.be/∼raf.vandebril/homepage/software/geomean.php?menu=5/. © The Author 2014. Published by Oxford University Press.
Homochiral stereochemistry: the missing link of structure to energetics in protein folding.
Kumar, Anil; Ramakrishnan, Vibin; Ranbhor, Ranjit; Patel, Kirti; Durani, Susheel
2009-12-24
The notion is tested that homochiral stereochemistry being ubiquitous to protein structure could be critical to protein folding as well, causing it to become frustrated energetically providing the basis for its solvent- and sequence-mediated control. The proof in support of the notion is found in a consensus of experiment and computation according to which suitable oligopeptides are in their folding-unfolding equilibria, at both macrostate and microstate levels, susceptible to dielectric because of the conflict of peptide-chain electrostatics with interpeptide hydrogen bonds when the structure is poly-L but not when it is alternating-L,D. The argument is thus made that homochiral stereochemistry may in protein folding provide the unifying basis for its solvent- and sequence-mediated control based on screening of peptide-chain electrostatics under conflict with folding of the chain due to homochiral stereochemistry. Dielectric is brought into spotlight as the effect comparatively obscure but presumably critical to the folding in protein structure for its control.
Use of designed sequences in protein structure recognition.
Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
2018-05-09
Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan
2017-02-01
An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
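For readers unfamiliar with the HP model referred to above, here is a minimal sketch of its usual energy function on a 2D square lattice: each pair of hydrophobic (H) residues that sit on adjacent lattice sites but are not neighbors along the chain contributes -1. The 2D square lattice, the function name, and the coordinate representation are assumptions made for illustration; the abstract does not specify the lattice, and this is not the HE-L-PSO algorithm itself.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain on a 2D square lattice: -1 per non-bonded H-H contact.

    sequence: string of 'H' and 'P'; coords: list of (x, y) lattice sites,
    one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate is required per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")

    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each lattice pair is counted once.
        for site in ((x + 1, y), (x, y + 1)):
            j = index_at.get(site)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a square brings residues 0 and 3 together.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))
```

An optimizer such as PSO would search over conformations to minimize this quantity; only the scoring step is shown here.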
Protein classification using sequential pattern mining.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2006-01-01
Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sachleben, Joseph R.; Adhikari, Aashish N.; Gawlak, Grzegorz
2016-11-10
We determined the NMR structure of a highly aromatic (13%) protein of unknown function, Aq1974 from Aquifex aeolicus (PDB ID: 5SYQ). The unusual sequence of this protein has a tryptophan content five times the normal (six tryptophan residues of 114 or 5.2% while the average tryptophan content is 1.0%) with the tryptophans occurring in a WXW motif. It has no detectable sequence homology with known protein structures. Although its NMR spectrum suggested that the protein was rich in β-sheet, upon resonance assignment and solution structure determination, the protein was found to be primarily α-helical with a small two-stranded β-sheet with a novel fold that we have termed an Aromatic Claw. As this fold was previously unknown and the sequence unique, we submitted the sequence to CASP10 as a target for blind structural prediction. At the end of the competition, the sequence was classified a hard template based model; the structural relationship between the template and the experimental structure was small and the predictions all failed to predict the structure. CSRosetta was found to predict the secondary structure and its packing; however, it was found that there was little correlation between CSRosetta score and the RMSD between the CSRosetta structure and the NMR determined one. This work demonstrates that even in relatively small proteins, we do not yet have the capacity to accurately predict the fold for all primary sequences. The experimental discovery of new folds helps guide the improvement of structural prediction methods.
Metamorphic Proteins: Emergence of Dual Protein Folds from One Primary Sequence.
Lella, Muralikrishna; Mahalakshmi, Radhakrishnan
2017-06-20
Every amino acid exhibits a different propensity for distinct structural conformations. Hence, decoding how the primary amino acid sequence undergoes the transition to a defined secondary structure and its final three-dimensional fold is presently considered predictable with reasonable certainty. However, protein sequences that defy the first principles of secondary structure prediction (they attain two different folds) have recently been discovered. Such proteins, aptly named metamorphic proteins, decrease the conformational constraint by increasing flexibility in the secondary structure and thereby result in efficient functionality. In this review, we discuss the major factors driving the conformational switch related both to protein sequence and to structure using illustrative examples. We discuss the concept of an evolutionary transition in sequence and structure, the functional impact of the tertiary fold, and the pressure of intrinsic and external factors that give rise to metamorphic proteins. We mainly focus on the major components of protein architecture, namely, the α-helix and β-sheet segments, which are involved in conformational switching within the same or highly similar sequences. These chameleonic sequences are widespread in both cytosolic and membrane proteins, and these folds are equally important for protein structure and function. We discuss the implications of metamorphic proteins and chameleonic peptide sequences in de novo peptide design.
Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.
2008-01-01
Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
NASA Astrophysics Data System (ADS)
Camilloni, Carlo; Broglia, Ricardo A.; Tiana, Guido
2011-01-01
The study of the mechanism which is at the basis of the phenomenon of protein folding requires the knowledge of multiple folding trajectories under biological conditions. Using a biasing molecular-dynamics algorithm based on the physics of the ratchet-and-pawl system, we carry out all-atom, explicit solvent simulations of the sequence of folding events which proteins G, CI2, and ACBP undergo in evolving from the denatured to the folded state. Starting from highly disordered conformations, the algorithm allows the proteins to reach, at the price of a modest computational effort, nativelike conformations, within a root mean square deviation (RMSD) of approximately 1 Å. A scheme is developed to extract, from the myriad of events, information concerning the sequence of native contact formation and of their eventual correlation. Such an analysis indicates that all the studied proteins fold hierarchically, through pathways which, although not deterministic, are well-defined with respect to the order of contact formation. The algorithm also allows one to study unfolding, a process which looks, to a large extent, like the reverse of the major folding pathway. This is also true in situations in which many pathways contribute to the folding process, like in the case of protein G.
Single-molecule Protein Unfolding in Solid State Nanopores
Talaga, David S.; Li, Jiali
2009-01-01
We use single silicon nitride nanopores to study folded, partially folded and unfolded single proteins by measuring their excluded volumes. The DNA-calibrated translocation signals of β-lactoglobulin and histidine-containing phosphocarrier protein match quantitatively with that predicted by a simple sum of the partial volumes of the amino acids in the polypeptide segment inside the pore when translocation stalls due to the primary charge sequence. Our analysis suggests that the majority of the protein molecules were linear or looped during translocation and that the electrical forces present under physiologically relevant potentials can unfold proteins. Our results show that the nanopore translocation signals are sensitive enough to distinguish the folding state of a protein and distinguish between proteins based on the excluded volume of a local segment of the polypeptide chain that transiently stalls in the nanopore due to the primary sequence of charges. PMID:19530678
In vivo protein stabilization based on fragment complementation and a split GFP system.
Lindman, Stina; Hernandez-Garcia, Armando; Szczepankiewicz, Olga; Frohm, Birgitta; Linse, Sara
2010-11-16
Protein stabilization was achieved through in vivo screening based on the thermodynamic linkage between protein folding and fragment complementation. The split GFP system was found suitable to derive protein variants with enhanced stability due to the correlation between effects of mutations on the stability of the intact chain and the effects of the same mutations on the affinity between fragments of the chain. PGB1 mutants with higher affinity between fragments 1 to 40 and 41 to 56 were obtained by in vivo screening of a library of the 1 to 40 fragments against wild-type 41 to 56 fragments. Colonies were ranked based on the intensity of green fluorescence emerging from assembly and folding of the fused GFP fragments. The DNA from the brightest fluorescent colonies was sequenced, and intact mutant PGB1s corresponding to the top three sequences were expressed, purified, and analyzed for stability toward thermal denaturation. The protein sequence derived from the top fluorescent colony was found to yield a 12 °C increase in the thermal denaturation midpoint and a free energy of stabilization of -8.7 kJ/mol at 25 °C. The stability rank order of the three mutant proteins follows the fluorescence rank order in the split GFP system. The variants are stabilized through increased hydrophobic effect, which raises the free energy of the unfolded more than the folded state; as well as substitutions, which lower the free energy of the folded more than the unfolded state; optimized van der Waals interactions; helix stabilization; improved hydrogen bonding network; and reduced electrostatic repulsion in the folded state.
Deiana, Antonio; Giansanti, Andrea
2010-04-21
Natively unfolded proteins lack a well defined three dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. To assess that a given protein is natively unfolded requires laborious experimental investigations, then reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applicative studies. Many proteins have amino acidic compositions compatible both with the folded and unfolded status, and belong to a twilight zone between order and disorder. This makes difficult a dichotomic classification of protein sequences into folded and natively unfolded ones. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining into a consensus score good performing single predictors of folding. In this methodological paper dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, that is called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed by 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy have good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acidic compositions occupied by both folded and unfolded proteins; ii) are composed by approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. Our results show that proteins unclassified by SSU belong to a twilight zone. 
Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins and their structural properties and evolutionary history are worth to be investigated.
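The strictly unanimous combination described above can be sketched as a small function: a protein is labelled only when every individual index agrees, and is otherwise left unclassified. The boolean encoding (True for "folded") and the function name are assumptions for this example; how each index (Poodle-W, gVSL2, mean pairwise energy) produces its call is outside the scope of the sketch.

```python
from typing import Optional, Sequence


def unanimous_consensus(calls: Sequence[bool]) -> Optional[bool]:
    """Combine binary folded/unfolded calls under a strict unanimity rule.

    Returns True if all predictors say folded, False if all say unfolded,
    and None (unclassified) when they disagree.
    """
    if not calls:
        raise ValueError("at least one predictor call is required")
    if all(calls):
        return True
    if not any(calls):
        return False
    return None


# Hypothetical calls from three predictors for three proteins.
print(unanimous_consensus([True, True, True]))     # all agree: folded
print(unanimous_consensus([False, False, False]))  # all agree: unfolded
print(unanimous_consensus([True, False, True]))    # disagreement: unclassified
```

Under this rule the unclassified set grows as more predictors are combined, which is the trade-off the abstract uses to define the twilight zone.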
2010-01-01
Background Natively unfolded proteins lack a well defined three dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. To assess that a given protein is natively unfolded requires laborious experimental investigations, then reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applicative studies. Many proteins have amino acidic compositions compatible both with the folded and unfolded status, and belong to a twilight zone between order and disorder. This makes difficult a dichotomic classification of protein sequences into folded and natively unfolded ones. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining into a consensus score good performing single predictors of folding. Results In this methodological paper dichotomic folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, that is called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed by 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy have good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acidic compositions occupied by both folded and unfolded proteins; ii) are composed by approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. 
Conclusions Our results show that proteins unclassified by SSU belong to a twilight zone. Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins and their structural properties and evolutionary history are worth to be investigated. PMID:20409339
Naranjo, Yandi; Pons, Miquel; Konrat, Robert
2012-01-01
The number of existing protein sequences spans a very small fraction of sequence space. Natural proteins have overcome a strong negative selective pressure to avoid the formation of insoluble aggregates. Stably folded globular proteins and intrinsically disordered proteins (IDPs) use alternative solutions to the aggregation problem. While in globular proteins folding minimizes the access to aggregation prone regions, IDPs on average display large exposed contact areas. Here, we introduce the concept of average meta-structure correlation maps to analyze sequence space. Using this novel conceptual view we show that representative ensembles of folded and ID proteins show distinct characteristics and respond differently to sequence randomization. By studying the way evolutionary constraints act on IDPs to disable a negative function (aggregation) we might gain insight into the mechanisms by which function-enabling information is encoded in IDPs.
Solitons and protein folding: An In Silico experiment
NASA Astrophysics Data System (ADS)
Ilieva, N.; Dai, J.; Sieradzan, A.; Niemi, A.
2015-10-01
Protein folding [1] is the process of formation of a functional 3D structure from a random coil — the shape in which amino-acid chains leave the ribosome. Anfinsen's dogma states that the native 3D shape of a protein is completely determined by protein's amino acid sequence. Despite the progress in understanding the process rate and the success in folding prediction for some small proteins, with presently available physics-based methods it is not yet possible to reliably deduce the shape of a biologically active protein from its amino acid sequence. The protein-folding problem endures as one of the most important unresolved problems in science; it addresses the origin of life itself. Furthermore, a wrong fold is a common cause for a protein to lose its function or even endanger the living organism. Soliton solutions of a generalized discrete non-linear Schrödinger equation (GDNLSE) obtained from the energy function in terms of bond and torsion angles κ and τ provide a constructive theoretical framework for describing protein folds and folding patterns [2]. Here we study the dynamics of this process by means of molecular-dynamics simulations. The soliton manifestation is the pattern helix-loop-helix in the secondary structure of the protein, which explains the importance of understanding loop formation in helical proteins. We performed in silico experiments for unfolding one subunit of the core structure of gp41 from the HIV envelope glycoprotein (PDB ID: 1AIK [3]) by molecular-dynamics simulations with the MD package GROMACS. We analyzed 80 ns trajectories, obtained with one united-atom and two different all-atom force fields, to justify the side-chain orientation quantification scheme adopted in the studies and to eliminate force-field based artifacts. Our results are compatible with the soliton model of protein folding and provide first insight into soliton-formation dynamics.
Protein structure recognition: From eigenvector analysis to structural threading method
NASA Astrophysics Data System (ADS)
Cao, Haibo
In this work, we try to understand the protein folding problem using pair-wise hydrophobic interaction as the dominant interaction for the protein folding process. We found a strong correlation between amino acid sequence and the corresponding native structure of the protein. Some applications of this correlation discussed in this dissertation include domain partition and a new structural threading method, as well as the performance of this method in the CASP5 competition. In the first part, we give a brief introduction to the protein folding problem. Some essential knowledge and progress from other research groups are discussed. This part includes discussions of interactions among amino acid residues, the lattice HP model, and the designability principle. In the second part, we try to establish the correlation between amino acid sequence and the corresponding native structure of the protein. This correlation was observed in our eigenvector study of the protein contact matrix. We believe the correlation is universal, thus it can be used in automatic partition of protein structures into folding domains. In the third part, we discuss a threading method based on the correlation between amino acid sequence and the dominant eigenvector of the structure contact-matrix. A mathematically straightforward iteration scheme provides a self-consistent optimum global sequence-structure alignment. The computational efficiency of this method makes it possible to search whole protein structure databases for structural homology without relying on sequence similarity. The sensitivity and specificity of this method are discussed, along with a case of blind test prediction. In the appendix, we list the overall performance of this threading method in the CASP5 blind test in comparison with other existing approaches.
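To make the eigenvector idea concrete, the following sketch extracts the principal eigenvector of a symmetric residue contact matrix by power iteration. It is an illustration under stated assumptions, not the dissertation's procedure: the matrix is taken to be a 0/1 contact map, and the iteration is applied to the matrix plus the identity, a shift chosen here so that the method converges even when the contact graph is bipartite.

```python
import math


def dominant_eigenvector(matrix, iterations=500, tol=1e-12):
    """Principal eigenvector of a symmetric, non-negative contact matrix.

    Power iteration is run on (matrix + I); the shift leaves the eigenvectors
    unchanged and avoids oscillation between +lambda and -lambda modes.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # w = (matrix + I) v
        w = [v[i] + sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Toy contact map: three residues in a row, where residue 1 touches 0 and 2.
contacts = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(dominant_eigenvector(contacts))
```

In this toy case the middle residue, which has the most contacts, receives the largest component, which is the kind of burial signal a sequence-derived hydrophobicity profile could be compared against.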
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Principles of protein folding--a perspective from simple exact models.
Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.
1995-01-01
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459
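The simple exact lattice models discussed above can be illustrated with the two-dimensional HP model, in which each monomer is either hydrophobic (H) or polar (P). The sketch below scores one conformation by counting H-H contacts between monomers that are lattice neighbours but not chain neighbours. The energy of -1 per contact, the function name, and the coordinate representation are assumptions made for illustration.

```python
def hp_energy(sequence, path):
    """Energy of a 2D square-lattice HP conformation: -1 per non-bonded H-H contact.

    `sequence` is a string of 'H' and 'P'; `path` is a list of (x, y) lattice
    sites, one per monomer, in chain order.
    """
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if len(set(path)) != len(path):
        raise ValueError("path is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive monomers must be lattice neighbours")

    position = {site: index for index, site in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        # Looking only in the +x and +y directions counts each pair once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

Because the model has so few parameters, every self-avoiding path of a short chain can be scored this way, which is what makes the complete exploration of conformational space mentioned in the abstract feasible.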
2011-01-01
Background Mapping protein primary sequences to their three dimensional folds referred to as the 'second genetic code' remains an unsolved scientific problem. A crucial part of the problem concerns the geometrical specificity in side chain association leading to densely packed protein cores, a hallmark of correctly folded native structures. Thus, any model of packing within proteins should constitute an indispensable component of protein folding and design. Results In this study an attempt has been made to find, characterize and classify recurring patterns in the packing of side chain atoms within a protein which sustains its native fold. The interaction of side chain atoms within the protein core has been represented as a contact network based on the surface complementarity and overlap between associating side chain surfaces. Some network topologies definitely appear to be preferred and they have been termed 'packing motifs', analogous to super secondary structures in proteins. Study of the distribution of these motifs reveals the ubiquitous presence of typical smaller graphs, which appear to get linked or coalesce to give larger graphs, reminiscent of the nucleation-condensation model in protein folding. One such frequently occurring motif, also envisaged as the unit of clustering, the three residue clique was invariably found in regions of dense packing. Finally, topological measures based on surface contact networks appeared to be effective in discriminating sequences native to a specific fold amongst a set of decoys. Conclusions Out of innumerable topological possibilities, only a finite number of specific packing motifs are actually realized in proteins. This small number of motifs could serve as a basis set in the construction of larger networks. Of these, the triplet clique exhibits distinct preference both in terms of composition and geometry. PMID:21605466
Evidence for the principle of minimal frustration in the evolution of protein folding landscapes.
Tzul, Franco O; Vasilchuk, Daniel; Makhatadze, George I
2017-02-28
Theoretical and experimental studies have firmly established that protein folding can be described by a funneled energy landscape. This funneled energy landscape is the result of foldable protein sequences evolving following the principle of minimal frustration, which allows proteins to rapidly fold to their native biologically functional conformations. For a protein family with a given functional fold, the principle of minimal frustration suggests that, independent of sequence, all proteins within this family should fold with similar rates. However, depending on the optimal living temperature of the organism, proteins also need to modulate their thermodynamic stability. Consequently, the difference in thermodynamic stability should be primarily caused by differences in the unfolding rates. To test this hypothesis experimentally, we performed comprehensive thermodynamic and kinetic analyses of 15 different proteins from the thioredoxin family. Eight of these thioredoxins were extant proteins from psychrophilic, mesophilic, or thermophilic organisms. The other seven protein sequences were obtained using ancestral sequence reconstruction and can be dated back over 4 billion years. We found that all studied proteins fold with very similar rates but unfold with rates that differ up to three orders of magnitude. The unfolding rates correlate well with the thermodynamic stability of the proteins. Moreover, proteins that unfold slower are more resistant to proteolysis. These results provide direct experimental support to the principle of minimal frustration hypothesis.
Alva, Vikram; Remmert, Michael; Biegert, Andreas; Lupas, Andrei N; Söding, Johannes
2010-01-01
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Directed evolution of the TALE N-terminal domain for recognition of all 5' bases.
Lamb, Brian M; Mercer, Andrew C; Barbas, Carlos F
2013-11-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5'-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5' T, we observed decreases in TALE activity up to >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold reduction in TALEN activity compared with target sequences containing a 5' T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes.
Komives, Elizabeth A.; Wolynes, Peter G.
2008-01-01
Repeat-proteins are made up of near repetitions of 20– to 40–amino acid stretches. These polypeptides usually fold up into non-globular, elongated architectures that are stabilized by the interactions within each repeat and those between adjacent repeats, but that lack contacts between residues distant in sequence. The inherent symmetries both in primary sequence and three-dimensional structure are reflected in a folding landscape that may be analyzed as a quasi–one-dimensional problem. We present a general description of repeat-protein energy landscapes based on a formal Ising-like treatment of the elementary interaction energetics in and between foldons, whose collective ensemble are treated as spin variables. The overall folding properties of a complete “domain” (the stability and cooperativity of the repeating array) can be derived from this microscopic description. The one-dimensional nature of the model implies there are simple relations for the experimental observables: folding free-energy (ΔGwater) and the cooperativity of denaturation (m-value), which do not ordinarily apply for globular proteins. We show how the parameters for the “coarse-grained” description in terms of foldon spin variables can be extracted from more detailed folding simulations on perfectly funneled landscapes. To illustrate the ideas, we present a case-study of a family of tetratricopeptide (TPR) repeat proteins and quantitatively relate the results to the experimentally observed folding transitions. Based on the dramatic effect that single point mutations exert on the experimentally observed folding behavior, we speculate that natural repeat proteins are “poised” at particular ratios of inter- and intra-element interaction energetics that allow them to readily undergo structural transitions in physiologically relevant conditions, which may be intrinsically related to their biological functions. PMID:18483553
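A one-dimensional Ising-like description of the kind mentioned above can be sketched by treating each repeat as a two-state variable (0 unfolded, 1 folded) with an intrinsic energy per folded repeat and a coupling energy between adjacent folded repeats. The parameter names, the brute-force enumeration, and the energy function below are illustrative assumptions, not the parametrization used by the authors.

```python
import itertools
import math


def folded_fraction(n_repeats, intrinsic, coupling, kT=1.0):
    """Mean fraction of folded repeats in a 1D Ising-like chain.

    `intrinsic` is the energy of folding one repeat in isolation and
    `coupling` is the energy added when two adjacent repeats are both folded
    (negative values are stabilizing). All 2**n_repeats states are enumerated,
    so this is only practical for short arrays.
    """
    partition = 0.0
    weighted_sum = 0.0
    for state in itertools.product((0, 1), repeat=n_repeats):
        energy = intrinsic * sum(state)
        energy += coupling * sum(a * b for a, b in zip(state, state[1:]))
        weight = math.exp(-energy / kT)
        partition += weight
        weighted_sum += weight * sum(state) / n_repeats
    return weighted_sum / partition
```

With an unfavourable intrinsic term and a favourable coupling, the array folds only when enough neighbouring repeats fold together, which is one simple way to picture the balance of inter- and intra-element energetics discussed in the abstract.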
Evolution, Energy Landscapes and the Paradoxes of Protein Folding
Wolynes, Peter G.
2014-01-01
Protein folding has been viewed as a difficult problem of molecular self-organization. The search problem involved in folding however has been simplified through the evolution of folding energy landscapes that are funneled. The funnel hypothesis can be quantified using energy landscape theory based on the minimal frustration principle. Strong quantitative predictions that follow from energy landscape theory have been widely confirmed both through laboratory folding experiments and from detailed simulations. Energy landscape ideas also have allowed successful protein structure prediction algorithms to be developed. The selection constraint of having funneled folding landscapes has left its imprint on the sequences of existing protein structural families. Quantitative analysis of co-evolution patterns allows us to infer the statistical characteristics of the folding landscape. These turn out to be consistent with what has been obtained from laboratory physicochemical folding experiments signalling a beautiful confluence of genomics and chemical physics. PMID:25530262
How the Sequence of a Gene Specifies Structural Symmetry in Proteins
Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin
2015-01-01
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
Method of generating polynucleotides encoding enhanced folding variants
Bradbury, Andrew M.; Kiss, Csaba; Waldo, Geoffrey S.
2017-05-02
The invention provides directed evolution methods for improving the folding, solubility and stability (including thermostability) characteristics of polypeptides. In one aspect, the invention provides a method for generating folding and stability-enhanced variants of proteins, including but not limited to fluorescent proteins, chromophoric proteins and enzymes. In another aspect, the invention provides methods for generating thermostable variants of a target protein or polypeptide via an internal destabilization baiting strategy. Internal destabilization of a protein of interest is achieved by inserting a heterologous, folding-destabilizing sequence (folding interference domain) within DNA encoding the protein of interest, evolving the protein sequences adjacent to the heterologous insertion to overcome the destabilization (using any number of mutagenesis methods), thereby creating a library of variants. The variants in the library are expressed, and those with enhanced folding characteristics are selected.
Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.
Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras
2015-08-01
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable only to smaller proteins, leaving much room for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
NASA Astrophysics Data System (ADS)
Tavenor, Nathan Albert
Protein-based supramolecular polymers (SMPs) are a class of biomaterials which draw inspiration from and expand upon the many examples of complex protein quaternary structures observed in nature: collagen, microtubules, viral capsids, etc. Designing synthetic supramolecular protein scaffolds both increases our understanding of natural superstructures and allows for the creation of novel materials. Similar to small-molecule SMPs, protein-based SMPs form due to self-assembly driven by intermolecular interactions between monomers, and monomer structure determines the properties of the overall material. Using protein-based monomers takes advantage of the self-assembly and highly specific molecular recognition properties encodable in polypeptide sequences to rationally design SMP architectures. The central hypothesis underlying our work is that alpha-helical coiled coils, a well-studied protein quaternary folding motif, are well-suited to SMP design through the addition of synthetic linkers at solvent-exposed sites. Through small changes in the structures of the cross-links and/or peptide sequence, we have been able to control both the nanoscale organization and the macroscopic properties of the SMPs. Changes to the linker and hydrophobic core of the peptide can be used to control polymer rigidity, stability, and dimensionality. The gaps in knowledge that this thesis sought to fill on this project were 1) the relationship between the molecular structure of the cross-linked polypeptides and the macroscopic properties of the SMPs and 2) a means of creating materials exhibiting multi-dimensional net or framework topologies. Separate from the above efforts on supramolecular architectures was work on improving backbone modification strategies for an alpha-helix in the context of a complex protein tertiary fold. 
Earlier work in our lab had successfully incorporated unnatural building blocks into every major secondary structure (beta-sheet, alpha-helix, loops and beta-turns) of a small protein with a tertiary fold. Although the tertiary fold of the native sequence was mimicked by the resulting artificial protein, the thermodynamic stability was greatly compromised. Most of this energetic penalty derived from the modifications present in the alpha-helix. The contribution within this thesis was direct comparison of several alpha-helical design strategies and establishment of the thermodynamic consequences of each.
The PYRIN domain: A member of the death domain-fold superfamily
Fairbrother, Wayne J.; Gordon, Nathaniel C.; Humke, Eric W.; O'Rourke, Karen M.; Starovasnik, Melissa A.; Yin, Jian-Ping; Dixit, Vishva M.
2001-01-01
PYRIN domains were identified recently as putative protein–protein interaction domains at the N-termini of several proteins thought to function in apoptotic and inflammatory signaling pathways. The ∼95 residue PYRIN domains have no statistically significant sequence homology to proteins with known three-dimensional structure. Using secondary structure prediction and potential-based fold recognition methods, however, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily that includes death domains (DDs), death effector domains (DEDs), and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well established mediators of protein–protein interactions found in many proteins involved in apoptosis and inflammation, indicating further that the PYRIN domains serve a similar function. An homology model of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1, a member of the Apaf-1/Ced-4 family of proteins, was constructed using the three-dimensional structures of the FADD and p75 neurotrophin receptor DDs, and of the Apaf-1 and caspase-9 CARDs, as templates. Validation of the model using a variety of computational techniques indicates that the fold prediction is consistent with the sequence. Comparison of a circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with spectra of several proteins known to adopt the death domain-fold provides experimental support for the structure prediction. PMID:11514682
Improving Protein Fold Recognition by Deep Learning Networks.
Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin
2015-12-04
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl's benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed comparable results at the family and superfamily levels compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to a real-value regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
NASA Astrophysics Data System (ADS)
Pappu, Rohit V.; Nussinov, Ruth
2009-03-01
In appropriate physiological milieux proteins spontaneously fold into their functional three-dimensional structures. The amino acid sequences of functional proteins contain all the information necessary to specify the folds. This remarkable observation has spawned research aimed at answering two major questions. (1) Of all the conceivable structures that a protein can adopt, why is the ensemble of native-like structures the most favorable? (2) What are the paths by which proteins manage to robustly and reproducibly fold into their native structures? Anfinsen's thermodynamic hypothesis has guided the pursuit of answers to the first question whereas Levinthal's paradox has influenced the development of models for protein folding dynamics. Decades of work have led to significant advances in the folding problem. Mean-field models have been developed to capture our current, coarse grain understanding of the driving forces for protein folding. These models are being used to predict three-dimensional protein structures from sequence and stability profiles as a function of thermodynamic and chemical perturbations. Impressive strides have also been made in the field of protein design, also known as the inverse folding problem, thereby testing our understanding of the determinants of the fold specificities of different sequences. Early work on protein folding pathways focused on the specific sequence of events that could lead to a simplification of the search process. However, unifying principles proved to be elusive. Proteins that show reversible two-state folding-unfolding transitions turned out to be a gift of natural selection. Focusing on these simple systems helped researchers to uncover general principles regarding the origins of cooperativity in protein folding thermodynamics and kinetics. On the theoretical front, concepts borrowed from polymer physics and the physics of spin glasses led to the development of a framework based on energy landscape theories. 
These theories predict that evolved sequences (functional proteins as opposed to random sequences) find their native folds by minimizing geometric (topological) frustration (i.e. avoiding entropic bottlenecks/kinetic traps). In some cases, following a dominant pathway is the optimal way to minimize frustration, whereas in extreme cases, proteins may fold without encountering bottlenecks. Experimental studies of two-state proteins led in turn to the development of quantitative descriptors that have allowed specific testing of theoretical predictions. These include methods such as phi value analysis to characterize transition state ensembles and descriptors that measure the effects of geometry/topology on folding rates. Interestingly, there exists a striking inverse correlation between the relative contact order (the distance in sequence space between spatially proximal contacts made in the native state) and the folding rates of several two-state proteins. The relative contact order provides a rough estimate of the net entropic cost associated with realizing the folded state, and theories have been developed to explain the observed correlation between the contact order and folding rates. Despite its maturity as a field, there are several areas that come under the rubric of protein folding that are just beginning to receive attention. For example, how do complications in vivo such as macromolecular crowding, confinement, the presence of cosolutes, membrane anchoring, and tethering to surfaces influence protein stabilities and folding dynamics? While we are accustomed to studying proteins at concentrations that are amenable to investigation via probes whose signal intensities grow with protein concentration, this does not make these readouts relevant to the in vivo setting. In cells, protein concentrations are tightly regulated and are likely to be orders of magnitude lower than what we are accustomed to using within in vitro experimental setups. 
Protein folding in vivo is a complex multi-scale dynamical problem when one considers the synergies between protein expression, spontaneous folding, chaperonin-assisted folding, protein targeting, the kinetics of post-translational modifications, protein degradation, and of course the drive to avoid aggregation. Further, there is growing recognition that cells not only tolerate but select for proteins that are intrinsically disordered. These proteins are essential for many crucial activities, and yet their inability to fold in isolation makes them prone to proteolytic processing and aggregation. In the series of papers that make up this special focus on protein folding in physical biology, leading researchers provide insights into diverse cross-sections of problems in protein folding. Barrick provides a concise review of what we have learned from the study of two-state folders and draws attention to how several unanswered questions are being approached using studies on large repeat proteins. Dissecting the contribution of hydration-mediated interactions to driving forces for protein folding and assembly has been extremely challenging. There is renewed interest in using hydrostatic pressure as a tool to access folding intermediates and decipher the role of partially hydrated states in folding, misfolding, and aggregation. Silva and Foguel review many of the nuances that have been uncovered by perturbing hydrostatic pressure as a thermodynamic parameter. As noted above, protein folding in vivo is expected to be considerably more complex than the folding of two-state proteins in dilute solutions. Lucent et al review the state-of-the-art in the development of quantitative theories to explain chaperonin-assisted folding in vivo. Additionally, they highlight unanswered questions pertaining to the processing of unfolded/misfolded proteins by the chaperone machinery. 
Zhuang et al present results that focus on the effects of surface tethering on transition state ensembles and folding mechanisms of a model two-state protein. Their results are important because several proteins in vivo fold while being anchored to membranes. Finally, several neurodegenerative and systemic diseases are associated with the aggregation of intrinsically disordered polypeptides. The search for cures in these debilitating and fatal diseases has focused attention on shared attributes in aggregation mechanisms of different proteins and the possibility of identifying druggable targets from mechanistic studies. Abedini and Raleigh review common features gleaned from mechanistic studies of the aggregation of several intrinsically disordered proteins. They propose that the population of helical intermediates and their stabilization via interactions with membranes might be an important route by which the process of aggregation leads to toxicity. The five papers that form this protein folding focus cover specific sub-topics within the larger field of protein folding. They address current questions and emphasize the importance of the growing and productive interface between the physical sciences and biology. We hope that these papers will stimulate much discussion and more importantly advances in the areas highlighted by the contributors.
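The relative contact order mentioned in the overview above is commonly computed as the average sequence separation of native contacts divided by the chain length. The sketch below assumes the contacts have already been identified and are given as pairs of residue indices; how contacts are defined (for example, by a distance cutoff) is left open, and the function name is hypothetical.

```python
def relative_contact_order(contacts, length):
    """Relative contact order: mean |i - j| over contacts, divided by chain length.

    `contacts` is a list of (i, j) residue-index pairs that are spatially
    close in the native state; `length` is the number of residues.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    if length <= 0:
        raise ValueError("length must be positive")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))
```

A fold dominated by local contacts, such as a helical bundle, gives a low value, while one with many contacts between residues far apart in sequence gives a high value; this is the quantity the overview describes as inversely correlated with the folding rates of several two-state proteins.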
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ilieva, N., E-mail: nevena.ilieva@parallel.bas.bg; Dai, J., E-mail: daijing491@gmail.com; Sieradzan, A., E-mail: adams86@wp.pl
Protein folding [1] is the process of formation of a functional 3D structure from a random coil, the shape in which amino-acid chains leave the ribosome. Anfinsen’s dogma states that the native 3D shape of a protein is completely determined by the protein’s amino acid sequence. Despite the progress in understanding the process rate and the success in folding prediction for some small proteins, with presently available physics-based methods it is not yet possible to reliably deduce the shape of a biologically active protein from its amino acid sequence. The protein-folding problem endures as one of the most important unresolved problems in science; it addresses the origin of life itself. Furthermore, a wrong fold is a common cause for a protein to lose its function or even endanger the living organism. Soliton solutions of a generalized discrete non-linear Schrödinger equation (GDNLSE) obtained from the energy function in terms of bond and torsion angles κ and τ provide a constructive theoretical framework for describing protein folds and folding patterns [2]. Here we study the dynamics of this process by means of molecular-dynamics simulations. The soliton manifestation is the pattern helix–loop–helix in the secondary structure of the protein, which explains the importance of understanding loop formation in helical proteins. We performed in silico experiments for unfolding one subunit of the core structure of gp41 from the HIV envelope glycoprotein (PDB ID: 1AIK [3]) by molecular-dynamics simulations with the MD package GROMACS. We analyzed 80 ns trajectories, obtained with one united-atom and two different all-atom force fields, to justify the side-chain orientation quantification scheme adopted in the studies and to eliminate force-field based artifacts. Our results are compatible with the soliton model of protein folding and provide first insight into soliton-formation dynamics.
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching
NASA Astrophysics Data System (ADS)
Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.
Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
CABS-fold: Server for the de novo and consensus-based prediction of protein structure.
Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej
2013-07-01
The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold.
CABS-fold: server for the de novo and consensus-based prediction of protein structure
Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej
2013-01-01
The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold. PMID:23748950
Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein
Umate, Pavan; Tuteja, Renu
2010-01-01
Homology-based three-dimensional model for Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisomes) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of forisomes proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisomes proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566
Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe
2010-11-26
Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. 
This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Mittal, Anuradha; Holehouse, Alex S; Cohan, Megan C; Pappu, Rohit V
2018-05-12
Intrinsically disordered proteins and regions (IDPs / IDRs) are characterized by well-defined sequence-to-conformation relationships (SCRs). These relationships refer to the sequence-specific preferences for average sizes, shapes, residue-specific secondary structure propensities, and amplitudes of multiscale conformational fluctuations. SCRs are discerned from the sequence-specific conformational ensembles of IDPs. A vast majority of IDPs are actually tethered to folded domains (FDs). This raises the question of whether or not SCRs inferred for IDPs are applicable to IDRs tethered to folded domains. Here, we use atomistic simulations based on a well-established forcefield paradigm and an enhanced sampling method to obtain comparative assessments of SCRs for thirteen archetypal IDRs modeled as autonomous units, as C-terminal tails connected to folded domains, and as linkers between pairs of folded domains. Our studies uncover a set of general observations regarding context-independent versus context-dependent SCRs of IDRs. SCRs are minimally perturbed upon tethering to folded domains if the IDRs are deficient in charged residues and for polyampholytic IDRs where the oppositely charged residues within the sequence of the IDR are separated into distinct blocks. In contrast, the interplay between IDRs and tethered folded domains has a significant modulatory effect on SCRs if the IDRs have intermediate fractions of charged residues or if they have sequence-intrinsic conformational preferences for canonical random coils. Our findings suggest that IDRs with context-independent SCRs might be independent evolutionary modules whereas IDRs with context-dependent intrinsic SCRs might co-evolve with the FDs to which they are tethered. Copyright © 2018. Published by Elsevier Ltd.
Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij
2012-03-01
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.
Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases
Lamb, Brian M.; Mercer, Andrew C.; Barbas, Carlos F.
2013-01-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5′-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5′ T, we observed decreases in TALE activity up to >1000-fold in TALE-TF activity, up to 100-fold in TALE-R activity and up to 10-fold reduction in TALEN activity compared with target sequences containing a 5′ T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes. PMID:23980031
Structure of a Trypanosoma Brucei Alpha/Beta--Hydrolase Fold Protein With Unknown Function
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merritt, E.A.; Holmes, M.; Buckner, F.S.
2009-05-26
The structure of a structural genomics target protein, Tbru020260AAA from Trypanosoma brucei, has been determined to a resolution of 2.2 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protein belongs to Pfam sequence family PF08538 and is only distantly related to previously studied members of the α/β-hydrolase fold family. Structural superposition onto representative α/β-hydrolase fold proteins of known function indicates that a possible catalytic nucleophile, Ser116 in the T. brucei protein, lies at the expected location. However, the present structure and by extension the other trypanosomatid members of this sequence family have neither sequence nor structural similarity at the location of other active-site residues typical for proteins with this fold. Together with the presence of an additional domain between strands β6 and β7 that is conserved in trypanosomatid genomes, this suggests that the function of these homologs has diverged from other members of the fold family.
Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert
2010-01-01
The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. 
The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
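As an illustration of the Gō-model idea mentioned in the abstract above, the following is a minimal sketch of the simplest structure-based (square-well) form, in which only contacts present in the native structure lower the energy. It is not the sequence-sensitive model used by the authors; the 8 Å cutoff, the minimum sequence separation of 3, the unit contact energy, and all function names are assumptions chosen for illustration.

```python
import math

def native_contacts(coords, cutoff=8.0, min_separation=3):
    """Return residue pairs (i, j) that are in contact in the native structure.

    coords: list of (x, y, z) positions, one per residue (e.g. C-alpha atoms).
    cutoff and min_separation are illustrative assumptions, not published values.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts

def formed_native_contacts(coords, native, cutoff=8.0):
    """Count how many native contacts are formed in the given conformation."""
    return sum(1 for i, j in native if math.dist(coords[i], coords[j]) <= cutoff)

def fraction_native(coords, native, cutoff=8.0):
    """Fraction of native contacts formed (often called Q)."""
    if not native:
        return 0.0
    return formed_native_contacts(coords, native, cutoff) / len(native)

def go_energy(coords, native, epsilon=1.0, cutoff=8.0):
    """Square-well Go-like energy: each formed native contact contributes -epsilon."""
    return -epsilon * formed_native_contacts(coords, native, cutoff)
```

In a sketch like this, non-native contacts contribute nothing, so any frustration has to come from competition among native contacts, which is the kind of effect the abstract attributes to the N- and C-terminal subdomains of Spo0F.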
Improving Protein Fold Recognition by Deep Learning Networks
NASA Astrophysics Data System (ADS)
Jo, Taeho; Hou, Jie; Eickholt, Jesse; Cheng, Jianlin
2015-12-01
For accurate recognition of protein folds, a deep learning network method (DN-Fold) was developed to predict if a given query-template protein pair belongs to the same structural fold. The input used stemmed from the protein sequence and structural features extracted from the protein pair. We evaluated the performance of DN-Fold along with 18 different methods on Lindahl’s benchmark dataset and on a large benchmark set extracted from SCOP 1.75 consisting of about one million protein pairs, at three different levels of fold recognition (i.e., protein family, superfamily, and fold) depending on the evolutionary distance between protein sequences. The correct recognition rate of ensembled DN-Fold for Top 1 predictions is 84.5%, 61.5%, and 33.6% and for Top 5 is 91.2%, 76.5%, and 60.7% at family, superfamily, and fold levels, respectively. We also evaluated the performance of single DN-Fold (DN-FoldS), which showed comparable results at the family and superfamily levels compared to ensemble DN-Fold. Finally, we extended the binary classification problem of fold recognition to a real-value regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
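The Top 1 and Top 5 recognition rates quoted above can be read as a top-k hit rate over queries. The sketch below shows one common way to compute such a rate; it assumes each query comes with a ranked list of template identifiers and a set of templates counted as correct at the chosen level (family, superfamily, or fold). The function name and data layout are hypothetical and are not taken from DN-Fold.

```python
from typing import Sequence, Set

def top_k_rate(ranked_templates: Sequence[Sequence[str]],
               correct_templates: Sequence[Set[str]],
               k: int) -> float:
    """Fraction of queries with at least one correct template in the first k hits."""
    if len(ranked_templates) != len(correct_templates):
        raise ValueError("one set of correct templates is needed per query")
    if not ranked_templates:
        raise ValueError("at least one query is required")
    hits = sum(
        1
        for ranking, correct in zip(ranked_templates, correct_templates)
        if any(template in correct for template in ranking[:k])
    )
    return hits / len(ranked_templates)
```

Under this definition the Top 5 rate can never be lower than the Top 1 rate for the same rankings, which matches the pattern of the reported figures.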
A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3
Dietmann, Sabine; Park, Jong; Notredame, Cedric; Heger, Andreas; Lappe, Michael; Holm, Liisa
2001-01-01
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families. PMID:11125048
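The lowest level of the hierarchy above groups homologues with sequence identity above 25%. A minimal sketch of such a check on a pairwise alignment follows. It assumes identity is measured over aligned columns where neither sequence has a gap; the exact definition used by the Dali classification is not given in the abstract, so both that choice and the function names are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

def same_sequence_family(aligned_a: str, aligned_b: str,
                         threshold: float = 25.0) -> bool:
    """True when identity exceeds the threshold (25% in the abstract above)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

Other denominators (alignment length, or length of the shorter sequence) are also in common use and can shift borderline pairs across the threshold.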
The Histone Database: an integrated resource for histones and histone fold-containing proteins
Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David
2011-01-01
Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
Evolutionary trend toward kinetic stability in the folding trajectory of RNases H
Lim, Shion A.; Hart, Kathryn M.; Marqusee, Susan
2016-01-01
Proper folding of proteins is critical to producing the biological machinery essential for cellular function. The rates and energetics of a protein’s folding process, which is described by its energy landscape, are encoded in the amino acid sequence. Over the course of evolution, this landscape must be maintained such that the protein folds and remains folded over a biologically relevant time scale. How exactly a protein’s energy landscape is maintained or altered throughout evolution is unclear. To study how a protein’s energy landscape changed over time, we characterized the folding trajectories of ancestral proteins of the ribonuclease H (RNase H) family using ancestral sequence reconstruction to access the evolutionary history between RNases H from mesophilic and thermophilic bacteria. We found that despite large sequence divergence, the overall folding pathway is conserved over billions of years of evolution. There are robust trends in the rates of protein folding and unfolding; both modern RNases H evolved to be more kinetically stable than their most recent common ancestor. Finally, our study demonstrates how a partially folded intermediate provides a readily adaptable folding landscape by allowing the independent tuning of kinetics and thermodynamics. PMID:27799545
Evolutionary Dynamics on Protein Bi-stability Landscapes can Potentially Resolve Adaptive Conflicts
Sikosek, Tobias; Bornberg-Bauer, Erich; Chan, Hue Sun
2012-01-01
Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. PMID:23028272
2014-01-01
Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. 
Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864
Guarracino, Danielle A; Gentile, Kayla; Grossman, Alec; Li, Evan; Refai, Nader; Mohnot, Joy; King, Daniel
2018-02-01
Determining the minimal sequence necessary to induce protein folding is beneficial in understanding the role of protein-protein interactions in biological systems, as their three-dimensional structures often dictate their activity. Proteins are generally comprised of discrete secondary structures, from α-helices to β-turns and larger β-sheets, each of which is influenced by its primary structure. Manipulating the sequence of short, moderately helical peptides can help elucidate the influences on folding. We created two new scaffolds based on a modestly helical eight-residue peptide, PT3, we previously published. Using circular dichroism (CD) spectroscopy and changing the possible salt-bridging residues to new combinations of Lys, Arg, Glu, and Asp, we found that our most helical improvements came from the Arg-Glu combination, whereas the Lys-Asp was not significantly different from the Lys-Glu of the parent scaffold, PT3. The marked 3₁₀-helical contributions in PT3 were lessened in the Arg-Glu-containing peptide with the beginning of cooperative unfolding seen through a thermal denaturation. However, a unique and unexpected signature was seen for the denaturation of the Lys-Asp peptide which could help elucidate the stages of folding between the 3₁₀-helix and α-helix. In addition, we developed a short six-residue peptide with β-turn/sheet CD signature, again to help study minimal sequences needed for folding. Overall, the results indicate that improvements made to short peptide scaffolds by fine-tuning the salt-bridging residues can enhance scaffold structure. Likewise, with the results from the new, short β-turn motif, these can help impact future peptidomimetic designs in creating biologically useful, short, structured β-sheet-forming peptides.
Sequence repeats and protein structure
NASA Astrophysics Data System (ADS)
Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos
2012-11-01
Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
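To make the two-letter (H/P) description above concrete, the sketch below reduces an amino acid sequence to H and P, tests whether the result is a perfect tandem repeat, and counts deviations from perfect repetition. The set of residues treated as hydrophobic is an assumption for illustration; the abstract does not state which mapping the authors used, and the function names are hypothetical.

```python
# Assumption: residues treated as hydrophobic (H); all others map to polar (P).
HYDROPHOBIC = set("AVLIMFWC")

def to_hp(sequence: str) -> str:
    """Reduce a one-letter amino acid sequence to a two-letter H/P string."""
    return "".join("H" if res in HYDROPHOBIC else "P" for res in sequence.upper())

def shortest_repeat_unit(hp: str):
    """Return the shortest unit whose exact tandem repetition gives hp, else None."""
    n = len(hp)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and hp[:size] * (n // size) == hp:
            return hp[:size]
    return None

def repeat_mismatches(hp: str, unit: str) -> int:
    """Count positions where hp deviates from a perfect tiling of unit."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    tiled = (unit * (len(hp) // len(unit) + 1))[:len(hp)]
    return sum(a != b for a, b in zip(hp, tiled))
```

A mismatch count of zero corresponds to a perfectly repeated sequence; small nonzero counts correspond to the imperfect repetition that the abstract reports as improving the folding of otherwise bad folders.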
β-Propeller Blades as Ancestral Peptides in Protein Evolution
Kopec, Klaus O.; Lupas, Andrei N.
2013-01-01
Proteins of the β-propeller fold are ubiquitous in nature and widely used as structural scaffolds for ligand binding and enzymatic activity. This fold comprises between four and twelve four-stranded β-meanders, the so called blades that are arranged circularly around a central funnel-shaped pore. Despite the large size range of β-propellers, their blades frequently show sequence similarity indicative of a common ancestry and it has been proposed that the majority of β-propellers arose divergently by amplification and diversification of an ancestral blade. Given the structural versatility of β-propellers and the hypothesis that the first folded proteins evolved from a simpler set of peptides, we investigated whether this blade may have given rise to other folds as well. Using sequence comparisons, we identified proteins of four other folds as potential homologs of β-propellers: the luminal domain of inositol-requiring enzyme 1 (IRE1-LD), type II β-prisms, β-pinwheels, and WW domains. Because, with increasing evolutionary distance and decreasing sequence length, the statistical significance of sequence comparisons becomes progressively harder to distinguish from the background of convergent similarities, we complemented our analyses with a new method that evaluates possible homology based on the correlation between sequence and structure similarity. Our results indicate a homologous relationship of IRE1-LD and type II β-prisms with β-propellers, and an analogous one for β-pinwheels and WW domains. Whereas IRE1-LD most likely originated by fold-changing mutations from a fully formed PQQ motif β-propeller, type II β-prisms originated by amplification and differentiation of a single blade, possibly also of the PQQ type. We conclude that both β-propellers and type II β-prisms arose by independent amplification of a blade-sized fragment, which represents a remnant of an ancient peptide world. PMID:24143202
NASA Astrophysics Data System (ADS)
Xu, Zhijun; Lazim, Raudah; Sun, Tiedong; Mei, Ye; Zhang, Dawei
2012-04-01
Solvent effect on protein conformation and folding mechanism of E6-associated protein (E6ap) peptide are investigated using a recently developed charge update scheme termed as adaptive hydrogen bond-specific charge (AHBC). On the basis of the close agreement between the calculated helix contents from AHBC simulations and experimental results, we observed based on the presented simulations that the two ends of the peptide may simultaneously take part in the formation of the helical structure at the early stage of folding and finally merge to form a helix with lowest backbone RMSD of about 0.9 Å in 40% 2,2,2-trifluoroethanol solution. However, in pure water, the folding may start at the center of the peptide sequence instead of at the two opposite ends. The analysis of the free energy landscape indicates that the solvent may determine the folding clusters of E6ap, which subsequently leads to the different final folded structure. The current study demonstrates new insight to the role of solvent in the determination of protein structure and folding dynamics.
Robic, Srebrenka
2010-01-01
To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative phenomena in undergraduate classes. In the process of learning about these topics, students often form incorrect ideas. For example, by learning about protein folding in the context of protein synthesis, students may come to an incorrect conclusion that once synthesized on the ribosome, a protein spends its entire cellular lifetime in its fully folded native conformation. This is clearly not true; proteins are dynamic structures that undergo both local fluctuations and global unfolding events. To prevent and address such misconceptions, basic concepts of protein science can be introduced in the context of simple mathematical models and hands-on explorations of publicly available data sets. Ten common misconceptions about proteins are presented, along with suggestions for using equations, models, sequence, structure, and thermodynamic data to help students gain a deeper understanding of basic concepts relating to protein structure, folding, and stability.
Use of conserved key amino acid positions to morph protein folds.
Reddy, Boojala V B; Li, Wilfred W; Bourne, Philip E
2002-07-15
By using three-dimensional (3D) structure alignments and a previously published method to determine Conserved Key Amino Acid Positions (CKAAPs) we propose a theoretical method to design mutations that can be used to morph the protein folds. The original Paracelsus challenge, met by several groups, called for the engineering of a stable but different structure by modifying less than 50% of the amino acid residues. We have used the sequences from the Protein Data Bank (PDB) identifiers 1ROP, and 2CRO, which were previously used in the Paracelsus challenge by those groups, and suggest mutation to CKAAPs to morph the protein fold. The total number of mutations suggested is less than 40% of the starting sequence theoretically improving the challenge results. From secondary structure prediction experiments of the proposed mutant sequence structures, we observe that each of the suggested mutant protein sequences likely folds to a different, non-native potentially stable target structure. These results are an early indicator that analyses using structure alignments leading to CKAAPs of a given structure are of value in protein engineering experiments. Copyright 2002 Wiley Periodicals, Inc.
Sequence-structure mapping errors in the PDB: OB-fold domains
Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee
2004-01-01
The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161
Rotondi, Kenneth S; Gierasch, Lila M
2003-07-08
The experiments described here explore the role of local sequence in the folding of cellular retinoic acid binding protein I (CRABP I). This is a 136-residue, 10-stranded, antiparallel beta-barrel protein with seven beta-hairpins and is a member of the intracellular lipid binding protein (iLBP) family. The relative roles of local and global sequence information in governing the folding of this class of proteins are not well-understood. In question is whether the beta-turns are locally defined by short-range interactions within their sequences, and are thus able to play an active role in reducing the conformational space available to the folding chain, or whether the turns are passive, relying upon global forces to form. Short (six- and seven-residue) peptides corresponding to the seven CRABP I turns were analyzed by circular dichroism and NMR for their tendencies to take up the conformations they adopt in the context of the native protein. The results indicate that two of the peptides, encompassing turns III and IV in CRABP I, have a strong intrinsic bias to form native turns. Intriguingly, these turns are on linked hairpins in CRABP I and represent the best-conserved turns in the iLBP family. These results suggest that local sequence may play an important role in narrowing the conformational ensemble of CRABP I during folding.
Schmidt Am Busch, Marcel; Sedano, Audrey; Simonson, Thomas
2010-05-05
Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Sequence Determinants of Compaction in Intrinsically Disordered Proteins
Marsh, Joseph A.; Forman-Kay, Julie D.
2010-01-01
Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348
Characterization of protein-folding pathways by reduced-space modeling.
Kmiecik, Sebastian; Kolinski, Andrzej
2007-07-24
Ab initio simulations of the folding pathways are currently limited to very small proteins. For larger proteins, some approximations or simplifications in protein models need to be introduced. Protein folding and unfolding are among the basic processes in the cell and are very difficult to characterize in detail by experiment or simulation. Chymotrypsin inhibitor 2 (CI2) and barnase are probably the best characterized experimentally in this respect. For these model systems, initial folding stages were simulated by using CA-CB-side chain (CABS), a reduced-space protein-modeling tool. CABS employs knowledge-based potentials that proved to be very successful in protein structure prediction. With the use of isothermal Monte Carlo (MC) dynamics, initiation sites with a residual structure and weak tertiary interactions were identified. Such structures are essential for the initiation of the folding process through a sequential reduction of the protein conformational space, overcoming the Levinthal paradox in this manner. Furthermore, nucleation sites that initiate a tertiary interactions network were located. The MC simulations correspond perfectly to the results of experimental and theoretical research and bring insights into CI2 folding mechanism: unambiguous sequence of folding events was reported as well as cooperative substructures compatible with those obtained in recent molecular dynamics unfolding studies. The correspondence between the simulation and experiment shows that knowledge-based potentials are not only useful in protein structure predictions but are also capable of reproducing the folding pathways. Thus, the results of this work significantly extend the applicability range of reduced models in the theoretical study of proteins.
Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?
Sridhar, Settu; Guruprasad, Kunchur
2014-01-01
We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
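As a rough illustration of what an 'inverted' peptide pair is, the sketch below scans a set of sequences for k-mers whose reversed sequence also occurs somewhere in the set. It covers only the sequence-matching step; the structural-environment comparison and modeling described in the abstract are not attempted. The function name, the toy sequences, and the choice to skip palindromic peptides are all assumptions made for the example.

```python
def inverted_peptide_pairs(sequences, k):
    """Return sorted (peptide, inverted peptide) pairs found in the data set.

    sequences: iterable of one-letter amino acid strings (toy input here).
    Palindromic k-mers are skipped because they equal their own inverse.
    """
    seen = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    pairs = set()
    for kmer in seen:
        rev = kmer[::-1]
        if rev != kmer and rev in seen:
            # Store each pair once, in alphabetical order.
            pairs.add(tuple(sorted((kmer, rev))))
    return sorted(pairs)

# Hypothetical toy sequences, not real PDB chains.
toy = ["MKTAYIAKQR", "GGRQKAIYSS"]
print(inverted_peptide_pairs(toy, 5))
```

On a real data set one would also record which chains and positions each peptide came from, so that the structural environments can be compared afterwards.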
Concomitant prediction of function and fold at the domain level with GO-based profiles.
Lopez, Daniel; Pazos, Florencio
2013-01-01
Predicting the function of newly sequenced proteins is crucial due to the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains, and do not distinguish which particular domain is responsible for the allocated function. This is not a limitation of the methodologies themselves; rather, it arises because the databases of functional annotations that these methods use for transferring functional terms to new proteins are annotated on a whole-chain basis. Nevertheless, domains are the basic evolutionary and often functional units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent from each other. For that reason, resources with functional annotations at the domain level, as well as methodologies for predicting function for individual domains adapted to these resources, are required. We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints on the functionally important residues associated with the predicted function.
Revealing the global map of protein folding space by large-scale simulations
NASA Astrophysics Data System (ADS)
Sinner, Claude; Lutz, Benjamin; Verma, Abhinav; Schug, Alexander
2015-12-01
The full characterization of protein folding is a remarkable long-standing challenge both for experiment and simulation. Working towards a complete understanding of this process, one needs to cover the full diversity of existing folds and identify the general principles driving the process. Here, we want to understand and quantify the diversity in folding routes for a large and representative set of protein topologies covering the full range from all alpha helical topologies towards beta barrels guided by the key question: Does the majority of the observed routes contribute to the folding process or only a particular route? We identified a set of two-state folders among non-homologous proteins with a sequence length of 40-120 residues. For each of these proteins, we ran native-structure based simulations both with homogeneous and heterogeneous contact potentials. For each protein, we simulated dozens of folding transitions in continuous uninterrupted simulations and constructed a large database of kinetic parameters. We investigate folding routes by tracking the formation of tertiary structure interfaces and discuss whether a single specific route exists for a topology or if all routes are equiprobable. These results permit us to characterize the complete folding space for small proteins in terms of folding barrier ΔG‡, number of routes, and the route specificity RT.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A
2007-12-15
Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
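One way to picture the "secondary structural state frequencies of amino acids" feature is as a fixed-length vector with one entry per (amino acid, state) combination. The sketch below assumes a three-state alphabet (H, E, C) and normalization by sequence length; the abstract does not specify the authors' exact encoding, so the function name and these choices are assumptions. The resulting vector is the kind of input an SVM classifier could be trained on.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"  # helix, strand, coil (three-state assumption)

def ss_state_frequencies(sequence, ss_string):
    """Return a 60-dimensional vector of (amino acid, state) frequencies.

    Entry order is amino acid major, state minor: (A,H), (A,E), (A,C), (C,H), ...
    Residues or states outside the two alphabets are ignored.
    """
    if len(sequence) != len(ss_string):
        raise ValueError("sequence and secondary-structure string must have equal length")
    index = {}
    for aa in AMINO_ACIDS:
        for ss in SS_STATES:
            index[(aa, ss)] = len(index)
    vec = [0.0] * len(index)
    for aa, ss in zip(sequence, ss_string):
        if (aa, ss) in index:
            vec[index[(aa, ss)]] += 1.0
    n = len(sequence)
    return [v / n for v in vec] if n else vec

# Toy example: two alanines in helix, one glycine in coil, one lysine in strand.
features = ss_state_frequencies("AAGK", "HHCE")
print(len(features), features[0])
```

Pair features (amino acid pairs and their states) could be built the same way over adjacent residues, at the cost of a much longer vector.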
Shao, Qiang
2014-06-05
A comparative study on the folding of multiple three-α-helix bundle proteins including α3D, α3W, and the B domain of protein A (BdpA) is presented. The use of integrated-tempering-sampling molecular dynamics simulations achieves reversible folding and unfolding events in individual short trajectories, which thus provides an efficient approach to sufficiently sample the configuration space of the protein and delineate the folding pathway of the α-helix bundle. The detailed free energy landscape analyses indicate that the folding mechanism of the α-helix bundle is not uniform but sequence dependent. A simple model is then proposed to predict the folding mechanism of an α-helix bundle on the basis of amino acid composition: α-helical proteins containing a higher percentage of hydrophobic residues than charged ones fold via the nucleation-condensation mechanism (e.g., α3D and BdpA), whereas proteins having the opposite tendency in amino acid composition more likely fold via the framework mechanism (e.g., α3W). The model is tested on various α-helix bundle proteins, and the predicted mechanism is similar to the most approved one for each protein. In addition, the common features in the folding pathway of α-helix bundle proteins are also deduced. In summary, the present study provides a comprehensive, atomic-level picture of the folding of α-helix bundle proteins.
Analysis of sequence repeats of proteins in the PDB.
Mary Rajathei, David; Selvaraj, Samuel
2013-12-01
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
Street, Timothy O; Barrick, Doug
2009-01-01
The Notch ankyrin domain is a repeat protein whose folding has been characterized through equilibrium and kinetic measurements. In previous work, equilibrium folding free energies of truncated constructs were used to generate an experimentally determined folding energy landscape (Mello and Barrick, Proc Natl Acad Sci USA 2004;101:14102–14107). Here, this folding energy landscape is used to parameterize a kinetic model in which local transition probabilities between partly folded states are based on energy values from the landscape. The landscape-based model correctly predicts highly diverse experimentally determined folding kinetics of the Notch ankyrin domain and sequence variants. These predictions include monophasic folding and biphasic unfolding, curvature in the unfolding limb of the chevron plot, population of a transient unfolding intermediate, relative folding rates of 19 variants spanning three orders of magnitude, and a change in the folding pathway that results from C-terminal stabilization. These findings indicate that the folding pathway(s) of the Notch ankyrin domain are thermodynamically selected: the primary determinants of kinetic behavior can be simply deduced from the local stability of individual repeats. PMID:19177351
Efficient use of unlabeled data for protein sequence classification: a comparative study.
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-04-29
Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags, the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As over-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy, but once cut and polished, they improve the accuracy of the classifiers considerably.
Zhang, Yang
2014-01-01
We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems. PMID:23760925
Zhang, Yang
2014-02-01
We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems. Copyright © 2013 Wiley Periodicals, Inc.
Computational design of water-soluble α-helical barrels.
Thomson, Andrew R; Wood, Christopher W; Burton, Antony J; Bartlett, Gail J; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N
2014-10-24
The design of protein sequences that fold into prescribed de novo structures is challenging. General solutions to this problem require geometric descriptions of protein folds and methods to fit sequences to these. The α-helical coiled coils present a promising class of protein for this and offer considerable scope for exploring hitherto unseen structures. For α-helical barrels, which have more than four helices and accessible central channels, many of the possible structures remain unobserved. Here, we combine geometrical considerations, knowledge-based scoring, and atomistic modeling to facilitate the design of new channel-containing α-helical barrels. X-ray crystal structures of the resulting designs match predicted in silico models. Furthermore, the observed channels are chemically defined and have diameters related to oligomer state, which present routes to design protein function. Copyright © 2014, American Association for the Advancement of Science.
Blind test of physics-based prediction of protein structures.
Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A
2009-02-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.
Blind Test of Physics-Based Prediction of Protein Structures
Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.
2009-01-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130
De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy
Parmeggiani, Fabio; Velasco, D. Alejandro Fernandez; Höcker, Birte; Baker, David
2015-01-01
Despite efforts for over 25 years, de novo protein design has not succeeded in achieving the TIM-barrel fold. Here we describe the computational design of 4-fold symmetrical (β/α)8-barrels guided by geometrical and chemical principles. Experimental characterization of 33 designs revealed the importance of sidechain-backbone hydrogen bonding for defining the strand register between repeat units. The X-ray crystal structure of a designed thermostable 184-residue protein is nearly identical with the designed TIM-barrel model. PSI-BLAST searches do not identify sequence similarities to known TIM-barrel proteins, and sensitive profile-profile searches indicate that the design sequence is distant from other naturally occurring TIM-barrel superfamilies, suggesting that Nature has only sampled a subset of the sequence space available to the TIM-barrel fold. The ability to de novo design TIM-barrels opens new possibilities for custom-made enzymes. PMID:26595462
Christensen, Signe; Horowitz, Scott; Bardwell, James C.A.; Olsen, Johan G.; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.
2017-01-01
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. PMID:27659562
Johansson, Kristoffer E; Tidemand Johansen, Nicolai; Christensen, Signe; Horowitz, Scott; Bardwell, James C A; Olsen, Johan G; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R
2016-10-23
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ibrahim, Wisam; Abadeh, Mohammad Saniee
2017-05-21
Protein fold recognition is an important problem in bioinformatics to predict three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition problem is the extraction of efficient features from the amino-acid sequences to obtain better classifiers. In this paper, we have proposed six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework PCA-DELM-LDA to extract feature vectors from the amino-acid sequences. Principal Component Analysis PCA has been implemented to reduce the number of extracted features. The extracted feature vectors have been used with original features to improve the performance of the Deep Extreme Learning Machine DELM in the second stage. Four new features have been extracted from the second stage and used in the third stage by Linear Discriminant Analysis LDA to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that extracted feature vectors in the first stage could improve the performance of DELM in extracting new useful features in second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Le Coq, Johanne; Ghosh, Partho
2012-06-19
Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.
Le Coq, Johanne; Ghosh, Partho
2011-01-01
Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation. PMID:21873231
Le Coq, Johanne; Ghosh, Partho
2011-08-30
Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.
Zhang, Hua; Zhang, Tuo; Gao, Jianzhao; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz
2012-01-01
Proteins fold through either a two-state (TS) process, with no visible intermediates, or a multi-state (MS) process, via at least one intermediate. We analyze sequence-derived factors that determine folding types by introducing a novel sequence-based folding type predictor called FOKIT. This method implements a logistic regression model with six input features which hybridize information concerning amino acid composition and predicted secondary structure and solvent accessibility. FOKIT provides predictions with average Matthews correlation coefficient (MCC) between 0.58 and 0.91 measured using out-of-sample tests on four benchmark datasets. These results are shown to be competitive with or better than the results of four modern predictors. We also show that FOKIT outperforms these methods when predicting chains that share low similarity with the chains used to build the model, which is an important advantage given the limited number of annotated chains. We demonstrate that inclusion of solvent accessibility helps in discrimination of the folding kinetic types and that three of the features constitute statistically significant markers that differentiate TS and MS folders. We found that increased content of exposed Trp and buried Leu is indicative of MS folding, which implies that the exposure/burial of certain hydrophobic residues may play an important role in the formation of the folding intermediates. Our conclusions are supported by two case studies.
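The Matthews correlation coefficient quoted above is a standard summary of a binary confusion matrix. The following is a minimal sketch of how it is computed, not the FOKIT implementation; the function name `mcc` and the choice of which folding type counts as the "positive" class are assumptions made for illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives. Which class is "positive"
    (for example multi-state folders) is an arbitrary choice here.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not data from the study.
print(mcc(tp=20, tn=25, fp=5, fn=10))
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which makes it less sensitive to class imbalance than plain accuracy.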
Li, Bai; Lin, Mu; Liu, Qiao; Li, Ya; Zhou, Changjun
2015-10-01
Protein folding is a fundamental topic in molecular biology. Conventional experimental techniques for protein structure identification or protein folding recognition require strict laboratory requirements and heavy operating burdens, which have largely limited their applications. Alternatively, computer-aided techniques have been developed to optimize protein structures or to predict the protein folding process. In this paper, we utilize a 3D off-lattice model to describe the original protein folding scheme as a simplified energy-optimal numerical problem, where all types of amino acid residues are binarized into hydrophobic and hydrophilic ones. We apply a balance-evolution artificial bee colony (BE-ABC) algorithm as the minimization solver, which is featured by the adaptive adjustment of search intensity to cater for the varying needs during the entire optimization process. In this work, we establish a benchmark case set with 13 real protein sequences from the Protein Data Bank database and evaluate the convergence performance of BE-ABC algorithm through strict comparisons with several state-of-the-art ABC variants in short-term numerical experiments. Besides that, our obtained best-so-far protein structures are compared to the ones in comprehensive previous literature. This study also provides preliminary insights into how artificial intelligence techniques can be applied to reveal the dynamics of protein folding. Graphical Abstract Protein folding optimization using 3D off-lattice model and advanced optimization techniques.
"Sequence space soup" of proteins and copolymers
NASA Astrophysics Data System (ADS)
Chan, Hue Sun; Dill, Ken A.
1991-09-01
To study the protein folding problem, we use exhaustive computer enumeration to explore "sequence space soup," an imaginary solution containing the "native" conformations (i.e., of lowest free energy) under folding conditions, of every possible copolymer sequence. The model is of short self-avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two-dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins, will have considerable amounts of secondary structure, and it is most probable that an arbitrary sequence will fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniquenesses of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have a minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.
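The exhaustive enumeration described above can be illustrated for very short chains. The sketch below is an assumption-laden toy, not the authors' code: it assumes the native state is the conformation with the most H-H contacts between residues that are lattice neighbours but not chain neighbours, and it removes rotations and mirror images by fixing the first step to +x and the first turn to +y. The names `self_avoiding_walks`, `hh_contacts` and `native_states` are hypothetical.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n):
    """Yield every n-site self-avoiding walk on the square lattice,
    one representative per class of rotations and reflections."""

    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            # Until the chain leaves the x axis, forbid downward steps
            # so that mirror images are generated only once.
            if not turned and dy < 0:
                continue
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            occupied.remove(nxt)
            path.pop()

    # First step fixed to +x removes the four lattice rotations.
    start = [(0, 0), (1, 0)][:n]
    yield from extend(start, set(start), False)


def hh_contacts(seq, walk):
    """Count H-H pairs adjacent on the lattice but not along the chain."""
    index_at = {site: i for i, site in enumerate(walk)}
    count = 0
    for i, (x, y) in enumerate(walk):
        if seq[i] != "H":
            continue
        # Looking only right and up counts each pair exactly once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                count += 1
    return count


def native_states(seq):
    """Return (max H-H contacts, number of conformations achieving it)."""
    best, degeneracy = -1, 0
    for walk in self_avoiding_walks(len(seq)):
        c = hh_contacts(seq, walk)
        if c > best:
            best, degeneracy = c, 1
        elif c == best:
            degeneracy += 1
    return best, degeneracy


print(native_states("HPPH"))
```

The number of walks grows exponentially with chain length, so this brute-force approach is practical only for short chains, which is consistent with the abstract's restriction to short sequences.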
Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S
2017-01-01
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Kwasigroch, Jean Marc; Rooman, Marianne
2006-07-15
Prelude&Fugue are bioinformatics tools aiming at predicting the local 3D structure of a protein from its amino acid sequence in terms of seven backbone torsion angle domains, using database-derived potentials. Prelude(&Fugue) computes all lowest free energy conformations of a protein or protein region, ranked by increasing energy, and possibly satisfying some interresidue distance constraints specified by the user. (Prelude&)Fugue detects sequence regions whose predicted structure is significantly preferred relative to other conformations in the absence of tertiary interactions. These programs can be used for predicting secondary structure, tertiary structure of short peptides, flickering early folding sequences and peptides that adopt a preferred conformation in solution. They can also be used for detecting structural weaknesses, i.e. sequence regions that are not optimal with respect to the tertiary fold. http://babylone.ulb.ac.be/Prelude_and_Fugue.
Predicting helix orientation for coiled-coil dimers
Apgar, James R.; Gutwin, Karl N.; Keating, Amy E.
2008-01-01
The alpha-helical coiled coil is a structurally simple protein oligomerization or interaction motif consisting of two or more alpha helices twisted into a supercoiled bundle. Coiled coils can differ in their stoichiometry, helix orientation and axial alignment. Because of the near degeneracy of many of these variants, coiled coils pose a challenge to fold recognition methods for structure prediction. Whereas distinctions between some protein folds can be discriminated on the basis of hydrophobic/polar patterning or secondary structure propensities, the sequence differences that encode important details of coiled-coil structure can be subtle. This is emblematic of a larger problem in the field of protein structure and interaction prediction: that of establishing specificity between closely similar structures. We tested the behavior of different computational models on the problem of recognizing the correct orientation - parallel vs. antiparallel - of pairs of alpha helices that can form a dimeric coiled coil. For each of 131 examples of known structure, we constructed a large number of both parallel and antiparallel structural models and used these to assess the ability of five energy functions to recognize the correct fold. We also developed and tested three sequence-based approaches that make use of varying degrees of implicit structural information. The best structural methods performed similarly to the best sequence methods, correctly categorizing ∼81% of dimers. Steric compatibility with the fold was important for some coiled coils we investigated. For many examples, the correct orientation was determined by smaller energy differences between parallel and antiparallel structures distributed over many residues and energy components. Prediction methods that used structure but incorporated varying approximations and assumptions showed quite different behaviors when used to investigate energetic contributions to orientation preference. Sequence-based methods were sensitive to the choice of residue-pair interactions scored. PMID:18506779
Mittal, A; Jayaram, B; Shenoy, Sandhya; Bawa, Tejdeep Singh
2010-10-01
Protein folding is at least a six-decade-old problem, dating back to the times of Pauling and Anfinsen. However, the rules of protein folding remain elusive to date. In this work, rigorous analyses of several thousand crystal structures of folded proteins reveal a surprisingly simple unifying principle of backbone organization in protein folding. We find that protein folding is a direct consequence of a narrow band of stoichiometric occurrences of amino-acids in primary sequences, regardless of the size and the fold of a protein. We observe that "preferential interactions" between amino-acids do not drive protein folding, contrary to all prevalent views. We dedicate our discovery to the seminal contribution of Chargaff which was one of the major keys to elucidation of the stoichiometry-driven spatially organized double helical structure of DNA.
Frustration Sculpts the Early Stages of Protein Folding.
Di Silvio, Eva; Brunori, Maurizio; Gianni, Stefano
2015-09-07
The funneled energy landscape theory implies that protein structures are minimally frustrated. Yet, because of the divergent demands between folding and function, regions of frustrated patterns are present at the active site of proteins. To understand the effects of such local frustration in dictating the energy landscape of proteins, here we compare the folding mechanisms of the two alternative spliced forms of a PDZ domain (PDZ2 and PDZ2as) that share a nearly identical sequence and structure, while displaying different frustration patterns. The analysis, based on the kinetic characterization of a large number of site-directed mutants, reveals that although the late stages for folding are very robust and biased by native topology, the early stages are more malleable and dominated by local frustration. The results are briefly discussed in the context of the energy-landscape theory. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Analysis of the Free-Energy Surface of Proteins from Reversible Folding Simulations
Allen, Lucy R.; Krivov, Sergei V.; Paci, Emanuele
2009-01-01
Computer generated trajectories can, in principle, reveal the folding pathways of a protein at atomic resolution and possibly suggest general and simple rules for predicting the folded structure of a given sequence. While such reversible folding trajectories can only be determined ab initio using all-atom transferable force-fields for a few small proteins, they can be determined for a large number of proteins using coarse-grained and structure-based force-fields, in which a known folded structure is by construction the absolute energy and free-energy minimum. Here we use a model of the fast folding helical λ-repressor protein to generate trajectories in which native and non-native states are in equilibrium and transitions are accurately sampled. Yet, representation of the free-energy surface, which underlies the thermodynamic and dynamic properties of the protein model, from such a trajectory remains a challenge. Projections over one or a small number of arbitrarily chosen progress variables often hide the most important features of such surfaces. The results unequivocally show that an unprojected representation of the free-energy surface provides important and unbiased information and allows a simple and meaningful description of many-dimensional, heterogeneous trajectories, providing new insight into the possible mechanisms of fast-folding proteins. PMID:19593364
Analysis of the free-energy surface of proteins from reversible folding simulations.
Allen, Lucy R; Krivov, Sergei V; Paci, Emanuele
2009-07-01
Computer generated trajectories can, in principle, reveal the folding pathways of a protein at atomic resolution and possibly suggest general and simple rules for predicting the folded structure of a given sequence. While such reversible folding trajectories can only be determined ab initio using all-atom transferable force-fields for a few small proteins, they can be determined for a large number of proteins using coarse-grained and structure-based force-fields, in which a known folded structure is by construction the absolute energy and free-energy minimum. Here we use a model of the fast folding helical lambda-repressor protein to generate trajectories in which native and non-native states are in equilibrium and transitions are accurately sampled. Yet, representation of the free-energy surface, which underlies the thermodynamic and dynamic properties of the protein model, from such a trajectory remains a challenge. Projections over one or a small number of arbitrarily chosen progress variables often hide the most important features of such surfaces. The results unequivocally show that an unprojected representation of the free-energy surface provides important and unbiased information and allows a simple and meaningful description of many-dimensional, heterogeneous trajectories, providing new insight into the possible mechanisms of fast-folding proteins.
Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo
2017-01-01
Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/.
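The "top L/k long-range accuracy" figures reported above follow a common evaluation recipe for contact prediction: rank the predicted residue pairs by score, keep the best L/k long-range pairs (L being the sequence length), and report the fraction that are true contacts. The sketch below is illustrative only; the abstract does not define "long-range", so the sequence-separation cutoff of 24 residues is an assumption based on the usual CASP convention, and the function name and input layout are hypothetical.

```python
def top_lk_accuracy(scores, native, length, k=1, min_sep=24):
    """Fraction of the top length//k long-range predictions that are native.

    scores : dict mapping (i, j) residue-index pairs to predicted scores
    native : set of (i, j) true contacts, stored with i < j
    length : sequence length L
    k      : divisor, e.g. 1 for top L and 10 for top L/10
    min_sep: minimum |i - j| for a pair to count as long-range (assumed 24)
    """
    long_range = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    # Highest-scoring predictions first.
    long_range.sort(key=lambda item: item[1], reverse=True)
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in native for (i, j), _ in top)
    # Dividing by the number of pairs actually evaluated is one of two
    # conventions in use; the other divides by length // k.
    return hits / len(top)


# Tiny hypothetical example with a relaxed separation cutoff.
toy_scores = {(0, 2): 0.9, (0, 3): 0.8, (1, 3): 0.1, (0, 1): 0.99}
toy_native = {(0, 2), (1, 3)}
print(top_lk_accuracy(toy_scores, toy_native, length=4, k=2, min_sep=2))
```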
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Li, Zhen; Zhang, Renyu
2017-01-01
Motivation Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. Method This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Results Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. Availability http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
NASA Astrophysics Data System (ADS)
Arteca, Gustavo A.; Tapia, O.
Using computer-simulated molecular dynamics, we study the effect of sequence mutation on the unfolding mechanism of a native fold. The system considered is the native fold of hen egg-white lysozyme, exposed to centrifugal unfolding in vacuo. This unfolding bias elicits configurational transitions that imitate the behaviour of anhydrous proteins diffusing after electrospraying from neutral-pH solutions. By changing the sequences threaded onto the native fold of lysozyme, we probe the role of disulfide bridges and the effect of a global mutation. We find that the initial denaturing steps share common characteristics for the tested sequences. Recurrent features are: (i) the presence of dumbbell conformers with significant residual secondary structure, (ii) the ubiquitous formation of hairpins and two-stranded β-sheets regardless of disulfide bridges, and (iii) an unfolding pattern where the reduction in folding complexity is highly correlated with the decrease in chain compactness. These findings appear to be intrinsic to the shape of the native fold, suggesting that similar unfolding pathways may be accessible to many protein sequences.
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
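For readers unfamiliar with the term, a pairwise Potts Hamiltonian assigns each sequence an energy built from single-site fields and pairwise couplings, and an independent-site model is the special case with no couplings. The sketch below shows one common form, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j); the sign convention, the data layout and the name `potts_energy` are assumptions for illustration and are not taken from the paper.

```python
def potts_energy(seq, fields, couplings):
    """Energy of a sequence under a pairwise Potts model.

    seq       : list of integer states, one per position
    fields    : fields[i][a] is the local field for state a at position i
    couplings : dict mapping (i, j) with i < j to a matrix J,
                where J[a][b] couples state a at i with state b at j
    Lower energy means a more favourable sequence in this convention.
    """
    energy = -sum(fields[i][a] for i, a in enumerate(seq))
    for (i, j), J in couplings.items():
        energy -= J[seq[i]][seq[j]]
    return energy


# Hypothetical two-position, two-state model.
toy_fields = [[0.0, 1.0], [0.5, 0.0]]
toy_couplings = {(0, 1): [[0.0, 0.0], [2.0, 0.0]]}
print(potts_energy([1, 0], toy_fields, toy_couplings))
```

Passing an empty `couplings` dictionary reduces the function to an independent-site model, the baseline the abstract contrasts with the full Potts model.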
Ganguly, Debabani; Chen, Jianhan
2011-04-01
Coupled binding and folding is frequently involved in specific recognition of so-called intrinsically disordered proteins (IDPs), a newly recognized class of proteins that rely on a lack of stable tertiary fold for function. Here, we exploit topology-based Gō-like modeling as an effective tool for the mechanism of IDP recognition within the theoretical framework of minimally frustrated energy landscape. Importantly, substantial differences exist between IDPs and globular proteins in both amino acid sequence and binding interface characteristics. We demonstrate that established Gō-like models designed for folded proteins tend to over-estimate the level of residual structures in unbound IDPs, whereas under-estimating the strength of intermolecular interactions. Such systematic biases have important consequences in the predicted mechanism of interaction. A strategy is proposed to recalibrate topology-derived models to balance intrinsic folding propensities and intermolecular interactions, based on experimental knowledge of the overall residual structure level and binding affinity. Applied to pKID/KIX, the calibrated Gō-like model predicts a dominant multistep sequential pathway for binding-induced folding of pKID that is initiated by KIX binding via the C-terminus in disordered conformations, followed by binding and folding of the rest of C-terminal helix and finally the N-terminal helix. This novel mechanism is consistent with key observations derived from a recent NMR titration and relaxation dispersion study and provides a molecular-level interpretation of kinetic rates derived from dispersion curve analysis. These case studies provide important insight into the applicability and potential pitfalls of topology-based modeling for studying IDP folding and interaction in general. Copyright © 2011 Wiley-Liss, Inc.
A new method to improve network topological similarity search: applied to fold recognition
Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei
2015-01-01
Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request. Contact: lxie@iscb.org PMID:25717198
Efficient use of unlabeled data for protein sequence classification: a comparative study
Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir
2009-01-01
Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags–the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data alone.
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
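The three scores quoted in this abstract (accuracy, F-measure and MCC) all follow from a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name `confusion_metrics` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Hypothetical helper, not the
    authors' code.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0

    # MCC is defined as 0 when any marginal sum is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


# Made-up counts for illustration only.
acc, f1, mcc = confusion_metrics(tp=70, fp=30, tn=65, fn=35)
print(f"accuracy={acc:.3f} F={f1:.3f} MCC={mcc:.3f}")
```

MCC stays near zero for a classifier that guesses at random, which is why it is often reported next to accuracy for binding-site prediction.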
NASA Astrophysics Data System (ADS)
Krokhotin, Andrey; Dokholyan, Nikolay V.
2017-07-01
Most proteins fold into unique three-dimensional (3D) structures that determine their biological functions, such as catalytic activity or macromolecular binding. Misfolded proteins can pose a threat through aberrant interactions with other proteins, leading to a number of diseases including Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis [1,2]. What determines the 3D structure of proteins? The first clue to this question came more than fifty years ago when Anfinsen demonstrated that unfolded proteins can spontaneously fold to their native 3D structures [3,4]. Anfinsen's experiments led to the conclusion that proteins fold to a unique native structure corresponding to the stable and kinetically accessible free energy minimum, and that the native structure of a protein is solely determined by its amino acid sequence. The question of how exactly proteins find their free energy minimum proved to be a difficult problem. One of the puzzles, initially pointed out by Levinthal, was an inconsistency between observed protein folding times and theoretical estimates. A self-avoiding polymer model of a globular protein of 100 residues on a cubic lattice can sample at least 10^47 states. Based on the assumption that conformational sampling occurs at the highest vibrational mode of proteins (∼picoseconds), the folding time predicted for a search among all possible conformations is ∼10^27 years (much larger than the age of the universe) [5]. In contrast, observed protein folding times range from microseconds to minutes. Owing to the tremendous theoretical progress achieved in the protein folding field in past decades, the source of this inconsistency is now understood, as thoroughly described in the review by Finkelstein et al. [6].
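The Levinthal estimate in this abstract can be checked with a few lines of arithmetic. This sketch assumes the figures are 10^47 lattice states and exactly one picosecond per sampled state (the abstract only says "∼picoseconds"); the constant and function names are illustrative.

```python
# Assumed inputs: 10^47 conformations, 1 ps per conformation.
N_STATES = 10**47
SECONDS_PER_STATE = 1e-12
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def exhaustive_search_years(n_states, seconds_per_state):
    """Time, in years, to visit every state once at a fixed sampling rate."""
    return n_states * seconds_per_state / SECONDS_PER_YEAR


years = exhaustive_search_years(N_STATES, SECONDS_PER_STATE)
print(f"{years:.1e} years")  # prints 3.2e+27 years
```

The result is on the order of 10^27 years, matching the estimate quoted above and roughly 17 orders of magnitude beyond the age of the universe (about 1.4 x 10^10 years).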
Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu
2012-06-01
Successful approaches to de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing parts of these sequences, we successfully obtained stable, smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed differences in ANS binding properties and in stabilities against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers and not as secondary structure units, substantially affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.
Small Scaffolds, Big Potential: Developing Miniature Proteins as Therapeutic Agents.
Holub, Justin M
2017-09-01
Preclinical Research Miniature proteins are a class of oligopeptide characterized by their short sequence lengths and ability to adopt well-folded, three-dimensional structures. Because of their biomimetic nature and synthetic tractability, miniature proteins have been used to study a range of biochemical processes including fast protein folding, signal transduction, catalysis and molecular transport. Recently, miniature proteins have been gaining traction as potential therapeutic agents because their small size and ability to fold into defined tertiary structures facilitates their development as protein-based drugs. This research overview discusses emerging developments involving the use of miniature proteins as scaffolds to design novel therapeutics for the treatment and study of human disease. Specifically, this review will explore strategies to: (i) stabilize miniature protein tertiary structure; (ii) optimize biomolecular recognition by grafting functional epitopes onto miniature protein scaffolds; and (iii) enhance cytosolic delivery of miniature proteins through the use of cationic motifs that facilitate endosomal escape. These objectives are discussed not only to address challenges in developing effective miniature protein-based drugs, but also to highlight the tremendous potential miniature proteins hold for combating and understanding human disease. Drug Dev Res 78 : 268-282, 2017. © 2017 Wiley Periodicals, Inc.
The folding mechanism of two closely related proteins in the intracellular lipid binding protein family, human bile acid binding protein (hBABP) and rat bile acid binding protein (rBABP) were examined. These proteins are 77% identical (93% similar) in sequence Both of these singl...
Protein 3D Structure Computed from Evolutionary Sequence Variation
Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris
2011-01-01
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331
Figueroa-Montiel, Andrea; Ramos, Marco A; Mares, Rosa E; Dueñas, Salvador; Pimienta, Genaro; Ortiz, Ernesto; Possani, Lourival D; Licea-Navarro, Alexei F
2016-01-01
Small peptides isolated from the venom of the marine snails belonging to the genus Conus have been widely studied because of their therapeutic value. These peptides can be classified in two groups. The largest one is composed of peptides rich in disulfide bonds, referred to as conotoxins. Despite the importance of conotoxins given their pharmacological value, little is known about the protein disulfide isomerase (PDI) enzymes that are required to catalyze their correct folding. To discover the PDIs that may participate in the folding and structural maturation of conotoxins, the transcriptomes of the venom duct of four different species of Conus from the peninsula of Baja California (Mexico) were assembled. Complementary DNA (cDNA) libraries were constructed for each species and sequenced using an Illumina Genome Analyzer platform. The raw RNA-seq data was converted into transcript sequences using Trinity, a de novo assembler that allows the grouping of reads into contigs without a reference genome. An N50 value of 605 was established as a reference for future assemblies of Conus transcriptomes using this software. Transdecoder was used to extract likely coding sequences from Trinity transcripts, and the PDI-specific sequence motif "APWCGHCK" was used to capture potential PDIs. An in silico analysis was performed to characterize the group of PDI protein sequences encoded by the duct transcriptome of each species. The computational approach entailed a structural homology characterization, based on the presence of functional Thioredoxin-like domains. Four different PDI families were characterized, which together comprise a total of 41 different gene sequences. The sequences had an average of 65% identity with other PDIs. Using MODELLER 9.14, the homology-based three-dimensional structure prediction of a subset of the reported sequences showed the expected thioredoxin fold, which was confirmed by a "simulated annealing" method.
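The motif-capture step described above (scanning predicted coding sequences for "APWCGHCK") amounts to an exact substring search. The sketch below shows one way it might look; the function name, the dictionary input format and the toy sequences are assumptions made for illustration, and the authors' actual pipeline is not specified in the abstract.

```python
PDI_MOTIF = "APWCGHCK"  # motif quoted in the abstract


def find_motif_hits(sequences, motif=PDI_MOTIF):
    """Map each sequence name to the 0-based start positions of the motif.

    `sequences` is a dict of name -> protein sequence (hypothetical input
    format). Sequences without the motif are left out of the result.
    """
    hits = {}
    for name, seq in sequences.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Continue one residue further so overlapping hits are found.
            start = seq.find(motif, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Toy sequences invented for this example; they are not real Conus ORFs.
toy_orfs = {
    "orf_a": "MKLAPWCGHCKQLAP",
    "orf_b": "MKTTTLLV",
}
print(find_motif_hits(toy_orfs))  # {'orf_a': [3]}
```

An exact match is the strictest reading of "used to capture potential PDIs"; a real screen might tolerate substitutions, which would call for a regular expression or a profile search instead.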
The Folding Energy Landscape and Free Energy Excitations of Cytochrome c
Weinkam, Patrick; Zimmermann, Jörg; Romesberg, Floyd E.
2014-01-01
The covalently bound heme cofactor plays a dominant role in the folding of cytochrome c. Due to the complicated inorganic chemistry of the heme, some might consider the folding of cytochrome c to be a special case that follows different principles than those used to describe folding of proteins without cofactors. Recent investigations, however, demonstrate that models which are commonly used to describe folding for many proteins work well for cytochrome c when heme is explicitly introduced and generally provide results that agree with experimental observations. We will first discuss results from simple native structure-based models. These models include attractive interactions between nonadjacent residues only if they are present in the crystal structure at pH 7. Since attractive nonnative contacts are not included in native structure-based models, their energy landscapes can be described as “perfectly funneled.” In other words, native structure-based models are energetically guided towards the native state and contain no energetic traps that would hinder folding. Energetic traps are sources of frustration which cause specific transient intermediates to be populated. Native structure-based models do include repulsion between residues due to excluded volume. Nonenergetic traps can therefore exist if the chain, which cannot cross over itself, must partially unfold in order for folding to proceed. The ability of native structure-based models to capture these types of motions is in part responsible for their successful predictions of folding pathways for many types of proteins. Models without frustration describe well the sequence of folding events for cytochrome c inferred from hydrogen exchange experiments, thereby justifying their use as a starting point. At low pH, the folding sequence of cytochrome c deviates from that at pH 7 and from those predicted from models with perfectly funneled energy landscapes.
Alternate folding pathways are a result of “chemical frustration.” This frustration arises because some regions of the protein are destabilized more than others due to the heterogeneous distribution of titratable residues that are protonated at low pH. We construct more complex models that include chemical frustration, in addition to the native structure-based terms. These more complex models only modestly perturb the energy landscape which remains overall well funneled. These perturbed models can accurately describe how alternative folding pathways are used at low pH. At alkaline pH, cytochrome c populates distinctly different structural ensembles. For instance, lysine residues are deprotonated and compete for the heme ligation site. The same models that can describe folding at low pH also predict well the structures and relative stabilities of intermediates populated at alkaline pH. PMID:20143816
Global analysis of protein folding using massively parallel design, synthesis and testing
Rocklin, Gabriel J.; Chidyausiku, Tamuka M.; Goreshnik, Inna; Ford, Alex; Houliston, Scott; Lemak, Alexander; Carter, Lauren; Ravichandran, Rashmi; Mulligan, Vikram K.; Chevalier, Aaron; Arrowsmith, Cheryl H.; Baker, David
2017-01-01
Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Though these forces are “encoded” in the thousands of known protein structures, “decoding” them is challenging due to the complexity of natural proteins that have evolved for function, not stability. Here we combine computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for over 15,000 de novo designed miniproteins, 1,000 natural proteins, 10,000 point-mutants, and 30,000 negative control sequences, identifying over 2,500 new stable designed proteins in four basic folds. This scale—three orders of magnitude greater than that of previous studies of design or folding—enabled us to systematically examine how sequence determines folding and stability in uncharted protein space. Iteration between design and experiment increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and experiment, and promises to transform computational protein design into a data-driven science. PMID:28706065
Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas
2006-03-09
The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. Selecting a window size of 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance: the prediction accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40.
Furthermore, when coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and an MCC of 0.43, which are 9% and 0.17 higher, respectively, than the values achieved with single sequence information. A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequences flanking the central proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of the SVM approach in this study reinforces that SVM is a powerful tool for predicting proline cis/trans isomerization in proteins and for biological sequence analysis.
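The single-sequence encoding described in this abstract starts from a local window centered on each proline. The sketch below extracts such windows with the best-performing size of 11 as the default. Padding the chain termini with "X" is an assumption made here, since the abstract does not say how windows near the ends are handled, and the function name is hypothetical.

```python
def proline_windows(sequence, window=11, pad="X"):
    """Return (position, window_string) for every proline in `sequence`.

    Each window has odd length `window` and is centered on the proline.
    Positions are 0-based. Termini are padded with `pad` (an assumption
    of this sketch, not a detail taken from the paper).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = pad * half + sequence + pad * half
    # Residue i of `sequence` sits at index i + half of `padded`, so the
    # slice [i, i + window) is centered on it.
    return [
        (i, padded[i:i + window])
        for i, residue in enumerate(sequence)
        if residue == "P"
    ]


print(proline_windows("ACDEFPGHIKL"))  # [(5, 'ACDEFPGHIKL')]
```

Each window string would then be turned into a numeric feature vector (for example one-hot residues or PSSM rows) before being passed to an SVM; that step is outside this sketch.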
Wood, Christopher W; Bruning, Marc; Ibarra, Amaurys Á; Bartlett, Gail J; Thomson, Andrew R; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N
2014-11-01
The ability to accurately model protein structures at the atomistic level underpins efforts to understand protein folding, to engineer natural proteins predictably and to design proteins de novo. Homology-based methods are well established and produce impressive results. However, these are limited to structures presented by and resolved for natural proteins. Addressing this problem more widely and deriving truly ab initio models requires mathematical descriptions for protein folds; the means to decorate these with natural, engineered or de novo sequences; and methods to score the resulting models. We present CCBuilder, a web-based application that tackles the problem for a defined but large class of protein structure, the α-helical coiled coils. CCBuilder generates coiled-coil backbones, builds side chains onto these frameworks and provides a range of metrics to measure the quality of the models. Its straightforward graphical user interface provides broad functionality that allows users to build and assess models, in which helix geometry, coiled-coil architecture and topology and protein sequence can be varied rapidly. We demonstrate the utility of CCBuilder by assembling models for 653 coiled-coil structures from the PDB, which cover >96% of the known coiled-coil types, and by generating models for rarer and de novo coiled-coil structures. CCBuilder is freely available, without registration, at http://coiledcoils.chm.bris.ac.uk/app/cc_builder/. © The Author 2014. Published by Oxford University Press.
Dzurová, Lenka; Forneris, Federico; Savino, Simone; Galuszka, Petr; Vrabka, Josef; Frébort, Ivo
2015-08-01
The recently discovered cytokinin (CK)-specific phosphoribohydrolase "Lonely Guy" (LOG) is a key enzyme of CK biosynthesis, converting inactive CK nucleotides into biologically active free bases. We have determined the crystal structures of LOG from Claviceps purpurea (cpLOG) and its complex with the enzymatic product phosphoribose. The structures reveal a dimeric arrangement of Rossmann folds, with the ligands bound to large pockets at the interface between cpLOG monomers. Structural comparisons highlight the homology of cpLOG to putative lysine decarboxylases. Extended sequence analysis enabled identification of a distinguishing LOG sequence signature. Taken together, our data suggest phosphoribohydrolase activity for several proteins of unknown function. © 2015 Wiley Periodicals, Inc.
Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido
2018-05-23
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what is needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures
Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad
2015-01-01
We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins, and hence identifying similarity between proteins is simpler and more reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411
Getov, Ivan; Petukh, Marharyta; Alexov, Emil
2016-04-07
Folding free energy is an important biophysical characteristic of proteins that reflects the overall stability of the 3D structure of macromolecules. Changes in the amino acid sequence, naturally occurring or made in vitro, may affect the stability of the corresponding protein and thus could be associated with disease. Several approaches that predict the changes of the folding free energy caused by mutations have been proposed, but there is no method that is clearly superior to the others. The optimal goal is not only to accurately predict the folding free energy changes, but also to characterize the structural changes induced by mutations and the physical nature of the predicted folding free energy changes. Here we report a new method to predict the Single Amino Acid Folding free Energy Changes (SAAFEC) based on a knowledge-modified Molecular Mechanics Poisson-Boltzmann (MM/PBSA) approach. The method comprises two main components: an MM/PBSA component and a set of knowledge-based terms derived from a statistical study of the biophysical characteristics of proteins. The predictor utilizes a multiple linear regression model with weighted coefficients of various terms optimized against a set of experimental data. This approach yields a correlation coefficient of 0.65 when benchmarked against 983 cases from 42 proteins in the ProTherm database. The webserver can be accessed via http://compbio.clemson.edu/SAAFEC/.
NASA Astrophysics Data System (ADS)
Hao, Ming-Hong; Scheraga, Harold A.
1995-01-01
A comparative study of protein folding with an analytical theory and computer simulations, respectively, is reported. The theory is based on an improved mean-field formalism which, in addition to the usual mean-field approximations, takes into account the distributions of energies in the subsets of conformational states. Sequence-specific properties of proteins are parametrized in the theory by two sets of variables, one for the energetics of mean-field interactions and one for the distribution of energies. Simulations are carried out on model polypeptides with different sequences, with different chain lengths, and with different interaction potentials, ranging from strong biases towards certain local chain states (bond angles and torsional angles) to complete absence of local conformational preferences. Theoretical analysis of the simulation results for the model polypeptides reveals three different types of behavior in the folding transition from the statistical coiled state to the compact globular state; these include a cooperative two-state transition, a continuous folding, and a glasslike transition. It is found that, with the fitted theoretical parameters which are specific for each polypeptide under a different potential, the mean-field theory can describe the thermodynamic properties and folding behavior of the different polypeptides accurately. By comparing the theoretical descriptions with simulation results, we verify the basic assumptions of the theory and, thereby, obtain new insights about the folding transitions of proteins. 
It is found that the cooperativity of the first-order folding transition of the model polypeptides is determined mainly by long-range interactions, in particular the dipolar orientation; the local interactions (e.g., bond-angle and torsion-angle potentials) have only marginal effect on the cooperative characteristic of the folding, but have a large impact on the difference in energy between the folded lowest-energy structure and the unfolded conformations of a protein.
The protein structure prediction problem could be solved using the current PDB library
Zhang, Yang; Skolnick, Jeffrey
2005-01-01
For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774
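The RMSD values quoted in this abstract measure the root-mean-square distance between equivalent atoms of a model and the native structure. The sketch below computes that quantity for two coordinate lists that are assumed to be already optimally superimposed and matched atom-for-atom; the superposition step itself (for example the Kabsch algorithm) is deliberately left out, and the function name and toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples in Angstroms. The structures are
    assumed to be pre-aligned; no superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy C-alpha coordinates, made up for illustration.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 3.0), (7.6, 0.0, 0.0)]
print(f"{rmsd(native, model):.2f} A")
```

Without a prior superposition the value depends on the arbitrary frame of each structure, so this function is only meaningful on aligned coordinates.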
Roles of beta-turns in protein folding: from peptide models to protein engineering.
Marcelino, Anna Marie C; Gierasch, Lila M
2008-05-01
Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein's structure.
FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds.
Abbasi, Elham; Ghatee, Mehdi; Shiri, M E
2013-09-01
In this paper, an intelligent hyper framework is proposed to recognize protein folds from amino acid sequences, which is a fundamental problem in bioinformatics. This framework includes some statistical and intelligent algorithms for protein classification. The main components of the proposed framework are the Fuzzy Resource-Allocating Network (FRAN) and the Radial Basis Function based on Particle Swarm Optimization (RBF-PSO). FRAN applies a dynamic method to tune the RBF network parameters. Due to the complexity of the patterns captured in the protein dataset, FRAN classifies the proteins under fuzzy conditions. Also, RBF-PSO applies PSO to tune the RBF classifier. Experimental results demonstrate that FRAN improves prediction accuracy up to 51% and achieves acceptable multi-class results for protein fold prediction. Although RBF-PSO provides reasonable results for protein fold recognition up to 48%, it is weaker than FRAN in some cases. However, the proposed hyper framework provides an opportunity to use a great range of intelligent methods and can learn from previous experiences. Thus it can avoid the weakness of some intelligent methods in terms of memory, computational time and static structure. Furthermore, the performance of this system can be enhanced throughout the system life-cycle. Copyright © 2013 Elsevier Ltd. All rights reserved.
Protocols for efficient simulations of long-time protein dynamics using coarse-grained CABS model.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2014-01-01
Coarse-grained (CG) modeling is a well-acknowledged simulation approach for getting insight into long-time scale protein folding events at reasonable computational cost. Depending on the design of a CG model, the simulation protocols vary from highly case-specific-requiring user-defined assumptions about the folding scenario-to more sophisticated blind prediction methods for which only a protein sequence is required. Here we describe the framework protocol for the simulations of long-term dynamics of globular proteins, with the use of the CABS CG protein model and sequence data. The simulations can start from a random or a selected (e.g., native) structure. The described protocol has been validated using experimental data for protein folding model systems-the prediction results agreed well with the experimental results.
Madan, Bharat; Sokalingam, Sriram; Raghunathan, Govindan; Lee, Sun-Gu
2014-10-01
Both Type I' and Type II' β-turns have the same sense of the β-turn twist that is compatible with the β-sheet twist. They occur predominantly in two residue β-hairpins, but the occurrence of Type I' β-turns is two times higher than Type II' β-turns. This suggests that Type I' β-turns may be more stable than Type II' β-turns, and Type I' β-turn sequence and structure can be more favorable for protein folding than Type II' β-turns. Here, we redesigned the native Type II' β-turn in GFP to Type I' β-turn, and investigated its effect on protein folding and stability. The Type I' β-turns were designed based on the statistical analysis of residues in natural Type I' β-turns. The substitution of the native "GD" sequence of i+1 and i+2 residues with Type I' preferred "(N/D)G" sequence motif increased the folding rate by 50% and slightly improved the thermodynamic stability. Despite the enhancement of in vitro refolding kinetics and stability of the redesigned mutants, they showed poor soluble expression level compared to wild type. To overcome this problem, i and i + 3 residues of the designed Type I' β-turn were further engineered. The mutation of Thr to Lys at i + 3 could restore the in vivo soluble expression of the Type I' mutant. This study indicates that Type II' β-turns in natural β-hairpins can be further optimized by converting the sequence to Type I'. © 2014 Wiley Periodicals, Inc.
Computational mining for hypothetical patterns of amino acid side chains in protein data bank (PDB)
NASA Astrophysics Data System (ADS)
Ghani, Nur Syatila Ab; Firdaus-Raih, Mohd
2018-04-01
The three-dimensional structure of a protein can provide insights regarding its function. Functional relationships between proteins can be inferred from fold and sequence similarities. In certain cases, sequence or fold comparison fails to establish homology between proteins with similar mechanisms. Since structure is more conserved than sequence, a constellation of functional residues can be similarly arranged among proteins of similar mechanism. Local structural similarity searches are able to detect such constellations of amino acids among distinct proteins, which can be useful to annotate proteins of unknown function. Detection of such patterns of amino acids on a large scale can increase the repertoire of important 3D motifs, since the currently available known 3D motifs cannot keep pace with the ever-increasing number of uncharacterized proteins to be annotated. Here, a computational platform for an automated detection of 3D motifs is described. A fuzzy-pattern searching algorithm derived from IMagine an Amino Acid 3D Arrangement search EnGINE (IMAAAGINE) was implemented to develop an automated method for searching of hypothetical patterns of amino acid side chains in Protein Data Bank (PDB), without the need for prior knowledge on related sequence or structure of pattern of interest. We present an example of the searches, which is the detection of a hypothetical pattern derived from known structural motif of C2H2 structural pattern from zinc fingers. The conservation of particular patterns of amino acid side chains in unrelated proteins is highlighted. This approach can act as a complementary method for available structure- and sequence-based platforms and may contribute to improving functional association between proteins.
Theoretical and computational studies in protein folding, design, and function
NASA Astrophysics Data System (ADS)
Morrissey, Michael Patrick
2000-10-01
In this work, simplified statistical models are used to understand an array of processes related to protein folding and design. In Part I, lattice models are utilized to test several theories about the statistical properties of protein-like systems. In Part II, sequence analysis and all-atom simulations are used to advance a novel theory for the behavior of a particular protein. Part I is divided into five chapters. In Chapter 2, a method of sequence design for model proteins, based on statistical mechanical first-principles, is developed. The cumulant design method uses a mean-field approximation to expand the free energy of a sequence in temperature. The method successfully designs sequences which fold to a target lattice structure at a specific temperature, a feat which was not possible using previous design methods. The next three chapters are computational studies of the double mutant cycle, which has been used experimentally to predict intra-protein interactions. Complete structure prediction is demonstrated for a model system using exhaustive, and also sub-exhaustive, double mutants. Nonadditivity of enthalpy, rather than of free energy, is proposed and demonstrated to be a superior marker for inter-residue contact. Next, a new double mutant protocol, called exchange mutation, is introduced. Although simple statistical arguments predict exchange mutation to be a more accurate contact predictor than standard mutant cycles, this hypothesis was not upheld in lattice simulations. Reasons for this inconsistency will be discussed. Finally, a multi-chain folding algorithm is introduced. Known as LINKS, this algorithm was developed to test a method of structure prediction which utilizes chain-break mutants. While structure prediction was not successful, LINKS should nevertheless be a useful tool for the study of protein-protein and protein-ligand interactions. 
The last chapter of Part I utilizes the lattice to explore the differences between standard folding, from the fully denatured state, and cotranslational folding, whereby one end of a protein is synthesized and released before the other. Cotranslational folding is shown to accelerate folding kinetics, particularly when the target backbone contains many local contacts. Additionally, cotranslation is shown capable of "guiding" a model protein into a metastable, local contact-rich state, despite the existence of a true native state of much lower energy. In Part II, a model is developed for the behavior of PrP, a unique mammalian protein which has been shown to possess two native states. The pathogenic "scrapie" state PrPSc, which has not been structurally characterized, is known to trigger conversion of the characterized endogenous conformation PrPC into additional PrPSc. Residues 144–153 are shown to form the most hydrophilic naturally occurring alpha-helix, out of a broad database with more than 10,000 candidates. The novel beta-nucleation model proposes that PrPSc is not a distinct mono-molecular state, but is rather a beta-sheet-like aggregate centered around helix-1 components of multiple PrP molecules. The remainder of Part II uses molecular dynamics simulations to support the beta-nucleation hypothesis, and to propose a system of peptide ligands which may arrest the process of prion propagation.
Roles of β-Turns in Protein Folding: From Peptide Models to Protein Engineering
Marcelino, Anna Marie C.; Gierasch, Lila M.
2010-01-01
Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein’s structure. PMID:18275088
Lazim, Raudah; Mei, Ye; Zhang, Dawei
2012-03-01
Replica exchange molecular dynamics (REMD) simulation provides an efficient conformational sampling tool for the study of protein folding. In this study, we explore the mechanism directing the structure variation from α/4β-fold protein to 3α-fold protein after mutation by conducting REMD simulation on 42 replicas with temperatures ranging from 270 K to 710 K. The simulation began from a protein possessing the primary structure of GA88 but the tertiary structure of GB88, two G proteins with "high sequence identity." Despite the large Cα-root mean square deviation (RMSD) of the folded protein (4.34 Å at 270 K and 4.75 Å at 304 K), a variation in tertiary structure was observed. Together with the analysis of secondary structure assignment, cluster analysis and principal component analysis, it provides insights into the folding and unfolding pathways of 3α-fold protein and α/4β-fold protein, respectively, paving the way toward an understanding of the events that occur during conformational variation.
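As a rough illustration of the replica setup described in this abstract, the sketch below builds a temperature ladder with 42 replicas between 270 K and 710 K and evaluates the standard Metropolis acceptance rule for swapping two replicas. The geometric spacing of temperatures is an assumption (the abstract gives only the range and the replica count), and the function names and the choice of kcal/mol energy units are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption

def temperature_ladder(t_min, t_max, n_replicas):
    """Geometrically spaced temperatures from t_min to t_max (assumed spacing)."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = t_max / t_min
    return [t_min * ratio ** (i / (n_replicas - 1)) for i in range(n_replicas)]

def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    Uses min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

ladder = temperature_ladder(270.0, 710.0, 42)
print(len(ladder), round(ladder[0], 1), round(ladder[-1], 1))
```

A swap is always accepted when the colder replica currently holds the higher potential energy, and is accepted with a Boltzmann-weighted probability otherwise.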
Folding of a single domain protein entering the endoplasmic reticulum precedes disulfide formation.
Robinson, Philip J; Pringle, Marie Anne; Woolhead, Cheryl A; Bulleid, Neil J
2017-04-28
The relationship between protein synthesis, folding, and disulfide formation within the endoplasmic reticulum (ER) is poorly understood. Previous studies have suggested that pre-existing disulfide links are absolutely required to allow protein folding and, conversely, that protein folding occurs prior to disulfide formation. To address the question of what happens first within the ER, that is, protein folding or disulfide formation, we studied folding events at the early stages of polypeptide chain translocation into the mammalian ER using stalled translation intermediates. Our results demonstrate that polypeptide folding can occur without complete domain translocation. Protein disulfide isomerase (PDI) interacts with these early intermediates, but disulfide formation does not occur unless the entire sequence of the protein domain is translocated. This is the first evidence that folding of the polypeptide chain precedes disulfide formation within a cellular context and highlights key differences between protein folding in the ER and refolding of purified proteins. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
The folding energy landscape and free energy excitations of cytochrome c.
Weinkam, Patrick; Zimmermann, Jörg; Romesberg, Floyd E; Wolynes, Peter G
2010-05-18
The covalently bound heme cofactor plays a dominant role in the folding of cytochrome c. Because of the complicated inorganic chemistry of the heme, some might consider the folding of cytochrome c to be a special case, following principles different from those used to describe the folding of proteins without cofactors. Recent investigations, however, demonstrate that common models describing folding for many proteins work well for cytochrome c when heme is explicitly introduced, generally providing results that agree with experimental observations. In this Account, we first discuss results from simple native structure-based models. These models include attractive interactions between nonadjacent residues only if they are present in the crystal structure at pH 7. Because attractive nonnative contacts are not included in native structure-based models, their energy landscapes can be described as "perfectly funneled". In other words, native structure-based models are energetically guided towards the native state and contain no energetic traps that would hinder folding. Energetic traps are denoted sources of "frustration", which cause specific transient intermediates to be populated. Native structure-based models do, however, include repulsion between residues due to excluded volume. Nonenergetic traps can therefore exist if the chain, which cannot cross over itself, must partially unfold so that folding can proceed. The ability of native structure-based models to capture this kind of motion is partly responsible for their successful predictions of folding pathways for many types of proteins. Models without frustration describe the sequence of folding events for cytochrome c well (as inferred from hydrogen-exchange experiments), thereby justifying their use as a starting point. At low pH, the experimentally observed folding sequence of cytochrome c deviates from that at pH 7 and from models with perfectly funneled energy landscapes. 
Here, alternate folding pathways are a result of "chemical frustration". This frustration arises because some regions of the protein are destabilized more than others due to the heterogeneous distribution of titratable residues that are protonated at low pH. Beginning with native structure-based terms, we construct more complex models by adding chemical frustration. These more complex models only modestly perturb the energy landscape, which remains, overall, well funneled. These perturbed models can accurately describe how alternative folding pathways are used at low pH. At alkaline pH, cytochrome c populates distinctly different structural ensembles. For instance, lysine residues are deprotonated and compete for the heme ligation site. The same models that can describe folding at low pH also predict well the structures and relative stabilities of intermediates populated at alkaline pH. The success of models based on funneled energy landscapes suggest that cytochrome c folding is driven primarily by native contacts. The presence of heme appears to add chemical complexity to the folding process, but it does not require fundamental modification of the general principles used to describe folding. Moreover, its added complexity provides a valuable means of probing the folding energy landscape in greater detail than is possible with simpler systems.
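The defining rule of a native structure-based model as described in this Account (attractive terms only for nonadjacent residue pairs that are in contact in the reference structure) can be sketched as follows. The 8 Å cutoff, the minimum sequence separation of 3, the unit well depth, and the function names are all assumptions chosen for illustration; they are not the authors' parameters, and the excluded-volume repulsion mentioned in the text is omitted.

```python
import math

def native_contacts(coords, cutoff=8.0, min_separation=3):
    """Residue pairs (i, j) closer than `cutoff` in the reference structure.

    Pairs nearer than `min_separation` along the chain are skipped so that
    only nonadjacent residues contribute. Both defaults are assumed values.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts

def structure_based_energy(coords, contacts, cutoff=8.0, epsilon=1.0):
    """Energy of a conformation: -epsilon per native contact currently formed.

    Non-native pairs contribute nothing, which is what makes the landscape
    "perfectly funneled" in the sense used in the text.
    """
    formed = sum(1 for i, j in contacts if math.dist(coords[i], coords[j]) < cutoff)
    return -epsilon * formed

# Toy six-residue hairpin used as the reference ("native") structure.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
# Fully extended chain with the same bond length.
extended = [(3.8 * k, 0.0, 0.0) for k in range(6)]

contacts = native_contacts(hairpin)
print(sorted(contacts))
print(structure_based_energy(hairpin, contacts), structure_based_energy(extended, contacts))
```

Adding "chemical frustration" in the sense of the Account would amount to perturbing the per-contact well depths, for example weakening contacts in regions destabilized at low pH; that extension is not shown here.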
Bandyopadhyay, Boudhayan; Goldenzweig, Adi; Unger, Tamar; Adato, Orit; Fleishman, Sarel J; Unger, Ron; Horovitz, Amnon
2017-12-15
The GroE chaperonin system in Escherichia coli comprises GroEL and GroES and facilitates ATP-dependent protein folding in vivo and in vitro. Proteins with very similar sequences and structures can differ in their dependence on GroEL for efficient folding. One potential but unverified source for GroEL dependence is frustration, wherein not all interactions in the native state are optimized energetically, thereby potentiating slow folding and misfolding. Here, we chose enhanced green fluorescent protein as a model system and subjected it to random mutagenesis, followed by screening for variants whose in vivo folding displays increased or decreased GroEL dependence. We confirmed the altered GroEL dependence of these variants with in vitro folding assays. Strikingly, mutations at positions predicted to be highly frustrated were found to correlate with decreased GroEL dependence. Conversely, mutations at positions with low frustration were found to correlate with increased GroEL dependence. Further support for this finding was obtained by showing that folding of an enhanced green fluorescent protein variant designed computationally to have reduced frustration is indeed less GroEL-dependent. Our results indicate that changes in local frustration also affect partitioning in vivo between spontaneous and chaperonin-mediated folding. Hence, the design of minimally frustrated sequences can reduce chaperonin dependence and improve protein expression levels. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Pope, Welkin H.; Weigele, Peter R.; Chang, Juan; Pedulla, Marisa L.; Ford, Michael E.; Houtz, Jennifer M.; Jiang, Wen; Chiu, Wah; Hatfull, Graham F.; Hendrix, Roger W.; King, Jonathan
2010-01-01
Marine Synechococcus spp and marine Prochlorococcus spp are numerically dominant photoautotrophs in the open oceans and contributors to the global carbon cycle. Syn5 is a short-tailed cyanophage isolated from the Sargasso Sea on Synechococcus strain WH8109. Syn5 has been grown in WH8109 to high titer in the laboratory and purified and concentrated retaining infectivity. Genome sequencing and annotation of Syn5 revealed that the linear genome is 46,214bp with a 237bp terminal direct repeat. Sixty-one open reading frames (ORFs) were identified. Based on genomic organization and sequence similarity to known protein sequences within GenBank, Syn5 shares features with T7-like phages. The presence of a putative integrase suggests access to a temperate life-cycle. Assignment of eleven ORFs to structural proteins found within the phage virion was confirmed by mass-spectrometry and N-terminal sequencing. Eight of these identified structural proteins exhibited amino acid sequence similarity to enteric phage proteins. The remaining three virion proteins did not resemble any known phage sequences in GenBank as of August 2006. Cryoelectron micrographs of purified Syn5 virions revealed that the capsid has a single “horn”, a novel fibrous structure protruding from the opposing end of the capsid from the tail of the virion. The tail appendage displayed an apparent three-fold rather than six-fold symmetry. An 18Å-resolution icosahedral reconstruction of the capsid revealed a T=7 lattice, but with an unusual pattern of surface knobs. This phage/host system should allow detailed investigation of the physiology and biochemistry of phage propagation in marine photosynthetic bacteria. PMID:17383677
Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold
Li, Weizhong; Lopez, Rodrigo
2017-01-01
Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
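The query-seeding step described in this abstract (inserting original query residues into gapped positions of the aligned subject sequence) can be sketched on a single pairwise alignment. This is an illustration only and is not code from psisearch or psiblast; the function name and the `-` gap character are assumptions, as is the convention of dropping columns where the query itself has a gap so that the row stays in query coordinates.

```python
def seed_subject_with_query(aligned_query, aligned_subject, gap="-"):
    """Return the subject row of a query-based MSA with gaps filled from the query.

    Columns where the query has a gap (insertions in the subject) are dropped,
    which is the assumed convention for a query-anchored alignment. Columns
    where the subject has a gap receive the query residue instead.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    seeded = []
    for q_res, s_res in zip(aligned_query, aligned_subject):
        if q_res == gap:
            continue  # subject insertion: no query column to place it in
        seeded.append(q_res if s_res == gap else s_res)
    return "".join(seeded)

# Toy alignment: the subject is missing two query positions.
print(seed_subject_with_query("ACDEFG", "AC--FG"))
# Toy alignment with one subject insertion and one subject deletion.
print(seed_subject_with_query("AC-DE", "ACWD-"))
```

The seeded rows, rather than the raw gapped rows, would then be the input from which position-specific counts are accumulated, which is how the step anchors the model to the original query.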
Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio
2010-01-01
Background Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. 
However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
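To make the reported evaluation terms concrete, the sketch below shows how accuracy and sensitivity follow from confusion-matrix counts and how a dataset of 4,107 entries could be divided into five folds. The confusion counts are made-up toy numbers, the interleaved fold assignment is an assumption (the abstract does not say how folds were drawn), and none of the function names come from TurnP.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of true turn residues that were predicted as turns."""
    return tp / (tp + fn)

def k_fold_indices(n_items, k=5):
    """Split item indices 0..n_items-1 into k interleaved folds (assumed scheme)."""
    return [list(range(start, n_items, k)) for start in range(k)]

# Toy confusion counts, not results from the paper.
print(accuracy(tp=70, tn=800, fp=100, fn=30))
print(sensitivity(tp=70, fn=30))

folds = k_fold_indices(4107, k=5)
print([len(fold) for fold in folds])
```

In each cross-validation round, one fold is held out for testing and the remaining four are used for training, so every entry is tested exactly once.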
Structure of human POFUT2: insights into thrombospondin type 1 repeat fold and O-fucosylation
Chen, Chun-I; Keusch, Jeremy J; Klein, Dominique; Hess, Daniel; Hofsteenge, Jan; Gut, Heinz
2012-01-01
Protein O-fucosylation is a post-translational modification found on serine/threonine residues of thrombospondin type 1 repeats (TSR). The fucose transfer is catalysed by the enzyme protein O-fucosyltransferase 2 (POFUT2) and >40 human proteins contain the TSR consensus sequence for POFUT2-dependent fucosylation. To better understand O-fucosylation on TSR, we carried out a structural and functional analysis of human POFUT2 and its TSR substrate. Crystal structures of POFUT2 reveal a variation of the classical GT-B fold and identify sugar donor and TSR acceptor binding sites. Structural findings are correlated with steady-state kinetic measurements of wild-type and mutant POFUT2 and TSR and give insight into the catalytic mechanism and substrate specificity. By using an artificial mini-TSR substrate, we show that specificity is not primarily encoded in the TSR protein sequence but rather in the unusual 3D structure of a small part of the TSR. Our findings uncover that recognition of distinct conserved 3D fold motifs can be used as a mechanism to achieve substrate specificity by enzymes modifying completely folded proteins of very wide sequence diversity and biological function. PMID:22588082
Re-visiting protein-centric two-tier classification of existing DNA-protein complexes
2012-01-01
Background Precise DNA-protein interactions play a vital role in maintaining the normal physiological functioning of the cell, as they control many high-fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. In 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the growth in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. Results On the basis of the sequence analysis of DNA-binding proteins, we have built upon the protein-centric, two-tier classification of DNA-protein complexes by adding new members to existing families and creating new families and groups. While classifying the new complexes, we also observed the emergence of new groups and families. The new group observed was one in which a β-propeller interacts with DNA. There were 34 SCOP folds present in the complexes of both the old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some of the new families were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Conclusions Our results suggest that, with the increasing availability of DNA-protein complexes in the Protein Data Bank, the number of families in the classification increased approximately threefold. The folds present exclusively in the newly classified complexes suggest the inclusion of proteins with new functions in the new classification; the most populated of these are the folds responsible for DNA damage repair. The proposed revisited classification can be used to perform genome-wide surveys of genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from sequence information alone. PMID:22800292
Re-visiting protein-centric two-tier classification of existing DNA-protein complexes.
Malhotra, Sony; Sowdhamini, Ramanathan
2012-07-16
Precise DNA-protein interactions play a vital role in maintaining the normal physiological functioning of the cell, as they control many high-fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. In 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the growth in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. On the basis of the sequence analysis of DNA-binding proteins, we have built upon the protein-centric, two-tier classification of DNA-protein complexes by adding new members to existing families and creating new families and groups. While classifying the new complexes, we also observed the emergence of new groups and families. The new group observed was one in which a β-propeller interacts with DNA. There were 34 SCOP folds present in the complexes of both the old and new classifications, whereas 28 folds are present exclusively in the new complexes. Some of the new families were NarL transcription factor, Z-α DNA binding proteins, Forkhead transcription factor, AP2 protein, Methyl CpG binding protein etc. Our results suggest that, with the increasing availability of DNA-protein complexes in the Protein Data Bank, the number of families in the classification increased approximately threefold. The folds present exclusively in the newly classified complexes suggest the inclusion of proteins with new functions in the new classification; the most populated of these are the folds responsible for DNA damage repair. The proposed revisited classification can be used to perform genome-wide surveys of genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from sequence information alone.
Structural and sequence features of two residue turns in beta-hairpins.
Madan, Bharat; Seo, Sung Yong; Lee, Sun-Gu
2014-09-01
Beta-turns in beta-hairpins have been implicated as important sites in protein folding. In particular, two residue β-turns, the most abundant connecting elements in beta-hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two residue turns in beta-hairpins with a large data set. For this, 3977 beta-turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta-hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two-residue turns, and their distributions were examined. From this study, we could identify code-like sequence motifs for the two residue beta-turn types. Finally, structural and sequence properties of beta-strands in the beta-hairpins were analyzed, which revealed that the beta-strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta-hairpin turn structures and sequences. © 2014 Wiley Periodicals, Inc.
Self-Complementarity within Proteins: Bridging the Gap between Binding and Folding
Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul
2012-01-01
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. PMID:22713576
Self-complementarity within proteins: bridging the gap between binding and folding.
Basu, Sankar; Bhattacharyya, Dhananjay; Banerjee, Rahul
2012-06-06
Complementarity, in terms of both shape and electrostatic potential, has been quantitatively estimated at protein-protein interfaces and used extensively to predict the specific geometry of association between interacting proteins. In this work, we attempted to place both binding and folding on a common conceptual platform based on complementarity. To that end, we estimated (for the first time to our knowledge) electrostatic complementarity (Em) for residues buried within proteins. Em measures the correlation of surface electrostatic potential at protein interiors. The results show fairly uniform and significant values for all amino acids. Interestingly, hydrophobic side chains also attain appreciable complementarity primarily due to the trajectory of the main chain. Previous work from our laboratory characterized the surface (or shape) complementarity (Sm) of interior residues, and both of these measures have now been combined to derive two scoring functions to identify the native fold amid a set of decoys. These scoring functions are somewhat similar to functions that discriminate among multiple solutions in a protein-protein docking exercise. The performances of both of these functions on state-of-the-art databases were comparable if not better than most currently available scoring functions. Thus, analogously to interfacial residues of protein chains associated (docked) with specific geometry, amino acids found in the native interior have to satisfy fairly stringent constraints in terms of both Sm and Em. The functions were also found to be useful for correctly identifying the same fold for two sequences with low sequence identity. Finally, inspired by the Ramachandran plot, we developed a plot of Sm versus Em (referred to as the complementarity plot) that identifies residues with suboptimal packing and electrostatics which appear to be correlated to coordinate errors. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Effective Potentials for Folding Proteins
NASA Astrophysics Data System (ADS)
Chen, Nan-Yow; Su, Zheng-Yao; Mou, Chung-Yu
2006-02-01
A coarse-grained off-lattice model that is not biased in any way to the native state is proposed to fold proteins. To predict the native structure in a reasonable time, the model has included the essential effects of water in an effective potential. Two new ingredients, the dipole-dipole interaction and the local hydrophobic interaction, are introduced and are shown to be as crucial as the hydrogen bonding. The model allows successful folding of the wild-type sequence of protein G and may have provided important hints to the study of protein folding.
Geller, Ron; Pechmann, Sebastian; Acevedo, Ashley; Andino, Raul; Frydman, Judith
2018-05-03
Acquisition of mutations is central to evolution; however, the detrimental effects of most mutations on protein folding and stability limit protein evolvability. Molecular chaperones, which suppress aggregation and facilitate polypeptide folding, may alleviate the effects of destabilizing mutations thus promoting sequence diversification. To illuminate how chaperones can influence protein evolution, we examined the effect of reduced activity of the chaperone Hsp90 on poliovirus evolution. We find that Hsp90 offsets evolutionary trade-offs between protein stability and aggregation. Lower chaperone levels favor variants of reduced hydrophobicity and protein aggregation propensity but at a cost to protein stability. Notably, reducing Hsp90 activity also promotes clusters of codon-deoptimized synonymous mutations at inter-domain boundaries, likely to facilitate cotranslational domain folding. Our results reveal how a chaperone can shape the sequence landscape at both the protein and RNA levels to harmonize competing constraints posed by protein stability, aggregation propensity, and translation rate on successful protein biogenesis.
Random heteropolymers preserve protein function in foreign environments
NASA Astrophysics Data System (ADS)
Panganiban, Brian; Qiao, Baofu; Jiang, Tao; DelRe, Christopher; Obadia, Mona M.; Nguyen, Trung Dac; Smith, Anton A. A.; Hall, Aaron; Sit, Izaac; Crosby, Marquise G.; Dennis, Patrick B.; Drockenmuller, Eric; Olvera de la Cruz, Monica; Xu, Ting
2018-03-01
The successful incorporation of active proteins into synthetic polymers could lead to a new class of materials with functions found only in living systems. However, proteins rarely function under the conditions suitable for polymer processing. On the basis of an analysis of trends in protein sequences and characteristic chemical patterns on protein surfaces, we designed four-monomer random heteropolymers to mimic intrinsically disordered proteins for protein solubilization and stabilization in non-native environments. The heteropolymers, with optimized composition and statistical monomer distribution, enable cell-free synthesis of membrane proteins with proper protein folding for transport and enzyme-containing plastics for toxin bioremediation. Controlling the statistical monomer distribution in a heteropolymer, rather than the specific monomer sequence, affords a new strategy to interface with biological systems for protein-based biomaterials.
A Folding Zone in the Ribosomal Exit Tunnel for Kv1.3 Helix Formation
Tu, LiWei; Deutsch, Carol
2010-01-01
SUMMARY Although it is now clear that protein secondary structure can be acquired early, while the nascent peptide resides within the ribosomal exit tunnel, the principles governing folding of native polytopic proteins have not yet been elucidated. We now report an extensive investigation of native Kv1.3, a voltage-gated K+ channel, including transmembrane and linker segments synthesized in sequence. These native segments form helices vectorially (N- to C-terminus) only in a permissive vestibule located in the last 20Å of the tunnel. Native linker sequences similarly fold in this vestibule. Finally, secondary structure acquired in the ribosome is retained in the translocon. These findings emerge from accessibility studies of a diversity of native transmembrane and linker sequences and may therefore be applicable to protein biogenesis in general. PMID:20060838
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
2018-07-01
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan
2003-12-01
The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. For the bioinformatics applications, the role of appropriate features has not been paid adequate importance. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner when the learning goes on. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. And in the next level we have another set of classifiers, which further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. 
Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
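As a rough illustration of the N-gram idea mentioned in this abstract, the sketch below counts overlapping n-grams in an amino acid sequence and normalizes them into a fixed-length feature dictionary. It is only a plain n-gram frequency encoding under stated assumptions: the function name `ngram_features` and the 20-letter alphabet constant are hypothetical, and the authors' actual indirect coding scheme may differ.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return normalized frequencies of all 20**n possible n-grams.

    Hypothetical helper: windows containing non-standard letters are ignored.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    valid = set(keys)
    windows = (sequence[i:i + n] for i in range(len(sequence) - n + 1))
    counts = Counter(w for w in windows if w in valid)
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return {k: counts.get(k, 0) / total for k in keys}


if __name__ == "__main__":
    feats = ngram_features("ACAC", n=2)
    print(len(feats), feats["AC"], feats["CA"])
```

The fixed key order means every sequence maps to a vector of the same length, which is what a downstream classifier would need.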
Vasquez, Kevin A; Hatridge, Taylor A; Curtis, Nicholas C; Contreras, Lydia M
2016-02-19
Recent studies have demonstrated that effective protein production requires coordination of multiple cotranslational cellular processes, which are heavily affected by translation timing. Until recently, protein engineering has focused on codon optimization to maximize protein production rates, mostly considering the effect of tRNA abundance. However, as it relates to complex multidomain proteins, it has been hypothesized that strategic translational pauses between domains and between distinct individual structural motifs can prevent interactions between nascent chain fragments that generate kinetically trapped misfolded peptides and thereby enhance protein yields. In this study, we introduce synthetic transient pauses between structural domains in a heterologous model protein based on designed patterns of affinity between the mRNA and the anti-Shine-Dalgarno (aSD) sequence on the ribosome. We demonstrate that optimizing translation attenuation at domain boundaries can predictably affect solubility patterns in bacteria. Exploration of the affinity space showed that modifying less than 1% of the nucleotides (on a small 12 amino acid linker) can vary soluble protein yields up to ∼7-fold without altering the primary sequence of the protein. In the context of longer linkers, where a larger number of distinct structural motifs can fold outside the ribosome, optimal synonymous codon variations resulted in an additional 2.1-fold increase in solubility, relative to that of nonoptimized linkers of the same length. While rational construction of 54 linkers of various affinities showed a significant correlation between protein solubility and predicted affinity, only weaker correlations were observed between tRNA abundance and protein solubility. We also demonstrate that naturally occurring high-affinity clusters are present between structural domains of β-galactosidase, one of Escherichia coli's largest native proteins. 
Interdomain ribosomal affinity is an important factor that has not previously been explored in the context of protein engineering.
A reduced amino acid alphabet for understanding and designing protein adaptation to mutation.
Etchebest, C; Benros, C; Bornot, A; Camproux, A-C; de Brevern, A G
2007-11-01
The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (five residues in length) called Protein Blocks (PBs). This alphabet makes it possible to translate the structure (3D) into a sequence of PBs (1D). Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residues Trp with Phe and Tyr. We also brought to light interesting alternative associations. This highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance, adaptation to environment) while largely preserving the 3D fold.
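To make the notion of a reduced amino acid alphabet concrete, here is a minimal sketch that maps each residue to a group representative. The eight groups below are an illustrative assumption based on broad physicochemical classes (only the Ile/Val and Trp/Phe/Tyr pairings are mentioned in the abstract); they are not the reduced sets derived in the paper, and `reduce_sequence` is a hypothetical name.

```python
# Assumption: an illustrative 8-letter grouping, not the paper's alphabet.
# Each key is the representative letter for the residues in its value.
REDUCED_GROUPS = {
    "I": "IVLM",  # aliphatic hydrophobic
    "F": "FWY",   # aromatic
    "A": "AG",    # small
    "S": "STNQ",  # polar uncharged
    "D": "DE",    # acidic
    "K": "KRH",   # basic
    "C": "C",     # cysteine kept separate
    "P": "P",     # proline kept separate
}

# Invert the grouping into a residue -> representative lookup table.
RESIDUE_TO_GROUP = {
    residue: representative
    for representative, members in REDUCED_GROUPS.items()
    for residue in members
}


def reduce_sequence(sequence, unknown="X"):
    """Rewrite a sequence in the reduced alphabet; unknown letters become 'X'."""
    return "".join(RESIDUE_TO_GROUP.get(res, unknown) for res in sequence.upper())


if __name__ == "__main__":
    print(reduce_sequence("IVWFYDEK"))
```

Swapping in a different `REDUCED_GROUPS` table is enough to compare alternative reduced sets on the same sequences.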
Nissley, Daniel A.; Sharma, Ajeet K.; Ahmed, Nabeel; Friedrich, Ulrike A.; Kramer, Günter; Bukau, Bernd; O'Brien, Edward P.
2016-01-01
The rates at which domains fold and codons are translated are important factors in determining whether a nascent protein will co-translationally fold and function or misfold and malfunction. Here we develop a chemical kinetic model that calculates a protein domain's co-translational folding curve during synthesis using only the domain's bulk folding and unfolding rates and codon translation rates. We show that this model accurately predicts the course of co-translational folding measured in vivo for four different protein molecules. We then make predictions for a number of different proteins in yeast and find that synonymous codon substitutions, which change translation-elongation rates, can switch some protein domains from folding post-translationally to folding co-translationally—a result consistent with previous experimental studies. Our approach explains essential features of co-translational folding curves and predicts how varying the translation rate at different codon positions along a transcript's coding sequence affects this self-assembly process. PMID:26887592
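The following sketch shows, under strong simplifying assumptions, how a co-translational folding curve can be computed from bulk folding and unfolding rates and codon translation rates. It treats the domain as a two-state system that can start folding only once a chosen codon has been translated, and it uses the mean dwell time at each codon instead of a full stochastic treatment. The function name, the `emergence_index` parameter, and the deterministic dwell-time approximation are all assumptions made for illustration; this is not the kinetic model developed in the paper.

```python
import math


def cotranslational_folding_curve(translation_rates, k_fold, k_unfold, emergence_index):
    """Probability that the domain is folded after each codon is translated.

    translation_rates: per-codon translation rates (1/s), one per codon.
    k_fold, k_unfold:  bulk two-state folding and unfolding rates (1/s).
    emergence_index:   hypothetical codon index from which folding is possible.
    """
    relaxation = k_fold + k_unfold   # two-state relaxation rate
    p_eq = k_fold / relaxation       # equilibrium folded population
    p_folded = 0.0
    curve = []
    for i, k_trans in enumerate(translation_rates):
        if i >= emergence_index:
            dwell = 1.0 / k_trans    # mean time spent at this codon
            # Exponential relaxation toward equilibrium during the dwell time.
            p_folded = p_eq + (p_folded - p_eq) * math.exp(-relaxation * dwell)
        curve.append(p_folded)
    return curve


if __name__ == "__main__":
    fast = cotranslational_folding_curve([10.0] * 5, 1.0, 1.0, 2)
    slow = cotranslational_folding_curve([1.0] * 5, 1.0, 1.0, 2)
    print(fast[-1], slow[-1])
```

In this toy picture, slowing translation at the codons after emergence gives the domain more time to fold on the ribosome, which is the qualitative effect of synonymous substitutions described in the abstract.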
Swellix: a computational tool to explore RNA conformational space.
Sloat, Nathan; Liu, Jui-Wen; Schroeder, Susan J
2017-11-21
The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods. The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures. Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
Yeast prion architecture explains how proteins can be genes
NASA Astrophysics Data System (ADS)
Wickner, Reed
2013-03-01
Prions (infectious proteins) transmit information without an accompanying DNA or RNA. Most yeast prions are self-propagating amyloids that inactivate a normally functional protein. A single protein can become any of several prion variants, with different manifestations due to different amyloid structures. We showed that the yeast prion amyloids of Ure2p, Sup35p and Rnq1p are folded in-register parallel beta sheets using solid state NMR dipolar recoupling experiments, mass-per-filament-length measurements, and filament diameter measurements. The extent of beta sheet structure, measured by chemical shifts in solid-state NMR and acquired protease-resistance on amyloid formation, combined with the measured filament diameters, imply that the beta sheets must be folded along the long axis of the filament. We speculate that prion variants of a single protein sequence differ in the location of these folds. Favorable interactions between identical side chains must hold these structures in-register. The same interactions must guide an unstructured monomer joining the end of a filament to assume the same conformation as molecules already in the filament, with the turns at the same locations. In this way, a protein can template its own conformation, in analogy to the ability of a DNA molecule to template its sequence by specific base-pairing.
Free Energy Landscape - Settlements of Key Residues.
NASA Astrophysics Data System (ADS)
Aroutiounian, Svetlana
2007-03-01
The free energy landscape (FEL) perspective in studies of protein folding transitions reflects the notion that, since there are ~10^N conformations to scan in search of the lowest free energy state, a random search is beyond the biological timescale. Protein folding must follow certain FEL pathways, and the folding kinetics of evolutionarily selected proteins dominates kinetic traps. A good model for the functional robustness of natural proteins: a coarse-grained model protein is not very accurate but brings simulations closer to the biological realm; a Go-like potential secures the funnel shape of the FEL; biochemical contacts signify the funnel bottleneck. A Boltzmann-weighted ensemble of protein conformations and the histogram method are used to obtain the approximate probability distribution from MC sampling of the protein conformational space. The FEL is F(rmsd) = -(1/β) ln[Hist(rmsd)], where β = 1/(k_B T) and rmsd is the root-mean-square deviation from the native conformation. Sperm whale myoglobin has rich dynamic behavior, is small yet large on the computational scale, and has a symmetry in architecture and an unusual sextet of residue pairs. Main idea: there is a mathematical relation between the protein FEL and a set of key residues providing stability to the folding transition. Is the set evolutionarily conserved also for functional reasons? Hypothesis: the primary sequence determines the positions of the key residues conserved as stabilizers, and the FEL is the battlefield for folding stability. Preliminary results: the primary sequence, not the architecture, is indeed the rule setter.
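The histogram method described in this abstract can be sketched as follows: bin the sampled rmsd values, convert bin populations to probabilities, and take the negative logarithm scaled by the thermal energy. The sketch assumes the intended convention is F(rmsd) = -k_B T ln[Hist(rmsd)], that is, 1/β equal to k_B T; the function name, the binning parameters, and the choice to shift the minimum to zero are illustrative assumptions.

```python
import math


def free_energy_profile(rmsd_samples, n_bins, rmsd_max, kT=1.0):
    """Estimate F(rmsd) = -kT * ln(P(rmsd)) from sampled rmsd values.

    Returns one value per bin, shifted so the lowest bin is zero.
    Empty bins get +inf (no sampling means no estimate).
    """
    width = rmsd_max / n_bins
    counts = [0] * n_bins
    for r in rmsd_samples:
        if 0.0 <= r < rmsd_max:
            # Clamp guards against floating-point rounding at the upper edge.
            counts[min(int(r / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [0, rmsd_max)")
    profile = [
        -kT * math.log(c / total) if c > 0 else math.inf
        for c in counts
    ]
    f_min = min(profile)
    return [f - f_min for f in profile]


if __name__ == "__main__":
    samples = [0.5, 0.5, 0.5, 1.5]
    print(free_energy_profile(samples, n_bins=2, rmsd_max=2.0))
```

Here `kT` is in whatever energy units the caller prefers; with `kT=1.0` the profile is reported in units of k_B T.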
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation is of critical roles for redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either only focus on catalytic redox-sensitive cysteines in thiol oxidoreductases, or heavily depend on protein structural data, thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), a SVM based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with BALOSCTdb dataset which has structure information, the model achieved performance comparable to current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations. 
Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
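For readers unfamiliar with the evaluation measures quoted in this abstract, the sketch below computes accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and are not taken from the RSC758 results; AUC is omitted because it requires ranked prediction scores, not just counts.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(classification_metrics(tp=30, tn=40, fp=10, fn=20))
```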
Biophysics of protein evolution and evolutionary protein biophysics
Sikosek, Tobias; Chan, Hue Sun
2014-01-01
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599
In vitro folding of inclusion body proteins.
Rudolph, R; Lilie, H
1996-01-01
Insoluble, inactive inclusion bodies are frequently formed upon recombinant protein production in transformed microorganisms. These inclusion bodies, which contain the recombinant protein in a highly enriched form, can be isolated by solid/liquid separation. After solubilization, native proteins can be generated from the inactive material by using in vitro folding techniques. New folding procedures have been developed for efficient in vitro reconstitution of complex hydrophobic, multidomain, oligomeric, or highly disulfide-bonded proteins. These protocols take into account process parameters such as protein concentration, catalysis of disulfide bond formation, temperature, pH, and ionic strength, as well as specific solvent ingredients that reduce unproductive side reactions. Modification of the protein sequence has been exploited to improve in vitro folding.
Protein structure prediction with local adjust tabu search algorithm
2014-01-01
Background Protein folding structure prediction is one of the most challenging problems in bioinformatics. Because realistic protein structures are so complex, research typically relies on simplified structure models and computational methods. The AB off-lattice model is one such simplified model; it considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to find the lowest-energy configurations in the 2D and 3D off-lattice models using Fibonacci sequences and real protein sequences. To avoid becoming trapped in local minima and to converge faster to the global minimum, we introduce a novel method (SATS) for the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as a new encoding strategy, an adaptive neighborhood generation strategy and a local adjustment strategy, are adopted for high-speed searching of the optimal conformation, which corresponds to the lowest energy of the protein sequence. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest-energy folding states for short Fibonacci sequences have been found. Conclusions Although the off-lattice models are not very realistic, they reflect some important characteristics of real proteins. The 3D off-lattice model resembles the native folded structure of real proteins more closely than the 2D off-lattice model does. In addition, compared with some previous studies, the proposed hybrid algorithm searches the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
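As an illustration of the kind of objective such a search minimizes, the following is a minimal sketch of the 2D AB off-lattice energy in its commonly cited form (a bending term plus a species-dependent Lennard-Jones term, with unit bond lengths). The abstract does not state the energy function, so this form, the coupling constants, and the function names `fibonacci_sequence`, `coupling`, and `ab_energy_2d` are assumptions made for illustration; the SATS optimizer itself is not reproduced here.

```python
import math


def fibonacci_sequence(n):
    """Return the n-th Fibonacci AB string: S0 = "A", S1 = "B", S(i+1) = S(i-1) + S(i)."""
    prev, cur = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def coupling(a, b):
    """Assumed species-dependent coefficient: AA = 1, BB = 1/2, AB = -1/2."""
    if a == "A" and b == "A":
        return 1.0
    if a == "B" and b == "B":
        return 0.5
    return -0.5


def ab_energy_2d(seq, angles):
    """Energy of a 2D chain with unit bonds.

    angles holds the len(seq) - 2 bend angles in radians; 0 means a straight chain.
    """
    n = len(seq)
    if len(angles) != n - 2:
        raise ValueError("need exactly len(seq) - 2 bend angles")

    # Build coordinates by walking along the chain with unit steps.
    xs, ys = [0.0, 1.0], [0.0, 0.0]
    heading = 0.0
    for theta in angles:
        heading += theta
        xs.append(xs[-1] + math.cos(heading))
        ys.append(ys[-1] + math.sin(heading))

    # Bending term: penalizes deviation from a straight backbone.
    bend = sum(0.25 * (1.0 - math.cos(theta)) for theta in angles)

    # Lennard-Jones-like term over all non-bonded pairs.
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r2 = (xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2
            r6 = r2 ** 3
            lj += 4.0 * (1.0 / (r6 * r6) - coupling(seq[i], seq[j]) / r6)

    return bend + lj
```

A search method such as simulated annealing or tabu search would then propose changes to `angles` and keep those that lower `ab_energy_2d`.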
Folding behavior of ribosomal protein S6 studied by modified Gō-like model
NASA Astrophysics Data System (ADS)
Wu, L.; Zhang, J.; Wang, J.; Li, W. F.; Wang, W.
2007-03-01
Recent experimental and theoretical studies suggest that, although topology is the determinant factor in protein folding, especially for small single-domain proteins, energetic factors also play an important role in the folding process. The ribosomal protein S6 has been subjected to intensive studies. A radical change of the transition state in its circular permutants has been observed, which is believed to be caused by a biased distribution of contact energies. Since the simplistic topology-only Gō-like model is not able to reproduce such an observation, we modify the model by introducing variable contact energies between residues based on their physicochemical properties. The modified Gō-like model can successfully reproduce the Φ-value distributions, folding nucleus, and folding pathways of both the wild-type and circular permutants of S6. Furthermore, by comparing the results of the modified and the simplistic models, we find that the hydrophobic effect constitutes the major force that balances the loop entropies. This may indicate that nature maintains the folding cooperativity of this protein by carefully arranging the location of hydrophobic residues in the sequence. Our study reveals a strategy or mechanism used by nature to escape the dilemma that arises when the native structure, possibly required by biological function, conflicts with folding cooperativity. Finally, the possible relationship between such a design of nature and amyloidosis is also discussed.
RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.
Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B
2016-01-01
Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.
NASA Astrophysics Data System (ADS)
Mahmood, Zakaria N.; Mahmuddin, Massudi; Mahmood, Mohammed Nooraldeen
Classifying proteins into their respective families and subfamilies from their encoded amino acid sequences is an important research area. However, for a given protein, its exact action, whether as a hormone, an enzyme, or a transmembrane or nuclear receptor, depends not solely on the amino acid sequence but also on the way the amino acid chain folds. This study provides a prototype system that is able to predict a protein's tertiary structure. Several methods are used to develop and evaluate the system to produce better accuracy in protein 3D structure prediction. The Bees Optimization algorithm, which is inspired by the food foraging behavior of honey bees, is used in the searching phase. In this study, the experiment is conducted on short-sequence proteins that have been used in previous research with well-known tools. The proposed approach shows promising results.
Pagan, Rafael F; Massey, Steven E
2014-02-01
Proteins are regarded as being robust to the deleterious effects of mutations. Here, the neutral emergence of mutational robustness in a population of single domain proteins is explored using computer simulations. A pairwise contact model was used to calculate the ΔG of folding (ΔG folding) using the three dimensional protein structure of leech eglin C. A random amino acid sequence with low mutational robustness, defined as the average ΔΔG resulting from a point mutation (ΔΔG average), was threaded onto the structure. A population of 1,000 threaded sequences was evolved under selection for stability, using an upper and lower energy threshold. Under these conditions, mutational robustness increased over time in the most common sequence in the population. In contrast, when the wild type sequence was used it did not show an increase in robustness. This implies that the emergence of mutational robustness is sequence specific and that wild type sequences may be close to maximal robustness. In addition, an inverse relationship between ∆∆G average and protein stability is shown, resulting partly from a larger average effect of point mutations in more stable proteins. The emergence of mutational robustness was also observed in the Escherichia coli colE1 Rop and human CD59 proteins, implying that the property may be common in single domain proteins under certain simulation conditions. The results indicate that at least a portion of mutational robustness in small globular proteins might have arisen by a process of neutral emergence, and could be an example of a beneficial trait that has not been directly selected for, termed a "pseudaptation."
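The robustness measure used above, the average ΔΔG over all point mutations of a sequence threaded onto a fixed structure, can be sketched as follows. This is a toy illustration only: the two-letter alphabet, the `ENERGY` table, and the contact list are hypothetical stand-ins, not the pairwise contact potential or the eglin C structure used in the study.

```python
# Toy two-letter alphabet (H = hydrophobic, P = polar); the study used real amino acids.
ALPHABET = "HP"

# Hypothetical contact energies, keyed by the sorted residue pair.
ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}


def pair_energy(a, b):
    """Look up the contact energy for an unordered residue pair."""
    return ENERGY[tuple(sorted((a, b)))]


def delta_g(seq, contacts):
    """Folding energy: sum of pair energies over the fixed contact map."""
    return sum(pair_energy(seq[i], seq[j]) for i, j in contacts)


def average_ddg(seq, contacts):
    """Mean change in folding energy over every possible point mutation."""
    wild_type = delta_g(seq, contacts)
    ddgs = []
    for pos, old in enumerate(seq):
        for new in ALPHABET:
            if new == old:
                continue
            mutant = seq[:pos] + new + seq[pos + 1:]
            ddgs.append(delta_g(mutant, contacts) - wild_type)
    return sum(ddgs) / len(ddgs)


# Hypothetical contact map for a four-residue chain.
contacts = [(0, 1), (0, 3), (1, 2)]
print(average_ddg("HHPH", contacts))
```

Under this definition a lower average ΔΔG means that a typical point mutation destabilizes the fold less, which is the sense in which a sequence is called more robust.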
Okuda, A; Imagawa, M; Maeda, Y; Sakai, M; Muramatsu, M
1989-10-05
We have recently identified a typical enhancer, termed GPEI, located about 2.5 kilobases upstream from the transcription initiation site of the rat glutathione transferase P gene. Analyses of 5' and 3' deletion mutants revealed that the cis-acting sequence of GPEI contained the phorbol 12-O-tetradecanoate 13-acetate responsive element (TRE)-like sequence in it. For the maximal activity, however, GPEI required an adjacent upstream sequence of about 19 base pairs in addition to the TRE-like sequence. With the DNA binding gel-shift assay, we could detect protein(s) that specifically binds to the TRE-like sequence of GPEI fragment, which was possibly c-jun.c-fos complex or a similar protein complex. The sequence immediately upstream of the TRE-like sequence did not have any activity by itself, but augmented the latter activity by about 5-fold.
Deciphering the Hidden Informational Content of Protein Sequences
Liu, Ming; Hua, Qing-xin; Hu, Shi-Quan; Jia, Wenhua; Yang, Yanwu; Saith, Sunil Evan; Whittaker, Jonathan; Arvan, Peter; Weiss, Michael A.
2010-01-01
Protein sequences encode both structure and foldability. Whereas the interrelationship of sequence and structure has been extensively investigated, the origins of folding efficiency are enigmatic. We demonstrate that the folding of proinsulin requires a flexible N-terminal hydrophobic residue that is dispensable for the structure, activity, and stability of the mature hormone. This residue (PheB1 in placental mammals) is variably positioned within crystal structures and exhibits 1H NMR motional narrowing in solution. Despite such flexibility, its deletion impaired insulin chain combination and led in cell culture to formation of non-native disulfide isomers with impaired secretion of the variant proinsulin. Cellular folding and secretion were maintained by hydrophobic substitutions at B1 but markedly perturbed by polar or charged side chains. We propose that, during folding, a hydrophobic side chain at B1 anchors transient long-range interactions by a flexible N-terminal arm (residues B1–B8) to mediate kinetic or thermodynamic partitioning among disulfide intermediates. Evidence for the overall contribution of the arm to folding was obtained by alanine scanning mutagenesis. Together, our findings demonstrate that efficient folding of proinsulin requires N-terminal sequences that are dispensable in the native state. Such arm-dependent folding can be abrogated by mutations associated with β-cell dysfunction and neonatal diabetes mellitus. PMID:20663888
Campagne, F; Weinstein, H
1999-01-01
An algorithmic method for drawing residue-based schematic diagrams of proteins on a 2D page is presented and illustrated. The method allows the creation of rendering engines dedicated to a given family of sequences, or fold. The initial implementation provides an engine that can produce a 2D diagram representing secondary structure for any transmembrane protein sequence. We present the details of the strategy for automating the drawing of these diagrams. The most important part of this strategy is the development of an algorithm for laying out residues of a loop that connects to arbitrary points of a 2D plane. As implemented, this algorithm is suitable for real-time modification of the loop layout. This work is of interest for the representation and analysis of data from (1) protein databases, (2) mutagenesis results, or (3) various kinds of protein context-dependent annotations or data.
BLAST and FASTA similarity searching for multiple sequence alignment.
Pearson, William R
2014-01-01
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
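The link between expectation values and database size can be made concrete with the standard bit-score relation E = m · n · 2^(−S′), where S′ is the bit score, m the query length, and n the total database length. The sketch below is a simplified illustration, not the internal calculation of either package: the function name `evalue_from_bits` is hypothetical, and real programs apply effective-length corrections that are omitted here.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score.

    Simplified form E = m * n * 2**(-S'), ignoring effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# The same alignment (same bit score) against a 10-fold smaller database
# has a 10-fold smaller E-value.
large = evalue_from_bits(50.0, 300, 1_000_000_000)
small = evalue_from_bits(50.0, 300, 100_000_000)
print(large, small, large / small)
```

This is why searching a smaller but comprehensive database can bring a borderline hit below a significance threshold without any change to the alignment itself.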
Tan, Yen Hock; Huang, He; Kihara, Daisuke
2006-08-15
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.
Zawada, James F; Yin, Gang; Steiner, Alexander R; Yang, Junhao; Naresh, Alpana; Roy, Sushmita M; Gold, Daniel S; Heinsohn, Henry G; Murray, Christopher J
2011-01-01
Engineering robust protein production and purification of correctly folded biotherapeutic proteins in cell-based systems is often challenging due to the requirements for maintaining complex cellular networks for cell viability and the need to develop associated downstream processes that reproducibly yield biopharmaceutical products with high product quality. Here, we present an alternative Escherichia coli-based open cell-free synthesis (OCFS) system that is optimized for predictable high-yield protein synthesis and folding at any scale with straightforward downstream purification processes. We describe how the linear scalability of OCFS allows rapid process optimization of parameters affecting extract activation, gene sequence optimization, and redox folding conditions for disulfide bond formation at microliter scales. Efficient and predictable high-level protein production can then be achieved using batch processes in standard bioreactors. We show how a fully bioactive protein produced by OCFS from optimized frozen extract can be purified directly using a streamlined purification process that yields a biologically active cytokine, human granulocyte-macrophage colony-stimulating factor, produced at titers of 700 mg/L in 10 h. These results represent a milestone for in vitro protein synthesis, with potential for the cGMP production of disulfide-bonded biotherapeutic proteins. Biotechnol. Bioeng. 2011; 108:1570–1578. PMID:21337337
Perczel, András; Jákli, Imre; McAllister, Michael A; Csizmadia, Imre G
2003-06-06
Folding properties of small globular proteins are determined by their amino acid sequence (primary structure). This holds both for local (secondary structure) and for global conformational features of linear polypeptides and proteins composed of natural amino acid derivatives. It thus provides the rational basis of structure prediction algorithms. The shortest secondary structure element, the beta-turn, most typically adopts either a type I or a type II form, depending on the amino acid composition. Herein we investigate the sequence-dependent folding stability of both major types of beta-turns using simple dipeptide models (-Xxx-Yyy-). Gas-phase ab initio properties of 16 carefully selected and suitably protected dipeptide models (for example Val-Ser, Ala-Gly, Ser-Ser) were studied. For each backbone fold the most probable side-chain conformers were considered. Fully optimized RHF/3-21G molecular structures were employed in medium level [B3LYP/6-311++G(d,p)//RHF/3-21G] energy calculations to estimate relative populations of the different backbone conformers. Our results show that the preference for beta-turn forms as calculated by quantum mechanics correlates significantly with that observed in proteins whose structures were determined by X-ray crystallography.
Improving membrane protein expression by optimizing integration efficiency
2017-01-01
The heterologous overexpression of integral membrane proteins in Escherichia coli often yields insufficient quantities of purifiable protein for applications of interest. The current study leverages a recently demonstrated link between co-translational membrane integration efficiency and protein expression levels to predict protein sequence modifications that improve expression. Membrane integration efficiencies, obtained using a coarse-grained simulation approach, robustly predicted effects on expression of the integral membrane protein TatC for a set of 140 sequence modifications, including loop-swap chimeras and single-residue mutations distributed throughout the protein sequence. Mutations that improve simulated integration efficiency were 4-fold enriched with respect to improved experimentally observed expression levels. Furthermore, the effects of double mutations on both simulated integration efficiency and experimentally observed expression levels were cumulative and largely independent, suggesting that multiple mutations can be introduced to yield higher levels of purifiable protein. This work provides a foundation for a general method for the rational overexpression of integral membrane proteins based on computationally simulated membrane integration efficiencies. PMID:28918393
Protein Structure Prediction by Protein Threading
NASA Astrophysics Data System (ADS)
Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong
The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.
Network-based function prediction and interactomics: the case for metabolic enzymes.
Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G
2011-01-01
As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques both provide a first-hand hint into a protein's functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases.
Baxa, Michael C.; Yu, Wookyung; Adhikari, Aashish N.; Ge, Liang; Xia, Zhen; Zhou, Ruhong; Freed, Karl F.; Sosnick, Tobin R.
2015-01-01
Experimental and computational folding studies of Proteins L & G and NuG2 typically find that sequence differences determine which of the two hairpins is formed in the transition state ensemble (TSE). However, our recent work on Protein L finds that its TSE contains both hairpins, compelling a reassessment of the influence of sequence on the folding behavior of the other two homologs. We characterize the TSEs for Protein G and NuG2b, a triple mutant of NuG2, using ψ analysis, a method for identifying contacts in the TSE. All three homologs are found to share a common and near-native TSE topology with interactions between all four strands. However, the helical content varies in the TSE, being largely absent in Proteins G & L but partially present in NuG2b. The variability likely arises from competing propensities for the formation of nonnative β turns in the naturally occurring proteins, as observed in our TerItFix folding algorithm. All-atom folding simulations of NuG2b recapitulate the observed TSEs with four strands for 5 of 27 transition paths [Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) Science 334(6055):517–520]. Our data support the view that homologous proteins have similar folding mechanisms, even when nonnative interactions are present in the transition state. These findings emphasize the ongoing challenge of accurately characterizing and predicting TSEs, even for relatively simple proteins. PMID:26100906
Insights in connecting phenotypes in bacteria to coevolutionary information
NASA Astrophysics Data System (ADS)
Cheng, Ryan; Morcos, Faruck; Hayes, Ryan; Helm, Rodney; Levine, Herbert; Onuchic, Jose
It has long been known that protein sequences are far from random. These sequences have been evolutionarily selected to maintain their ability to fold into stable, three-dimensional folded structures as well as their ability to form macromolecular assemblies, perform catalytic functions, etc. For these reasons, there exist quantifiable mutational patterns in the collection of sequence data for a protein family arising from the need to maintain favorable residue-residue interactions to facilitate folding as well as cellular function. Here, we focus on studying the correlated mutational patterns that give rise to interaction specificity in bacterial two-component signaling (TCS) systems. TCS proteins have evolved to be able to preferentially bind and transfer a phosphate group to their signaling partner while avoiding phosphotransfer with non-partners. We infer a Potts model Hamiltonian governing the correlated mutational patterns that are observed in the sequence data of TCS partners and apply this model to recently published in vivo mutational data. Our findings further support the notion that statistical models built from sequence data can be used to predict bacterial phenotypes as well as engineer interaction specificity between non-partner TCS proteins. This research has been supported by the NSF INSPIRE Award (MCB-1241332) and by the CTBP sponsored by the NSF (Grant PHY-1427654).
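The abstract does not write out the Potts model. For orientation, the form commonly used for coevolutionary sequence models is given below; the symbols (a sequence \sigma of length L, local fields h_i, pairwise couplings J_{ij}, partition function Z) and the sign convention are assumed notation rather than taken from this record.

```latex
% Statistical energy of an aligned sequence \sigma = (\sigma_1, \dots, \sigma_L)
H(\sigma) = -\sum_{i=1}^{L} h_i(\sigma_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(\sigma_i, \sigma_j)

% Probability assigned to the sequence, with Z summing over all sequences
P(\sigma) = \frac{1}{Z}\, e^{-H(\sigma)}, \qquad Z = \sum_{\sigma'} e^{-H(\sigma')}
```

In this convention, a mutation that raises H(\sigma) makes the sequence less probable under the model, which is how a fitted Hamiltonian can be compared against measured mutational effects.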
An, Na; Fleming, Aaron M.; Middleton, Eric G.; Burrows, Cynthia J.
2014-01-01
Human telomeric DNA consists of tandem repeats of the sequence 5′-TTAGGG-3′ that can fold into various G-quadruplexes, including the hybrid, basket, and propeller folds. In this report, we demonstrate use of the α-hemolysin ion channel to analyze these subtle topological changes at a nanometer scale by providing structure-dependent electrical signatures through DNA–protein interactions. Whereas the dimensions of hybrid and basket folds allowed them to enter the protein vestibule, the propeller fold exceeds the size of the latch region, producing only brief collisions. After attaching a 25-mer poly-2′-deoxyadenosine extension to these structures, unraveling kinetics also were evaluated. Both the locations where the unfolding processes occur and the molecular shapes of the G-quadruplexes play important roles in determining their unfolding profiles. These results provide insights into the application of α-hemolysin as a molecular sieve to differentiate nanostructures as well as the potential technical hurdles DNA secondary structures may present to nanopore technology. PMID:25225404
Hidden Structural Codes in Protein Intrinsic Disorder.
Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo
2017-10-17
Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of as yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan
2008-01-01
βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek Key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9Å resolution. In spite of having changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473
Modeling repetitive, non‐globular proteins
Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun
2016-01-01
While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323
NASA Astrophysics Data System (ADS)
Guinn, Emily J.; Jagannathan, Bharat; Marqusee, Susan
2015-04-01
A fundamental question in protein folding is whether proteins fold through one or multiple trajectories. While most experiments indicate a single pathway, simulations suggest proteins can fold through many parallel pathways. Here, we use a combination of chemical denaturant, mechanical force and site-directed mutations to demonstrate the presence of multiple unfolding pathways in a simple, two-state folding protein. We show that these multiple pathways have structurally different transition states, and that seemingly small changes in protein sequence and environment can strongly modulate the flux between the pathways. These results suggest that in vivo, the crowded cellular environment could strongly influence the mechanisms of protein folding and unfolding. Our study resolves the apparent dichotomy between experimental and theoretical studies, and highlights the advantage of using a multipronged approach to reveal the complexities of a protein's free-energy landscape.
Ferreira, Diogo C; van der Linden, Marx G; de Oliveira, Leandro C; Onuchic, José N; de Araújo, Antônio F Pereira
2016-04-01
Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined with sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated with the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible, improvement on sequence-dependent burial prediction or on sequence-independent constraints that augment the detectable redundancy during simulations.
An alternative view of protein fold space.
Shindyalov, I N; Bourne, P E
2000-02-15
Comparing and subsequently classifying protein structure information has received significant attention concurrent with the increase in the number of experimentally derived 3-dimensional structures. Classification schemes have focused on biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50-150 residue) highly repetitive near-contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally non-redundant over the complete polypeptide chain. The substructure classification is based on a previously reported Combinatorial Extension (CE) algorithm that provides a significantly different set of structure alignments than those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively the algorithm provides longer contiguous aligned segments at the price of a slightly higher root-mean-square deviation (rmsd). Clustering these alignments gives a discrete and highly repetitive set of substructures not detectable by sequence similarity alone. In some cases different substructures represent all or different parts of well known folds indicative of the Russian doll effect--the continuity of protein fold space. In other cases they fall into different structure and functional classifications. It is too early to determine whether these newly classified substructures represent new insights into the evolution of a structural framework important to many proteins. What is apparent from on-going work is that these substructures have the potential to be useful probes in finding remote sequence homology and in structure prediction studies. The characteristics of the complete all-by-all comparison of the polypeptide chains present in the PDB and details of the filtering procedure by pair-wise structure alignment that led to the emergent substructure gallery are discussed. 
Substructure classification, alignments, and tools to analyze them are available at http://cl.sdsc.edu/ce.html.
muBLASTP: database-indexed protein sequence search on multicore CPUs.
Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun
2016-11-04
The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Because query indexing and database indexing pose different challenges and have different characteristics, the existing techniques for query-indexed search cannot be applied directly to database-indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers hits identical to those returned by NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedup for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for the protein database and associated optimizations in the BLASTP algorithm, we refactored the BLASTP algorithm for modern multicore processors, achieving much higher throughput with an acceptable memory footprint for the database index.
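The difference between query indexing and database indexing can be made concrete with a toy example. The sketch below builds a k-mer index over the database once and then looks up the query's k-mers in it. It is a deliberately simplified illustration: it uses exact k-mer matches only, whereas BLASTP seeds on scored neighboring words, and the function names and the tiny database are hypothetical rather than taken from muBLASTP.

```python
from collections import defaultdict

def build_database_index(database, k=3):
    """Map every k-mer to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index

def find_seed_hits(query, index, k=3):
    """Return (query offset, sequence id, database offset) for exact k-mer matches."""
    hits = []
    for q_off in range(len(query) - k + 1):
        # .get avoids creating empty entries in the defaultdict for unseen k-mers.
        for seq_id, d_off in index.get(query[q_off:q_off + k], []):
            hits.append((q_off, seq_id, d_off))
    return hits

# Hypothetical two-sequence database; the index is built once and reused per query.
database = {"seqA": "MKVLAAGIV", "seqB": "GGMKVLTT"}
index = build_database_index(database)
hits = find_seed_hits("MKVL", index)
print(hits)
```

The index is paid for once and amortized over many queries, which is where the throughput gain of database indexing comes from; the cost is the memory footprint of the index itself.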
High throughput profile-profile based fold recognition for the entire human proteome.
McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T
2006-06-07
In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
Competing Pathways and Multiple Folding Nuclei in a Large Multidomain Protein, Luciferase.
Scholl, Zackary N; Yang, Weitao; Marszalek, Piotr E
2017-05-09
Proteins obtain their final functional configuration through incremental folding with many intermediate steps in the folding pathway. If known, these intermediate steps could be valuable new targets for designing therapeutics and the sequence of events could elucidate the mechanism of refolding. However, determining these intermediate steps is hardly an easy feat, and has been elusive for most proteins, especially large, multidomain proteins. Here, we effectively map part of the folding pathway for the model large multidomain protein, Luciferase, by combining single-molecule force-spectroscopy experiments and coarse-grained simulation. Single-molecule refolding experiments reveal the initial nucleation of folding while simulations corroborate these stable core structures of Luciferase, and indicate the relative propensities for each to propagate to the final folded native state. Both experimental refolding and Monte Carlo simulations of Markov state models generated from simulation reveal that Luciferase most often folds along a pathway originating from the nucleation of the N-terminal domain, and that this pathway is the least likely to form nonnative structures. We then engineer truncated variants of Luciferase whose sequences corresponded to the putative structure from simulation and we use atomic force spectroscopy to determine their unfolding and stability. These experimental results corroborate the structures predicted from the folding simulation and strongly suggest that they are intermediates along the folding pathway. Taken together, our results suggest that initial Luciferase refolding occurs along a vectorial pathway and also suggest a mechanism that chaperones may exploit to prevent misfolding. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.
Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.
1995-01-01
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
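For readers unfamiliar with Gibbs sampling for motifs, the sketch below shows the basic site sampler with one motif occurrence per sequence. It is a minimal illustration under simplifying assumptions (uniform background, fixed motif width, no partitioning into submodels, no repeats), not the algorithm described in this abstract; all names and the toy sequences are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def profile_from(sequences, starts, width, skip, pseudocount=1.0):
    """Column-wise residue frequencies from every motif site except sequence `skip`."""
    counts = [{a: pseudocount for a in ALPHABET} for _ in range(width)]
    for i, (seq, start) in enumerate(zip(sequences, starts)):
        if i == skip:
            continue
        for col in range(width):
            counts[col][seq[start + col]] += 1
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({a: c / total for a, c in col.items()})
    return profile

def gibbs_sample(sequences, width, iterations=200, seed=0):
    """Resample each sequence's motif start in turn, conditioned on all the others."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(iterations):
        for i, seq in enumerate(sequences):
            profile = profile_from(sequences, starts, width, skip=i)
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for col in range(width):
                    w *= profile[col][seq[start + col]]
                weights.append(w)
            # Draw the new start with probability proportional to its profile score.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input with a shared segment "AYIAK" planted in each sequence.
sequences = ["MKTAYIAKQR", "GGAYIAKLLP", "PLAYIAKDDE"]
starts = gibbs_sample(sequences, width=5)
print(starts)
```

Because the sampler is stochastic, it is usually run from several seeds and the highest-scoring alignment kept; this sketch returns only the final state of one run.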
Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G
2012-09-01
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Helix formation and stability in membranes.
McKay, Matthew J; Afrose, Fahmida; Koeppe, Roger E; Greathouse, Denise V
2018-02-13
In this article we review current understanding of basic principles for the folding of membrane proteins, focusing on the more abundant alpha-helical class. Membrane proteins, vital to many biological functions and implicated in numerous diseases, fold into their active conformations in the complex environment of the cell bilayer membrane. While many membrane proteins rely on the translocon and chaperone proteins to fold correctly, others can achieve their functional form in the absence of any translation apparatus or other aids. Nevertheless, the spontaneous folding process is not well understood at the molecular level. Recent findings suggest that helix fraying and loop formation may be important for overall structure, dynamics and regulation of function. Several types of membrane helices with ionizable amino acids change their topology with pH. Additionally, we note that some peptides, including many that are rich in arginine, and a particular analogue of gramicidin, are able to translocate passively across cell membranes. The findings indicate that a final protein structure in a lipid-bilayer membrane is sequence-based, with lipids contributing to stability and regulation. While much progress has been made toward understanding the folding process for alpha-helical membrane proteins, it remains a work in progress. This article is part of a Special Issue entitled: Emergence of Complex Behavior in Biomembranes edited by Marjorie Longo. Copyright © 2018 Elsevier B.V. All rights reserved.
Predicting protein-binding RNA nucleotides with consideration of binding partners.
Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook
2015-06-01
In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10 fold-cross validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. 
In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
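The performance measures quoted in this abstract (sensitivity, specificity, PPV, NPV and MCC) all derive from the four cells of a binary confusion matrix. The sketch below computes them using their standard definitions; the function name is hypothetical and the counts in the example are made up for illustration, not taken from the study.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts.

    Assumes each class has at least one true and one predicted member,
    so that none of the four ratio denominators is zero.
    """
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

# Illustrative counts only (not from the paper).
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

When binding nucleotides are much rarer than non-binding ones, PPV and MCC are usually the more informative numbers, which may explain why the abstract reports all five measures rather than accuracy alone.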
Kavianpour, Hamidreza; Vasighi, Mahdi
2017-02-01
Nowadays, knowledge about the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods for better understanding of protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in performing structural classification of proteins. Most protein sequences are stored in databanks as character strings, and a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced based on reduced amino acid alphabets according to the surrounding hydrophobicity index. Many important features which are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.
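The two steps named in this abstract, a binary encoding of the sequence followed by a cellular automaton image, can be sketched as follows. Both the residue grouping and the automaton rule here are placeholders chosen for illustration: the abstract does not give the reduced alphabet derived from the surrounding hydrophobicity index or the rule used, so `HYDROPHOBIC`, `rule=90` and the periodic boundary are assumptions.

```python
# Assumption: a placeholder hydrophobic grouping, NOT the paper's reduced alphabet.
HYDROPHOBIC = set("AVLIMFWC")

def to_binary(sequence):
    """Encode each residue as 1 (in the hydrophobic group) or 0 (otherwise)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in sequence]

def evolve(cells, rule=90, steps=8):
    """Run an elementary cellular automaton with periodic boundaries.

    Returns steps + 1 rows (the initial row plus one per step); stacked,
    the rows form the binary image from which features could be extracted.
    """
    table = [(rule >> i) & 1 for i in range(8)]  # output bit for each 3-cell pattern
    rows = [list(cells)]
    n = len(cells)
    for _ in range(steps):
        prev = rows[-1]
        rows.append([
            table[(prev[(i - 1) % n] << 2) | (prev[i] << 1) | prev[(i + 1) % n]]
            for i in range(n)
        ])
    return rows

image = evolve(to_binary("MKVLAAGIVG"), steps=4)
for row in image:
    print("".join("#" if c else "." for c in row))
```

Texture or pattern statistics computed on such an image would then be the inputs to a classifier such as an SVM.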
Hydrophobic folding units derived from dissimilar monomer structures and their interactions.
Tsai, C J; Nussinov, R
1997-01-01
We have designed an automated procedure to cut a protein into compact hydrophobic folding units. The hydrophobic units are large enough to contain tertiary non-local interactions, reflecting potential nucleation sites during protein folding. The quality of a hydrophobic folding unit is evaluated by four criteria. The first two correspond to visual characterization of a structural domain, namely, compactness and extent of isolation. We use the definition of Zehfus and Rose (Zehfus MH, Rose GD, 1986, Biochemistry 25:35-340) to calculate the compactness of a cut protein unit. The isolation of a unit is based on the solvent accessible surface area (ASA) originally buried in the interior and exposed to the solvent after cutting. The third quantity is the hydrophobicity, equivalent to the fraction of the buried non-polar ASA with respect to the total non-polar ASA. The last criterion in the evaluation of a folding unit is the number of segments it includes. To conform with the rationale of obtaining hydrophobic units, which may relate to early folding events, the hydrophobic interactions are implicitly and explicitly applied in their generation and assessment. We follow Holm and Sander (Holm L, Sander C, 1994, Proteins 19:256-268) to reduce the multiple cutting-point problem to a one-dimensional search for all reasonable trial cuts. However, as here we focus on the hydrophobic cores, the contact matrix used to obtain the first non-trivial eigenvector contains only hydrophobic contacts, rather than all, hydrophobic and hydrophilic, interactions. This dataset of hydrophobic folding units, derived from structurally dissimilar single chain monomers, is particularly useful for investigations of the mechanism of protein folding. For cases where there are kinetic data, the one or more hydrophobic folding units generated for a protein correlate with the two or with the three-state folding process observed. 
We carry out extensive amino acid sequence order independent structural comparisons to generate a structurally non-redundant set of hydrophobic folding units for fold recognition and for statistical purposes.
Zou, J; Saven, J G
2000-02-11
A self-consistent theory is presented that can be used to estimate the number and composition of sequences satisfying a predetermined set of constraints. The theory is formulated so as to examine the features of sequences having a particular value of Delta=E(f)-
An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming
2016-01-01
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
RNAiFold 2.0: a web server and software to design custom and Rfam-based RNA molecules.
Garcia-Martin, Juan Antonio; Dotu, Ivan; Clote, Peter
2015-07-01
Several algorithms for RNA inverse folding have been used to design synthetic riboswitches, ribozymes and thermoswitches, whose activity has been experimentally validated. The RNAiFold software is unique among approaches for inverse folding in that (exhaustive) constraint programming is used instead of heuristic methods. For that reason, RNAiFold can generate all sequences that fold into the target structure or determine that there is no solution. RNAiFold 2.0 is a complete overhaul of RNAiFold 1.0, rewritten from the now defunct COMET language to C++. The new code properly extends the capabilities of its predecessor by providing a user-friendly pipeline to design synthetic constructs having the functionality of given Rfam families. In addition, the new software supports amino acid constraints, even for proteins translated in different reading frames from overlapping coding sequences; moreover, structure compatibility/incompatibility constraints have been expanded. With these features, RNAiFold 2.0 allows the user to design single RNA molecules as well as hybridization complexes of two RNA molecules. The web server, source code and Linux binaries are publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold2.0. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
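The notion of structure compatibility mentioned in this abstract can be illustrated with a short check: a sequence is compatible with a dot-bracket target if every paired position can form a canonical or wobble base pair. This is a minimal sketch with hypothetical function names, not RNAiFold's code or interface. Compatibility is only a necessary condition; inverse folding additionally requires that the target be the structure the sequence actually folds into, which this check does not test.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def base_pairs(structure):
    """Parse a dot-bracket string into a list of (i, j) paired positions."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs

def is_compatible(sequence, structure):
    """True if every base pair in the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in CANONICAL_PAIRS
               for i, j in base_pairs(structure))

print(is_compatible("GGGAAAUCC", "(((...)))"))
print(is_compatible("GGGAAAACC", "(((...)))"))
```

A constraint-programming approach can use such pairing rules to prune the search space before any folding energy is evaluated.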
Koschmieder, Julian; Brausemann, Anton; Drepper, Friedel; Rodriguez-Franco, Marta; Ghisla, Sandro; Warscheid, Bettina; Einsle, Oliver; Beyer, Peter
2015-01-01
Recombinant phytoene desaturase (PDS-His6) from rice was purified to near-homogeneity and shown to be enzymatically active in a biphasic, liposome-based assay system. The protein contains FAD as the sole protein-bound redox-cofactor. Benzoquinones, not replaceable by molecular oxygen, serve as a final electron acceptor defining PDS as a 15-cis-phytoene (donor):plastoquinone oxidoreductase. The herbicidal PDS-inhibitor norflurazon is capable of arresting the reaction by stabilizing the intermediary FADred, while an excess of the quinone acceptor relieves this blockage, indicating competition. The enzyme requires its homo-oligomeric association for activity. The sum of data collected through gel permeation chromatography, non-denaturing polyacrylamide electrophoresis, chemical cross-linking, mass spectrometry and electron microscopy techniques indicate that the high-order oligomers formed in solution are the basis for an active preparation. Of these, a tetramer consisting of dimers represents the active unit. This is corroborated by our preliminary X-ray structural analysis that also revealed similarities of the protein fold with the sequence-inhomologous bacterial phytoene desaturase CRTI and other oxidoreductases of the GR2-family of flavoproteins. This points to an evolutionary relatedness of CRTI and PDS yielding different carotene desaturation sequences based on homologous protein folds. PMID:26147209
Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil
2014-09-07
Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To distinguish LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBP classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
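The five-fold cross-validation used in this study follows a general pattern: the data are split into five folds, and each fold in turn serves as the test set while the rest are used for training. The sketch below generates such index splits; it is a generic illustration with a hypothetical function name, and it omits the shuffling and class stratification that a real evaluation would normally apply.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(12, k=5))
for train, test in folds:
    print("test:", test, "train size:", len(train))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives an estimate based on every sample without ever testing on training data.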
How Does Your Protein Fold? Elucidating the Apomyoglobin Folding Pathway
Dyson, H. Jane; Wright, Peter E.
2017-01-01
Conspectus Although each type of protein fold and in some cases individual proteins within a fold classification can have very different mechanisms of folding, the underlying biophysical and biochemical principles that operate to cause a linear polypeptide chain to fold into a globular structure must be the same. In an aqueous solution, the protein takes up the thermodynamically most stable structure, but the pathway along which the polypeptide proceeds in order to reach that structure is a function of the amino acid sequence, which must be the final determining factor, not only in shaping the final folded structure, but in dictating the folding pathway. A number of groups have focused on a single protein or group of proteins, to determine in detail the factors that influence the rate and mechanism of folding in a defined system, with the hope that hypothesis-driven experiments can elucidate the underlying principles governing the folding process. Our research group has focused on the folding of the globin family of proteins, and in particular on the monomeric protein apomyoglobin. Apomyoglobin (apoMb) folds relatively slowly (~2 seconds) via an ensemble of obligatory intermediates that form rapidly after the initiation of folding. The folding pathway can be dissected using rapid-mixing techniques, which can probe processes in the millisecond time range. Stopped-flow measurements detected by circular dichroism (CD) or fluorescence spectroscopy give information on the rates of folding events. Quench-flow experiments utilize the differential rates of hydrogen-deuterium exchange of amide protons protected in parts of the structure that are folded early; protection of amides can be detected by mass spectrometry or proton nuclear magnetic resonance spectroscopy (NMR). 
In addition, apoMb forms an intermediate at equilibrium at pH ~ 4, which is sufficiently stable for it to be structurally characterized by solution methods such as CD, fluorescence and NMR spectroscopies, and the conformational ensembles formed in the presence of denaturing agents and low pH can be characterized as models for the unfolded states of the protein. Newer NMR techniques such as measurement of residual dipolar couplings in the various partly folded states, and relaxation dispersion measurements to probe invisible states present at low concentrations, have contributed to providing a detailed picture of the apomyoglobin folding pathway. The research summarized in this review was aimed at characterizing and comparing the equilibrium and kinetic intermediates both structurally and dynamically, as well as delineating the complete folding pathway at a residue-specific level, in order to answer the question “What is it about the amino acid sequence that causes each molecule in the unfolded protein ensemble to start folding, and, once started, to proceed towards the formation of the correctly folded three-dimensional structure?” PMID:28032989
Common fold in helix–hairpin–helix proteins
Shao, Xuguang; Grishin, Nick V.
2000-01-01
Helix–hairpin–helix (HhH) is a widespread motif involved in non-sequence-specific DNA binding. The majority of HhH motifs function as DNA-binding modules; however, some of them are used to mediate protein–protein interactions or have acquired enzymatic activity by incorporating catalytic residues (DNA glycosylases). From sequence and structural analysis of HhH-containing proteins we conclude that most HhH motifs are integrated as a part of a five-helical domain, termed (HhH)2 domain here. It typically consists of two consecutive HhH motifs that are linked by a connector helix and displays pseudo-2-fold symmetry. (HhH)2 domains show clear structural integrity and a conserved hydrophobic core composed of seven residues, one residue from each α-helix and each hairpin, and deserve recognition as a distinct protein fold. In addition to known HhH in the structures of RuvA, RadA, MutY and DNA-polymerases, we have detected new HhH motifs in sterile alpha motif and barrier-to-autointegration factor domains, the α-subunit of Escherichia coli RNA-polymerase, DNA-helicase PcrA and DNA glycosylases. Statistically significant sequence similarity of HhH motifs and pronounced structural conservation argue for homology between (HhH)2 domains in different protein families. Our analysis helps to clarify how non-symmetric protein motifs bind to the double helix of DNA through the formation of a pseudo-2-fold symmetric (HhH)2 functional unit. PMID:10908318
GeneSilico protein structure prediction meta-server.
Kurowski, Michal A; Bujnicki, Janusz M
2003-07-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server
Kurowski, Michal A.; Bujnicki, Janusz M.
2003-01-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
Minimal model for the secondary structures and conformational conversions in proteins
NASA Astrophysics Data System (ADS)
Imamura, Hideo
A better understanding of the protein folding process can provide physical insights into the function of proteins and make it possible to benefit from the genetic information accumulated so far. Protein folding normally takes place within seconds or less, but even seconds are beyond the reach of current computational power for simulations of a system in all-atom detail. Hence, to model and explore the protein folding process, it is crucial to construct a model that adequately describes the physical process and mechanism on the relevant time scale. We discuss a reduced off-lattice model that can express α-helix and β-hairpin conformations defined solely by a given sequence, in order to investigate the folding mechanism of conformations such as a β-hairpin and also to investigate conformational conversions in proteins. The first two chapters introduce and review essential concepts in protein folding, the modelling of physical interactions in proteins, and various simple models, and also review computational methods, in particular the Metropolis Monte Carlo method, its dynamic interpretation and thermodynamic Monte Carlo algorithms. Chapter 3 describes the minimalist model that represents both α-helix and β-sheet conformations using simple potentials. The native conformation can be specified by the sequence without particular conformational biases to a reference state. In Chapter 4, the model is used to investigate the folding mechanism of β-hairpins exhaustively using the dynamic Monte Carlo method and a thermodynamic Monte Carlo method, an efficient combination of the multicanonical Monte Carlo and the weighted histogram analysis method. We show that the major folding pathways and folding rate depend on the location of a hydrophobic pair. The conformational conversions between α-helix and β-sheet conformations are examined in Chapters 5 and 6. 
First, the conformational conversion due to mutation in a non-hydrophobic system and then the conformational conversion due to mutation with a hydrophobic pair at a different position at various temperatures are examined.
Bingle, Colin D; Seal, Ruth L; Craven, C Jeremy
2011-08-01
We present the BPIFAn/BPIFBn systematic nomenclature for the PLUNC (palate lung and nasal epithelium clone)/PSP (parotid secretory protein)/BSP30 (bovine salivary protein 30)/SMGB (submandibular gland protein B) family of proteins, based on an adaptation of the SPLUNCn (short PLUNCn)/LPLUNCn (large PLUNCn) nomenclature. The nomenclature is applied to a set of 102 sequences which we believe represent the current reliable data for BPIFA/BPIFB proteins across all species, including marsupials and birds. The nomenclature will be implemented by the HGNC (HUGO Gene Nomenclature Committee).
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Zhang, Yun; Liu, Fang; Nie, Jinfang; Jiang, Fuyang; Zhou, Caibin; Yang, Jiani; Fan, Jinlong; Li, Jianping
2014-05-07
In this paper, we report for the first time an electrochemical biosensor for single-step, reagentless, and picomolar detection of a sequence-specific DNA-binding protein using a double-stranded, electrode-bound DNA probe terminally modified with a redox active label close to the electrode surface. This new methodology is based upon local repression of electrolyte diffusion associated with protein-DNA binding that leads to reduction of the electrochemical response of the label. In the proof-of-concept study, the resulting electrochemical biosensor was quantitatively sensitive to the concentrations of the TATA binding protein (TBP, a model analyte) ranging from 40 pM to 25.4 nM with an estimated detection limit of ∼10.6 pM (∼80 to 400-fold improvement on the detection limit over previous electrochemical analytical systems).
The hypothetical protein Atu4866 from Agrobacterium tumefaciens adopts a streptavidin-like fold
Ai, Xuanjun; Semesi, Anthony; Yee, Adelinda; Arrowsmith, Cheryl H.; Choy, Wing-Yiu; Li, Shawn S.C.
2008-01-01
Atu4866 is a 79-residue conserved hypothetical protein of unknown function from Agrobacterium tumefaciens. Protein sequence alignments show that it shares ≥60% sequence identity with 20 other hypothetical proteins of bacterial origin. However, the structures and functions of these proteins remain unknown so far. To gain insight into the function of this family of proteins, we have determined the structure of Atu4866 as a target of a structural genomics project using solution NMR spectroscopy. Our results reveal that Atu4866 adopts a streptavidin-like fold featuring a β-barrel/sandwich formed by eight antiparallel β-strands. Further structural analysis identified a continuous patch of conserved residues on the surface of Atu4866 that may constitute a potential ligand-binding site. PMID:18042676
Zbilut, Joseph P.; Colosimo, Alfredo; Conti, Filippo; Colafranceschi, Mauro; Manetti, Cesare; Valerio, MariaCristina; Webber, Charles L.; Giuliani, Alessandro
2003-01-01
The problem of protein folding vs. aggregation was investigated in acylphosphatase and the amyloid protein Aβ(1–40) by means of nonlinear signal analysis of their chain hydrophobicity. Numerical descriptors of recurrence patterns provided the basis for statistical evaluation of folding/aggregation distinctive features. Static and dynamic approaches were used to elucidate conditions coincident with folding vs. aggregation using comparisons with known protein secondary structure classifications, site-directed mutagenesis studies of acylphosphatase, and molecular dynamics simulations of amyloid protein, Aβ(1–40). The results suggest that a feature derived from principal component space characterized by the smoothness of singular, deterministic hydrophobicity patches plays a significant role in the conditions governing protein aggregation. PMID:14645049
MODBASE, a database of annotated comparative protein structure models
Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej
2002-01-01
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309
Theoretical Insights into the Biophysics of Protein Bi-stability and Evolutionary Switches
Krobath, Heinrich; Chan, Hue Sun
2016-01-01
Deciphering the effects of nonsynonymous mutations on protein structure is central to many areas of biomedical research and is of fundamental importance to the study of molecular evolution. Much of the investigation of protein evolution has focused on mutations that leave a protein’s folded structure essentially unchanged. However, to evolve novel folds of proteins, mutations that lead to large conformational modifications have to be involved. Unraveling the basic biophysics of such mutations is a challenge to theory, especially when only one or two amino acid substitutions cause a large-scale conformational switch. Among the few such mutational switches identified experimentally, the one between the GA all-α and GB α+β folds is extensively characterized; but all-atom simulations using fully transferrable potentials have not been able to account for this striking switching behavior. Here we introduce an explicit-chain model that combines structure-based native biases for multiple alternative structures with a general physical atomic force field, and apply this construct to twelve mutants spanning the sequence variation between GA and GB. In agreement with experiment, we observe conformational switching from GA to GB upon a single L45Y substitution in the GA98 mutant. In line with the latent evolutionary potential concept, our model shows a gradual sequence-dependent change in fold preference in the mutants before this switch. Our analysis also indicates that a sharp GA/GB switch may arise from the orientation dependence of aromatic π-interactions. These findings provide physical insights toward rationalizing, predicting and designing evolutionary conformational switches. PMID:27253392
Comparative Protein Structure Modeling Using MODELLER
Webb, Benjamin; Sali, Andrej
2016-01-01
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. PMID:27322406
Solis, Armando D
2015-12-01
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much of the structural information found in long-range (contact) interactions among amino acids in natively-folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of less than 10) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches-including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs-fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. © 2015 Wiley Periodicals, Inc.
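The idea of a reduced alphabet described above can be illustrated with a minimal Python sketch. The five groups below are a hypothetical partition chosen only for illustration (loosely following hydrophobicity, aromaticity, polarity, charge and size); they are not the optimal clusters derived in the study, and the names `GROUPS` and `reduce_sequence` are assumptions of this sketch.

```python
# Hypothetical 5-letter reduction of the 20 amino acids.
# The grouping is an illustrative assumption, NOT the clustering
# obtained with the Information Maximization Device.
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "c": "DEKR",    # charged
    "s": "GP",      # small / special backbone geometry
}

# Invert the table: one-letter amino acid code -> group label.
AA_TO_GROUP = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids are mapped to 'x'.
    """
    return "".join(AA_TO_GROUP.get(aa, "x") for aa in seq.upper())


if __name__ == "__main__":
    print(reduce_sequence("MKTAYIAK"))
```

Any other partition can be tested by editing `GROUPS`; comparing how well contact statistics survive under different partitions is the kind of question the abstract addresses with mutual information.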
Stegemann, Björn; Klebe, Gerhard
2012-02-01
Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process. Copyright © 2011 Wiley Periodicals, Inc.
Xu, Dong; Zhang, Yang
2012-07-01
Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.
De Jaco, Antonella; Dubi, Noga; Camp, Shelley; Taylor, Palmer
2017-01-01
The α/β-hydrolase fold superfamily of proteins is composed of structurally related members that, despite great diversity in their catalytic, recognition, adhesion and chaperone functions, share a common fold governed by homologous residues and conserved disulfide bridges. Non-synonymous single nucleotide polymorphisms within the α/β-hydrolase fold domain in various family members have been found for congenital endocrine, metabolic and nervous system disorders. By examining the amino acid sequence from the various proteins, mutations were found to be prevalent in conserved residues within the α/β-hydrolase fold of the homologous proteins. This is the case for the thyroglobulin mutations linked to congenital hypothyroidism. To address whether correct folding of the common domain is required for protein export, we inserted the thyroglobulin mutations at homologous positions in two correlated but simpler α/β-hydrolase fold proteins known to be exported to the cell surface: neuroligin3 and acetylcholinesterase. Here we show that these mutations in the cholinesterase homologous region alter the folding properties of the α/β-hydrolase fold domain, which are reflected in defects in protein trafficking, folding and function, and ultimately result in retention of the partially processed proteins in the endoplasmic reticulum. Accordingly, mutations at conserved residues may be transferred amongst homologous proteins to produce common processing defects despite disparate functions, protein complexity and tissue-specific expression of the homologous proteins. More importantly, a similar assembly of the α/β-hydrolase fold domain tertiary structure among homologous members of the superfamily is required for correct trafficking of the proteins to their final destination. PMID:23035660
Statistical mechanics of simple models of protein folding and design.
Pande, V S; Grosberg, A Y; Tanaka, T
1997-01-01
It is now believed that the primary equilibrium aspects of simple models of protein folding are understood theoretically. However, current theories often resort to rather heavy mathematics to overcome some technical difficulties inherent in the problem or start from a phenomenological model. To this end, we take a new approach in this pedagogical review of the statistical mechanics of protein folding. The benefit of our approach is a drastic mathematical simplification of the theory, without resort to any new approximations or phenomenological prescriptions. Indeed, the results we obtain agree precisely with previous calculations. Because of this simplification, we are able to present here a thorough and self-contained treatment of the problem. Topics discussed include the statistical mechanics of the random energy model (REM), tests of the validity of REM as a model for heteropolymer freezing, freezing transition of random sequences, phase diagram of designed ("minimally frustrated") sequences, and the degree to which errors in the interactions employed in simulations of either folding or design can still lead to correct folding behavior. PMID:9414231
Contingency Table Browser - prediction of early stage protein structure.
Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena
2015-01-01
The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table, which expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity.
2012-01-01
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767
Contribution to the Prediction of the Fold Code: Application to Immunoglobulin and Flavodoxin Cases
Banach, Mateusz; Prudhomme, Nicolas; Carpentier, Mathilde; Duprat, Elodie; Papandreou, Nikolaos; Kalinowska, Barbara; Chomilier, Jacques; Roterman, Irena
2015-01-01
Background The formation of the folding nucleus of globular proteins starts with the mutual interaction of a group of hydrophobic amino acids whose close contacts allow the subsequent formation and stability of the 3D structure. These early steps can be predicted by simulation of the folding process through a Monte Carlo (MC) coarse grain model in a discrete space. We previously defined MIRs (Most Interacting Residues) as the set of residues presenting a large number of non-covalent neighbour interactions during such a simulation. MIRs are good candidates to define the minimal number of residues giving rise to a given fold instead of another one, although their proportion is rather high, typically 15-20% of the sequence. Bearing in mind experiments on pairs of sequences with very high sequence identity (up to 90%) but different folds, we combined the MIR method, which takes the sequence as its single input, with the “fuzzy oil drop” (FOD) model, which requires a 3D structure, in order to estimate the residues coding for the fold. FOD assumes that a globular protein follows an idealised 3D Gaussian distribution of hydrophobicity density, with the maximum in the centre and minima at the surface of the “drop”. If the actual local density of hydrophobicity around a given amino acid is as high as the ideal one, then this amino acid is assigned to the core of the globular protein, and it is assumed to follow the FOD model. Therefore one obtains a distribution of the amino acids of a protein according to their agreement or disagreement with the FOD model. Results We compared and combined the MIR and FOD methods to define the minimal nucleus, or keystone, of two populated folds: immunoglobulin-like (Ig) and flavodoxins (Flav). The combination of these two approaches defines some positions that are both predicted as a MIR and assigned as accordant with the FOD model. It is shown here that for these two folds, the intersection of the predicted sets of residues significantly differs from random selection. 
It reduces the number of selected residues by each individual method and allows a reasonable agreement with experimentally determined key residues coding for the particular fold. In addition, the intersection of the two methods significantly increases the specificity of the prediction, providing a robust set of residues that constitute the folding nucleus. PMID:25915049
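The idealised hydrophobicity profile that the FOD model assumes can be sketched in a few lines of Python. This is a simplified illustration only: it uses a single isotropic width `sigma` and toy coordinates, both of which are assumptions of this sketch, whereas the published model fits the "drop" to the actual molecule. The function name `idealised_hydrophobicity` is likewise hypothetical.

```python
import math


def idealised_hydrophobicity(coords, sigma):
    """Idealised 3D Gaussian hydrophobicity for each residue position.

    coords: list of (x, y, z) tuples, one per residue (toy input here).
    sigma:  isotropic Gaussian width (a simplifying assumption).
    Returns values normalised to sum to 1, highest near the centre.
    """
    n = len(coords)
    # Geometric centre of the molecule, used as the centre of the "drop".
    centre = tuple(sum(c[i] for c in coords) / n for i in range(3))
    raw = []
    for c in coords:
        r2 = sum((c[i] - centre[i]) ** 2 for i in range(3))
        raw.append(math.exp(-r2 / (2.0 * sigma ** 2)))
    total = sum(raw)
    return [v / total for v in raw]


if __name__ == "__main__":
    toy = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 5, 0), (0, -5, 0)]
    for pos, value in zip(toy, idealised_hydrophobicity(toy, sigma=3.0)):
        print(pos, round(value, 3))
```

In the approach described above, residues whose observed local hydrophobicity density matches this idealised value would be the ones counted as accordant with the model; the observed density and the matching criterion are not reproduced in this sketch.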
Intermediates and the folding of proteins L and G
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Scott; Head-Gordon, Teresa
We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G that are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G are consistent with a nucleation-condensation mechanism, each of which is described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, and should be populated and therefore observable by experiment.
Intermediates and the folding of proteins L and G
Brown, Scott; Head-Gordon, Teresa
2004-01-01
We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G, which are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G are consistent with a nucleation-condensation mechanism, each of which is described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding, and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally, the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first-order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, and should be populated and therefore observable by experiment. PMID:15044729
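For orientation, a two-step first-order reversible reaction with an on-pathway intermediate is conventionally written as U ⇌ I ⇌ N. The rate equations below are the generic textbook form of that scheme; the symbols U, I, N and the rate constants k_1, k_{-1}, k_2, k_{-2} are notation assumed here, not taken from the abstract.

```latex
\begin{aligned}
\frac{d[U]}{dt} &= -k_{1}[U] + k_{-1}[I], \\
\frac{d[I]}{dt} &= k_{1}[U] - \left(k_{-1} + k_{2}\right)[I] + k_{-2}[N], \\
\frac{d[N]}{dt} &= k_{2}[I] - k_{-2}[N],
\end{aligned}
\qquad [U] + [I] + [N] = \text{const.}
```

Because the three right-hand sides sum to zero, total population is conserved, and the solution is in general a sum of two exponential phases plus a constant. Fitting kinetic data to this form is one way to test whether an intermediate is populated, under the assumption that the scheme applies.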
From protein sequence to dynamics and disorder with DynaMine.
Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F
2013-01-01
Protein function and dynamics are closely related; however, accurate dynamics information is difficult to obtain. Here based on a carefully assembled data set derived from experimental data for proteins in solution, we quantify backbone dynamics properties on the amino-acid level and develop DynaMine--a fast, high-quality predictor of protein backbone dynamics. DynaMine uses only protein sequence information as input and shows great potential in distinguishing regions of different structural organization, such as folded domains, disordered linkers, molten globules and pre-structured binding motifs of different sizes. It also identifies disordered regions within proteins with an accuracy comparable to the most sophisticated existing predictors, without depending on prior disorder knowledge or three-dimensional structural information. DynaMine provides molecular biologists with an important new method that grasps the dynamical characteristics of any protein of interest, as we show here for human p53 and E1A from human adenovirus 5.
Lieutaud, Philippe; Uversky, Alexey V.; Uversky, Vladimir N.; Longhi, Sonia
2016-01-01
ABSTRACT In the last 2 decades it has become increasingly evident that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins lack a stable 3D structure, are ubiquitous and fulfill essential biological functions. Their conformational heterogeneity is encoded in their amino acid sequences, thereby allowing intrinsically disordered proteins or regions to be recognized based on properties of these sequences. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to structural determination with X-ray crystallization. This article discusses a comprehensive selection of databases and methods currently employed to disseminate experimental and putative annotations of disorder, predict disorder and identify regions involved in induced folding. It also provides a set of detailed instructions that should be followed to perform computational analysis of disorder. PMID:28232901
Yang, A S; Hitz, B; Honig, B
1996-06-21
The stability of beta-turns is calculated as a function of sequence and turn type with a Monte Carlo sampling technique. The conformational energy of four internal hydrogen-bonded turn types, I, I', II and II', is obtained by evaluating their gas phase energy with the CHARMM force field and accounting for solvation effects with the Finite Difference Poisson-Boltzmann (FDPB) method. All four turn types are found to be less stable than the coil state, independent of the sequence in the turn. The free-energy penalties associated with turn formation vary between 1.6 kcal/mol and 7.7 kcal/mol, depending on the sequence and turn type. Differences in turn stability arise mainly from intraresidue interactions within the two central residues of the turn. For each combination of the two central residues, except for -Gly-Gly-, the most stable beta-turn type is always found to occur most commonly in native proteins. The fact that a model based on local interactions accounts for the observed preference of specific sequences suggests that long-range tertiary interactions tend to play a secondary role in determining turn conformation. In contrast, for beta-hairpins, long-range interactions appear to dominate. Specifically, due to the right-handed twist of beta-strands, type I' turns for -Gly-Gly- are found to occur with high frequency, even when local energetics would dictate otherwise. The fact that any combination of two residues is found able to adopt a relatively low-energy turn structure explains why the amino acid sequence in turns is highly variable. The calculated free-energy cost of turn formation, when combined with related numbers obtained for alpha-helices and beta-sheets, suggests a model for the initiation of protein folding based on metastable fragments of secondary structure.
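To give a feel for what free-energy penalties of 1.6 to 7.7 kcal/mol imply, the following Python sketch converts a penalty into an equilibrium turn population. It assumes a simple two-state equilibrium between coil and turn and a temperature of 298.15 K; neither assumption is stated in the abstract, and the function name `turn_population` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def turn_population(delta_g_kcal, temperature_k=298.15):
    """Fraction of molecules in the turn state for a two-state model.

    delta_g_kcal: free energy of the turn relative to the coil
                  (positive means the turn is less stable).
    """
    rt = R_KCAL * temperature_k
    return 1.0 / (1.0 + math.exp(delta_g_kcal / rt))


if __name__ == "__main__":
    for dg in (1.6, 7.7):
        print(f"dG = {dg} kcal/mol -> turn fraction {turn_population(dg):.2e}")
```

Under these assumptions, even the smallest reported penalty leaves the isolated turn as a minority state, which is consistent with the abstract's statement that all four turn types are less stable than the coil.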
Evolution-Based Functional Decomposition of Proteins
Rivoire, Olivier; Reynolds, Kimberly A.; Ranganathan, Rama
2016-01-01
The essential biological properties of proteins—folding, biochemical activities, and the capacity to adapt—arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment—a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668
Predicting protein-binding regions in RNA using nucleotide profiles and compositions.
Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook
2017-03-14
Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on the same dataset showed that our model performs better than the others.
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. However, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .
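The performance figures quoted in the abstract above (sensitivity, specificity, accuracy, PPV, NPV and MCC) all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions. The function name `binary_metrics` and the example counts are illustrative assumptions; they are not taken from the authors' program or data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics from confusion-matrix counts.

    Assumes both classes are present and both labels are predicted at least once,
    so that none of the ratio denominators is zero.
    """
    total = tp + fp + tn + fn
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts for a balanced set of 100 binding and 100 non-binding regions.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells, it is less sensitive than accuracy to the ratio of binding to non-binding regions, which may be one reason the abstract reports it alongside accuracy for each evaluation setting.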
Omelchenko, Marina V; Galperin, Michael Y; Wolf, Yuri I; Koonin, Eugene V
2010-04-30
Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins. We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress. These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-03-10
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-01-01
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. 
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752
Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna
2017-10-01
Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitating thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, offers diverse query tools and search algorithms, with various rendering options to visualize the sequence- and structural-level information along with hydrogen bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Enhanced Wang Landau sampling of adsorbed protein conformations.
Radhakrishna, Mithun; Sharma, Sumit; Kumar, Sanat K
2012-03-21
Using computer simulations to model the folding of proteins into their native states is computationally expensive due to the extraordinarily low degeneracy of the ground state. In this paper, we develop an efficient way to sample these folded conformations using Wang Landau sampling coupled with the configurational bias method (which uses an unphysical "temperature" that lies between the collapse and folding transition temperatures of the protein). This method speeds up the folding process by roughly an order of magnitude over existing algorithms for the sequences studied. We apply this method to study the adsorption of intrinsically disordered hydrophobic polar protein fragments on a hydrophobic surface. We find that these fragments, which are unstructured in the bulk, acquire secondary structure upon adsorption onto a strong hydrophobic surface. Apparently, the presence of a hydrophobic surface allows these random coil fragments to fold by providing hydrophobic contacts that were lost in protein fragmentation. © 2012 American Institute of Physics
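As a rough illustration of the Wang-Landau idea named in the abstract above, the sketch below estimates the density of states of a deliberately simple toy system: independent two-state spins whose energy is the number of "up" spins. It shows only the plain Wang-Landau flat-histogram loop. It does not include the configurational bias moves, the protein model, or the surface used by the authors, and the function name, parameters, and toy system are all assumptions made for illustration.

```python
import math
import random


def wang_landau_ln_g(n_spins=10, ln_f_final=1e-3, flatness=0.8,
                     seed=0, check_every=1000):
    """Estimate ln g(E) for E = number of up spins among n_spins free spins.

    Returns ln g(E) - ln g(0) for E = 0..n_spins (toy system, plain Wang-Landau).
    """
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0
    ln_g = [0.0] * (n_spins + 1)   # running estimate of ln g(E)
    hist = [0] * (n_spins + 1)     # visit histogram for the current stage
    ln_f = 1.0                     # modification factor, halved after each flat stage
    steps = 0
    while ln_f > ln_f_final:
        # Propose flipping one randomly chosen spin.
        i = rng.randrange(n_spins)
        new_energy = energy + (1 - 2 * spins[i])
        # Accept with probability min(1, g(E_old) / g(E_new)).
        delta = ln_g[energy] - ln_g[new_energy]
        if delta >= 0 or rng.random() < math.exp(delta):
            spins[i] ^= 1
            energy = new_energy
        # Update the current level whether or not the move was accepted.
        ln_g[energy] += ln_f
        hist[energy] += 1
        steps += 1
        # Periodically test histogram flatness; if flat, refine ln_f.
        if steps % check_every == 0:
            mean = sum(hist) / len(hist)
            if min(hist) >= flatness * mean:
                hist = [0] * len(hist)
                ln_f /= 2.0
    base = ln_g[0]
    return [value - base for value in ln_g]


# Compare the estimate with the exact answer, ln C(n, E), for a small system.
estimate = wang_landau_ln_g(n_spins=8, seed=1)
for e, value in enumerate(estimate):
    print(e, round(value, 2), round(math.log(math.comb(8, e)), 2))
```

The toy system is chosen because its exact density of states is a binomial coefficient, which makes the estimate easy to check. For a protein model, the energy levels with very few states (such as the folded ground state) are the ones a plain random walk rarely reaches, which is the difficulty the abstract's biased sampling is meant to address.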
Shabelnikov, Sergey; Kiselev, Artem
2015-01-01
Despite extensive studies of cardiac bioactive peptides and their functions in molluscs, soluble proteins expressed in the heart and secreted into the circulation have not yet been reported. In this study, we describe an 18.1-kDa, cysteine-rich atrial secretory protein (CRASP) isolated from the terrestrial snail Achatina achatina that has no detectable sequence similarity to any known protein or nucleotide sequence. CRASP is an acidic, 158-residue, N-glycosylated protein composed of eight alpha-helical segments stabilized with five disulphide bonds. A combination of fold recognition algorithms and ab initio folding predicted that CRASP adopts an all-alpha, right-handed superhelical fold. CRASP is most strongly expressed in the atrium in secretory atrial granular cells, and substantial amounts of CRASP are released from the heart upon nerve stimulation. CRASP is detected in the haemolymph of intact animals at nanomolar concentrations. CRASP is the first secretory protein expressed in molluscan atrium to be reported. We propose that CRASP is an example of a taxonomically restricted gene that might be responsible for adaptations specific for terrestrial pulmonates. PMID:26444993
Yugandhar, K; Gromiha, M Michael
2014-09-01
Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. We then developed a set of 610 features from the sequences of protein complexes and used the Ranker search method, which combines an attribute evaluator with the Ranker method, to select specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed an accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.
Zeldovich, Konstantin B; Chen, Peiqiu; Shakhnovich, Boris E; Shakhnovich, Eugene I
2007-07-01
In this work we develop a microscopic physical model of early evolution where phenotype (organism life expectancy) is directly related to genotype (the stability of its proteins in their native conformations), which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the "Big Bang" scenario whereby exponential population growth ensues as soon as favorable sequence-structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species, that is, subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution.
Overlooked Short Toxin-Like Proteins: A Shortcut to Drug Design
Linial, Michal
2017-01-01
Short stable peptides have huge potential for novel therapies and biosimilars. Cysteine-rich short proteins are characterized by multiple disulfide bridges in a compact structure. Many of these metazoan proteins are processed, folded, and secreted as soluble stable folds. These properties are shared by both marine and terrestrial animal toxins. These stable short proteins are promising sources for new drug development. We developed ClanTox (classifier of animal toxins) to identify toxin-like proteins (TOLIPs) using machine learning models trained on a large-scale proteomic database. Insect proteomes provide a rich source for protein innovations. Therefore, we seek overlooked toxin-like proteins from insects (coined iTOLIPs). Out of 4180 short (<75 amino acids) secreted proteins, 379 were predicted as iTOLIPs with high confidence, with as many as 30% of the genes marked as uncharacterized. Based on bioinformatics, structure modeling, and data-mining methods, we found that the most significant group of predicted iTOLIPs carry antimicrobial activity. Among the top predicted sequences were 120 termicin genes from termites with antifungal properties. Structural variations of insect antimicrobial peptides illustrate the similarity to a short version of the defensin fold with antifungal specificity. We also identified 9 proteins that strongly resemble ion channel inhibitors from scorpion and Conus toxins. Furthermore, we assigned a functional fold to numerous uncharacterized iTOLIPs. We conclude that a systematic approach for finding iTOLIPs provides a rich source of peptides for drug design and innovative therapeutic discoveries. PMID:29109389
Labudde, Dirk
2015-01-01
The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Grunert, Steffen; Labudde, Dirk
2015-01-01
The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
Structural perturbations on huntingtin N17 domain during its folding on 2D-nanomaterials
NASA Astrophysics Data System (ADS)
Zhang, Leili; Feng, Mei; Zhou, Ruhong; Luan, Binquan
2017-09-01
A globular protein’s folded structure in its physiological environment is largely determined by its amino acid sequence. Recently discovered transformer proteins, as well as intrinsically disordered proteins, may adopt the folding-upon-binding mechanism, where their secondary structures are highly dependent on their binding partners. Due to the various applications of nanomaterials in biological sensors and potential wearable devices, it is important to discover possible conformational changes of proteins on nanomaterials. Here, through molecular dynamics simulations, we show that the first 17 residues of the huntingtin protein (HTT-N17) exhibit appreciable differences during folding on 2D-nanomaterials, such as graphene and MoS2 nanosheets. Namely, the protein is disordered on the graphene surface but is helical on the MoS2 surface. Although the amphiphilic environment at the nanosheet-water interface promotes the folding of amphipathic proteins (such as HTT-N17), competition between protein-nanosheet and intra-protein interactions yields very different protein conformations. Therefore, as engineered binding partners, nanomaterials might significantly affect the structures of adsorbed proteins.
Spontaneous Unfolding-Refolding of Fibronectin Type III Domains Assayed by Thiol Exchange
Shah, Riddhi; Ohashi, Tomoo; Erickson, Harold P.; Oas, Terrence G.
2017-01-01
Globular proteins are not permanently folded but spontaneously unfold and refold on time scales that can span orders of magnitude for different proteins. A longstanding debate in the protein-folding field is whether unfolding rates or folding rates correlate to the stability of a protein. In the present study, we have determined the unfolding and folding kinetics of 10 FNIII domains. FNIII domains are one of the most common protein folds and are present in 2% of animal proteins. FNIII domains are ideal for this study because they have an identical seven-strand β-sandwich structure, but they vary widely in sequence and thermodynamic stability. We assayed thermodynamic stability of each domain by equilibrium denaturation in urea. We then assayed the kinetics of domain opening and closing by a technique known as thiol exchange. For this we introduced a buried Cys at the identical location in each FNIII domain and measured the kinetics of labeling with DTNB over a range of urea concentrations. A global fit of the kinetics data gave the kinetics of spontaneous unfolding and refolding in zero urea. We found that the folding rates were relatively similar, ∼0.1–1 s−1, for the different domains. The unfolding rates varied widely and correlated with thermodynamic stability. Our study is the first to address this question using a set of domains that are structurally homologous but evolved with widely varying sequence identity and thermodynamic stability. These data add new evidence that thermodynamic stability correlates primarily with unfolding rate rather than folding rate. The study also has implications for the question of whether opening of FNIII domains contributes to the stretching of fibronectin matrix fibrils. PMID:27909052
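The link between rate constants and stability discussed in the abstract above can be made concrete under a two-state assumption, in which the unfolding equilibrium constant is k_unfold / k_fold and the stability is ΔG_unfold = RT ln(k_fold / k_unfold). The sketch below is illustrative only. The two-state assumption, the temperature of 298.15 K, the function name, and the example rate constants are assumptions and are not values reported in the study.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def unfolding_free_energy(k_fold, k_unfold, temperature_k=298.15):
    """Two-state stability in kcal/mol: dG_unfold = RT * ln(k_fold / k_unfold)."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(k_fold / k_unfold)


# Hypothetical domains sharing one folding rate but differing in unfolding rate (s^-1).
for k_unfold in (1e-2, 1e-4, 1e-6):
    dg = unfolding_free_energy(k_fold=1.0, k_unfold=k_unfold)
    print(f"k_unfold = {k_unfold:.0e} s^-1 -> dG_unfold = {dg:.2f} kcal/mol")
```

With the folding rate held fixed, each tenfold decrease in the unfolding rate adds RT ln 10, about 1.36 kcal/mol at 298.15 K, to the stability. This is the sense in which a set of domains with similar folding rates would show stability tracking the unfolding rate.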
MatureP: prediction of secreted proteins with exclusive information from their mature regions.
Orfanoudaki, Georgia; Markaki, Maria; Chatzi, Katerina; Tsamardinos, Ioannis; Economou, Anastassios
2017-06-12
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features exclusively computed from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required to maintain non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
Das, Payel; Matysiak, Silvina; Clementi, Cecilia
2005-01-01
Coarse-grained models have been extremely valuable in promoting our understanding of protein folding. However, the quantitative accuracy of existing simplified models is strongly hindered either by the complete removal of frustration (as in the widely used Gō-like models) or by the compromise with the minimal frustration principle and/or realistic protein geometry (as in the simple on-lattice models). We present a coarse-grained model that “naturally” incorporates sequence details and energetic frustration into an overall minimally frustrated folding landscape. The model is coupled with an optimization procedure to design the parameters of the protein Hamiltonian to fold into a desired native structure. The application to the study of src-Src homology 3 domain shows that this coarse-grained model contains the main physical-chemical ingredients that are responsible for shaping the folding landscape of this protein. The results illustrate the importance of nonnative interactions and energetic heterogeneity for a quantitative characterization of folding mechanisms. PMID:16006532
Proteome-level interplay between folding and aggregation propensities of proteins.
Tartaglia, Gian Gaetano; Vendruscolo, Michele
2010-10-08
With the advent of proteomics, there is an increasing need for tools that predict the properties of large numbers of proteins from the information provided by their amino acid sequences, even when their structures are unknown. One of the most important types of predictions concerns whether proteins will fold or aggregate. Here, we study the competition between these two processes by analyzing the relationship between the folding and aggregation propensity profiles for the human and Escherichia coli proteomes. These profiles are calculated, respectively, using the CamFold method, which we introduce in this work, and the Zyggregator method. Our results indicate that the kinetic behavior of proteins is, to a large extent, determined by the interplay between regions of low folding and high aggregation propensities. Copyright © 2010. Published by Elsevier Ltd.
Prokaryotic Ubiquitin-Like Protein Modification
Maupin-Furlow, Julie A.
2016-01-01
Prokaryotes form ubiquitin (Ub)-like isopeptide bonds on the lysine residues of proteins by at least two distinct pathways that are reversible and regulated. In mycobacteria, the C-terminal Gln of Pup (prokaryotic ubiquitin-like protein) is deamidated and isopeptide linked to proteins by a mechanism distinct from ubiquitylation in enzymology yet analogous to ubiquitylation in targeting proteins for destruction by proteasomes. Ub-fold proteins of archaea (SAMPs, small archaeal modifier proteins) and Thermus (TtuB, tRNA-two-thiouridine B) that differ from Ub in amino acid sequence, yet share a common β-grasp fold, also form isopeptide bonds by a mechanism that appears streamlined compared with ubiquitylation. SAMPs and TtuB are found to be members of a small group of Ub-fold proteins that function not only in protein modification but also in sulfur-transfer pathways associated with tRNA thiolation and molybdopterin biosynthesis. These multifunctional Ub-fold proteins are thought to be some of the most ancient of Ub-like protein modifiers. PMID:24995873
QUASAR--scoring and ranking of sequence-structure alignments.
Birzele, Fabian; Gewehr, Jan E; Zimmer, Ralf
2005-12-15
Sequence-structure alignments are a common means for protein structure prediction in the fields of fold recognition and homology modeling, and there is a broad variety of programs that provide such alignments based on sequence similarity, secondary structure or contact potentials. Nevertheless, finding the best sequence-structure alignment in a pool of alignments remains a difficult problem. QUASAR (quality of sequence-structure alignments ranking) provides a unifying framework for scoring sequence-structure alignments that aids finding well-performing combinations of well-known and custom-made scoring schemes. Those scoring functions can be benchmarked against widely accepted quality scores like MaxSub, TMScore, Touch and APDB, thus enabling users to test their own alignment scores against 'standard-of-truth' structure-based scores. Furthermore, individual score combinations can be optimized with respect to benchmark sets based on known structural relationships using QUASAR's in-built optimization routines.
Recent developments in the theory of protein folding: searching for the global energy minimum.
Scheraga, H A
1996-04-16
Statistical mechanical theories and computer simulation are being used to gain an understanding of the fundamental features of protein folding. A major obstacle in the computation of protein structures is the multiple-minima problem arising from the existence of many local minima in the multidimensional energy landscape of the protein. This problem has been surmounted for small open-chain and cyclic peptides, and for regular-repeating sequences of models of fibrous proteins. Progress is being made in resolving this problem for globular proteins.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tao, Jiahui; Petrova, Kseniya; Ron, David
2010-05-25
P58(IPK) might function as an endoplasmic reticulum molecular chaperone to maintain protein folding homeostasis during unfolded protein responses. P58(IPK) contains nine tetratricopeptide repeat (TPR) motifs and a C-terminal J-domain within its primary sequence. To investigate the mechanism by which P58(IPK) functions to promote protein folding within the endoplasmic reticulum, we have determined the crystal structure of P58(IPK) TPR fragment to 2.5 Å resolution by the SAD method. The crystal structure of P58(IPK) revealed three domains (I-III) with similar folds and each domain contains three TPR motifs. An ELISA assay indicated that P58(IPK) acts as a molecular chaperone by interacting with misfolded proteins such as luciferase and rhodanese. The P58(IPK) structure reveals a conserved hydrophobic patch located in domain I that might be involved in binding the misfolded polypeptides. Structure-based mutagenesis for the conserved hydrophobic residues located in domain I significantly reduced the molecular chaperone activity of P58(IPK).
Beta-propellers: associated functions and their role in human diseases.
Pons, Tirso; Gómez, Raú; Chinea, Glay; Valencia, Alfonso
2003-03-01
The beta-propeller fold is a fascinating architecture based on four-stranded antiparallel and twisted beta-sheets, radially arranged around a central tunnel. Similar to the alpha/beta-barrel (TIM-barrel) fold, the beta-propeller has a wide range of different functions, and is gaining substantial attention. Some proteins containing beta-propeller domains have been implicated in the pathogenesis of a variety of diseases such as cancer, Alzheimer, Huntington, arthritis, familial hypercholesterolemia, retinitis pigmentosa, osteogenesis, hypertension, and microbial and viral infections. This article reviews some aspects of 3D structure, amino acid sequence regularities, and biological functions of the proteins containing beta-propeller domains. Major emphasis has been laid on beta-propellers whose functions are associated with human diseases. Recent research efforts reported in the fields of protein engineering, drug design, and protein structure-function relationship studies, concerning the beta-propeller architecture, have also been discussed.
Comparison of intrinsic dynamics of cytochrome p450 proteins using normal mode analysis
Dorner, Mariah E; McMunn, Ryan D; Bartholow, Thomas G; Calhoon, Brecken E; Conlon, Michelle R; Dulli, Jessica M; Fehling, Samuel C; Fisher, Cody R; Hodgson, Shane W; Keenan, Shawn W; Kruger, Alyssa N; Mabin, Justin W; Mazula, Daniel L; Monte, Christopher A; Olthafer, Augustus; Sexton, Ashley E; Soderholm, Beatrice R; Strom, Alexander M; Hati, Sanchita
2015-01-01
Cytochrome P450 enzymes are hemeproteins that catalyze the monooxygenation of a wide range of structurally diverse substrates of endogenous and exogenous origin. These heme monooxygenases receive electrons from NADH/NADPH via electron transfer proteins. The cytochrome P450 enzymes, which constitute a diverse superfamily of more than 8,700 proteins, share a common tertiary fold but < 25% sequence identity. Based on their electron transfer protein partner, cytochrome P450 proteins are classified into six broad classes. Traditional methods of pro are based on the canonical paradigm that attributes proteins' function to their three-dimensional structure, which is determined by their primary structure that is the amino acid sequence. It is increasingly recognized that protein dynamics play an important role in molecular recognition and catalytic activity. As the mobility of a protein is an intrinsic property that is encrypted in its primary structure, we examined if different classes of cytochrome P450 enzymes display any unique patterns of intrinsic mobility. Normal mode analysis was performed to characterize the intrinsic dynamics of five classes of cytochrome P450 proteins. The present study revealed that cytochrome P450 enzymes share a strong dynamic similarity (root mean squared inner product > 55% and Bhattacharyya coefficient > 80%), despite the low sequence identity (< 25%) and sequence similarity (< 50%) across the cytochrome P450 superfamily. Noticeable differences in Cα atom fluctuations of structural elements responsible for substrate binding were observed. These differences in residue fluctuations might be crucial for substrate selectivity in these enzymes. PMID:26130403
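The root mean squared inner product (RMSIP) quoted above can be illustrated with a short sketch. The abstract does not give the formula, so this uses the commonly cited definition and assumes that both proteins contribute the same number of unit-length, mutually orthogonal mode vectors; the function name `rmsip` and the toy vectors are illustrative only.

```python
import math

def rmsip(modes_a, modes_b):
    """RMSIP between two equally sized sets of normalized mode vectors.

    Assumed definition: sqrt( (1/n) * sum_i sum_j (u_i . v_j)^2 ),
    where n is the number of modes taken from each protein.
    """
    n = len(modes_a)
    total = 0.0
    for u in modes_a:
        for v in modes_b:
            dot = sum(x * y for x, y in zip(u, v))
            total += dot * dot
    return math.sqrt(total / n)

# Toy 4-dimensional example (not real normal modes).
same = [[1, 0, 0, 0], [0, 1, 0, 0]]
orthogonal = [[0, 0, 1, 0], [0, 0, 0, 1]]
print(rmsip(same, same))        # identical subspaces
print(rmsip(same, orthogonal))  # non-overlapping subspaces
```

Under this definition a value of 1 means the two mode subspaces coincide and 0 means they do not overlap at all, which is why a value above 55% is read as strong dynamic similarity.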
Deryusheva, Evgeniia I; Machulin, Andrey V; Selivanova, Olga M; Galzitskaya, Oxana V
2017-04-01
Proteins of the nucleic acid-binding proteins superfamily perform such functions as processing, transport, storage, stretching, translation, and degradation of RNA. It is one of the 16 superfamilies containing the OB-fold in protein structures. Here, we have analyzed the superfamily of nucleic acid-binding proteins (the number of sequences exceeds 200,000) and found that this superfamily consists predominantly of proteins containing the cold shock DNA-binding domain (ca. 131,000 protein sequences). Proteins containing the S1 domain make up 57% of the cold shock DNA-binding domain family. Furthermore, we have found that the S1 domain was identified mainly in the bacterial proteins (ca. 83%) compared to the eukaryotic and archaeal proteins, which are available in the UniProt database. We have found that the number of multiple repeats of the S1 domain in the S1 domain-containing proteins depends on the taxonomic affiliation. All archaeal proteins contain one copy of the S1 domain, while the number of repeats in the eukaryotic proteins varies between 1 and 15 and correlates with the protein size. In the bacterial proteins, the number of repeats is no more than 6, regardless of the protein size. The large variation in the repeat number of the S1 domain, as one of the structural variants of the OB-fold, is a distinctive feature of S1 domain-containing proteins. Proteins from the other families and superfamilies have either one OB-fold or vary only slightly in repeat number. On the whole, it can be supposed that the repeat number is vital for the multifunctional activity of the S1 domain-containing proteins. Proteins 2017; 85:602-613. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Exploration of Uncharted Regions of the Protein Universe
Jaroszewski, Lukasz; Li, Zhanwen; Krishna, S. Sri; Bakolitsa, Constantina; Wooley, John; Deacon, Ashley M.; Wilson, Ian A.; Godzik, Adam
2009-01-01
The genome projects have unearthed an enormous diversity of genes of unknown function that are still awaiting biological and biochemical characterization. These genes, as most others, can be grouped into families based on sequence similarity. The PFAM database currently contains over 2,200 such families, referred to as domains of unknown function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248 reveals that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows hypotheses to be formulated about their biological function. The remainder can be formally categorized as new folds, although about one third of these show significant substructure similarity to previously characterized folds. These results imply that, despite the enormous increase in the number and the diversity of new genes being uncovered, the fold space of the proteins they encode is gradually becoming saturated. The previously unexplored sectors of the protein universe appear to be primarily shaped by extreme diversification of known protein families, which then enables organisms to evolve new functions and adapt to particular niches and habitats. Notwithstanding, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies. PMID:19787035
Thermodynamics of Coupled Folding in the Interaction of Archaeal RNase P Proteins RPP21 and RPP29
Xu, Yiren; Oruganti, Sri Vidya; Gopalan, Venkat; Foster, Mark P.
2014-01-01
We have used isothermal titration calorimetry (ITC) to identify and describe binding-coupled equilibria in the interaction between two protein subunits of archaeal ribonuclease P (RNase P). In all three domains of life, RNase P is a ribonucleoprotein complex that is primarily responsible for catalyzing the Mg2+-dependent cleavage of the 5′ leader sequence of precursor tRNAs during tRNA maturation. In archaea, RNase P has been shown to be composed of one catalytic RNA and up to five proteins, four of which associate in the absence of RNA as two functional heterodimers, POP5-RPP30 and RPP21-RPP29. NMR studies of the Pyrococcus furiosus RPP21 and RPP29 proteins in their free and complexed states provided evidence for significant protein folding upon binding. ITC experiments were performed over a range of temperatures, ionic strengths, pH values and in buffers with varying ionization potential, and with a folding-deficient RPP21 point mutant. These experiments revealed a negative heat capacity change (ΔCp), nearly twice that predicted from surface accessibility calculations, a strong salt dependence to the interaction and proton release at neutral pH, but a small net contribution from these to the excess ΔCp. We considered potential contributions from protein folding and burial of interfacial water molecules based on structural and spectroscopic data. We conclude that binding-coupled protein folding is likely responsible for a significant portion of the excess ΔCp. These findings provide novel structural-thermodynamic insights into coupled equilibria that enable specificity in macromolecular assemblies. PMID:22243443
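The heat capacity change discussed above is conventionally obtained as the slope of binding enthalpy against temperature, ΔH(T) = ΔH(T_ref) + ΔCp (T - T_ref). The sketch below fits that slope by ordinary least squares. The function name and the numbers are hypothetical placeholders chosen for illustration; they are not data from the study.

```python
def heat_capacity_change(temps, enthalpies):
    """Least-squares slope of enthalpy versus temperature, i.e. an estimate of dCp."""
    n = len(temps)
    mean_t = sum(temps) / n
    mean_h = sum(enthalpies) / n
    sxy = sum((t - mean_t) * (h - mean_h) for t, h in zip(temps, enthalpies))
    sxx = sum((t - mean_t) ** 2 for t in temps)
    return sxy / sxx

# Hypothetical ITC enthalpies (kcal/mol) at four temperatures (degrees C).
temps = [10.0, 20.0, 30.0, 40.0]
enthalpies = [-5.0, -13.0, -21.0, -29.0]
print(heat_capacity_change(temps, enthalpies))  # slope in kcal/(mol K)
```

A negative slope, as in this made-up series, corresponds to the negative ΔCp that the abstract associates with burial of surface and binding-coupled folding.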
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Characterization of the amino acid contribution to the folding degree of proteins.
Estrada, Ernesto
2004-03-01
The folding degree index (Estrada, Bioinformatics 2002;18:697-704) is extended to account for the contribution of amino acids to folding. First, the mathematical formalism for extending the folding degree index is presented. Then, the amino acid contributions to the folding degree of several proteins are used to analyze their relation to secondary structure. The possibility of using these contributions to help or check the assignment of secondary structure to amino acids is also introduced. The influence of external factors on the amino acid contributions to folding degree is studied through the temperature effect on ribonuclease A. Finally, the analysis of 3D protein similarity through the use of amino acid contributions to folding degree is studied by selecting a series of lysozymes. These results are compared to those obtained by sequence alignment (2D similarity) and 3D superposition of the structures, showing the uniqueness of the current approach. Copyright 2004 Wiley-Liss, Inc.
Jailani, A Abdul Kader; Solanki, Vikas; Roy, Anirban; Sivasudha, T; Mandal, Bikash
2017-04-02
A highly infectious clone of Cucumber green mottle mosaic virus (CGMMV), a cucurbit-infecting tobamovirus, was utilized for designing gene expression vectors. Two versions of the vector were examined for their efficacy in expressing the green fluorescent protein (GFP) in Nicotiana benthamiana. When the GFP gene was inserted at the stop codon of the coat protein (CP) gene of the CGMMV genome without any read-through codon, systemic expression of GFP, as well as virion formation and systemic symptom expression, were obtained in N. benthamiana. The qRT-PCR analysis showed a 23-fold increase of GFP over actin at 10 days post inoculation (dpi), which increased to 45-fold at 14 dpi, after which GFP expression declined significantly. Further, we show that when most of the CP sequence is deleted, retaining only the first 105 nucleotides, the shortened vector containing GFP in frame with the original CP open reading frame (ORF) resulted in a 234-fold increase of GFP expression over actin at 5 dpi in N. benthamiana without the formation of virions and disease symptoms. Our study demonstrated that a simple manipulation of the CP gene in the CGMMV genome while preserving the translational frame of CP resulted in a virus-free, rapid and efficient foreign protein expression system in the plant. The CGMMV-based vectors developed in this study may be potentially useful for the production of edible vaccines in cucurbits. Copyright © 2017 Elsevier B.V. All rights reserved.
Uncovering Specific Electrostatic Interactions in the Denatured States of Proteins
Shen, Jana K.
2010-01-01
The stability and folding of proteins are modulated by energetically significant interactions in the denatured state that is in equilibrium with the native state. These interactions remain largely invisible to current experimental techniques, however, due to the sparse population and conformational heterogeneity of the denatured-state ensemble under folding conditions. Molecular dynamics simulations using physics-based force fields can in principle offer atomistic details of the denatured state. However, practical applications are plagued with the lack of rigorous means to validate microscopic information and deficiencies in force fields and solvent models. This study presents a method based on coupled titration and molecular dynamics sampling of the denatured state starting from the extended sequence under native conditions. The resulting denatured-state pKas allow for the prediction of experimental observables such as pH- and mutation-induced stability changes. I show the capability and use of the method by investigating the electrostatic interactions in the denatured states of wild-type and K12M mutant of NTL9 protein. This study shows that the major errors in electrostatics can be identified by validating the titration properties of the fragment peptides derived from the sequence of the intact protein. Consistent with experimental evidence, our simulations show a significantly depressed pKa for Asp8 in the denatured state of wild-type, which is due to a nonnative interaction between Asp8 and Lys12. Interestingly, the simulation also shows a nonnative interaction between Asp8 and Glu48 in the denatured state of the mutant. I believe the presented method is general and can be applied to extract and validate microscopic electrostatics of the entire folding energy landscape. PMID:20682271
Building toy models of proteins using coevolutionary information
NASA Astrophysics Data System (ADS)
Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose
2015-03-01
Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
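A Potts model Hamiltonian of the kind inferred by DCA assigns each sequence an energy from per-site fields and pairwise couplings. The sketch below evaluates such an energy under the assumed sign convention H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j). The data layout, the function name `potts_energy`, and every parameter value are hypothetical; real DCA parameters would be fitted to a multiple sequence alignment.

```python
def potts_energy(seq, fields, couplings):
    """Evaluate H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields:    list of dicts, fields[i][aa] -> h_i(aa)
    couplings: dict mapping (i, j) with i < j to a dict {(aa_i, aa_j): J_ij}
    Missing entries are treated as zero.
    """
    energy = 0.0
    for i, aa in enumerate(seq):
        energy -= fields[i].get(aa, 0.0)
    for (i, j), table in couplings.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy

# Toy three-residue model with one favorable coupling between positions 0 and 2.
fields = [{"A": 0.5}, {"L": 0.2}, {"K": 0.1}]
couplings = {(0, 2): {("A", "K"): 1.0}}
print(potts_energy("ALK", fields, couplings))  # coupled pair present
print(potts_energy("ALE", fields, couplings))  # coupled pair broken by mutation
```

Comparing the energies of a sequence and its point mutant, as in the last two lines, is one simple way such a model can be set against measured changes in protein stability.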
The Proteome Folding Project: Proteome-scale prediction of structure and function
Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard
2011-01-01
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995
Heterologous mitochondrial targeting sequences can deliver functional proteins into mitochondria.
Marcus, Dana; Lichtenstein, Michal; Cohen, Natali; Hadad, Rita; Erlich-Hadad, Tal; Greif, Hagar; Lorberboum-Galski, Haya
2016-12-01
Mitochondrial Targeting Sequences (MTSs) are responsible for trafficking nuclear-encoded proteins into mitochondria. Once inside the mitochondria, the MTS is recognized and cleaved off. Some MTSs are long and undergo two-step processing, as in the case of the human frataxin (FXN) protein (80aa), implicated in Friedreich's ataxia (FA). Therefore, we chose the FXN protein to examine whether nuclear-encoded mitochondrial proteins can efficiently be targeted via a heterologous MTS (hMTS) and deliver a functional protein into mitochondria. We examined three hMTSs, those of citrate synthase (cs), lipoamide dehydrogenase (LAD) and C6ORF66 (ORF), as classical MTSs known to be removed by one-step processing, to deliver FXN into mitochondria in the form of fusion proteins. We demonstrate that using hMTSs for delivering FXN results in the production of 4-5-fold larger amounts of the fusion proteins, and at 4-5-fold higher concentrations. Moreover, hMTSs delivered a functional FXN protein into the mitochondria even more efficiently than the native MTSfxn, as evidenced by the rescue of FA patients' cells from oxidative stress, demonstrating an 18%-54% increase in cell survival and a 13%-33% increase in ATP levels, as compared to the fusion protein carrying the native MTS. One fusion protein with MTScs increased aconitase activity within patients' cells by 400-fold. The implications from our studies are of vast importance for both basic and translational research of mitochondrial proteins, as any mitochondrial protein can be delivered efficiently by an hMTS. Moreover, effective targeting of functional proteins is important for restoration of mitochondrial function and treatment of related disorders. Copyright © 2016 Elsevier Ltd. All rights reserved.
Prediction of the translocon-mediated membrane insertion free energies of protein sequences.
Park, Yungki; Helms, Volkhard
2008-05-15
Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences. We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark test gives a correlation coefficient of 0.74 between predicted and observed free energies for 357 known cases, which corresponds to a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. Subsequently, MINS was applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs. A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins
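The two benchmark figures above, a correlation coefficient and a mean unsigned error between predicted and observed free energies, can be computed as in the following sketch. It assumes the correlation is Pearson's r, which the abstract does not state explicitly, and the function names and input values are made up for illustration rather than taken from the MINS benchmark.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_unsigned_error(predicted, observed):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical insertion free energies in kcal/mol.
predicted = [0.5, -0.2, 1.1, 2.0]
observed = [0.9, -0.5, 1.0, 1.4]
print(pearson(predicted, observed))
print(mean_unsigned_error(predicted, observed))
```

The two measures answer different questions: the correlation reports whether predictions rank segments correctly, while the mean unsigned error reports how far off a typical prediction is in energy units.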
Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.
2003-01-01
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333
Peyret, Hadrien; Gehin, Annick; Thuenemann, Eva C.; Blond, Donatienne; El Turabi, Aadil; Beales, Lucy; Clarke, Dean; Gilbert, Robert J. C.; Fry, Elizabeth E.; Stuart, David I.; Holmes, Kris; Stonehouse, Nicola J.; Whelan, Mike; Rosenberg, William; Lomonossoff, George P.; Rowlands, David J.
2015-01-01
The core protein of the hepatitis B virus, HBcAg, assembles into highly immunogenic virus-like particles (HBc VLPs) when expressed in a variety of heterologous systems. Specifically, the major insertion region (MIR) on the HBcAg protein allows the insertion of foreign sequences, which are then exposed on the tips of surface spike structures on the outside of the assembled particle. Here, we present a novel strategy which aids the display of whole proteins on the surface of HBc particles. This strategy, named tandem core, is based on the production of the HBcAg dimer as a single polypeptide chain by tandem fusion of two HBcAg open reading frames. This allows the insertion of large heterologous sequences in only one of the two MIRs in each spike, without compromising VLP formation. We present the use of tandem core technology in both plant and bacterial expression systems. The results show that tandem core particles can be produced with unmodified MIRs, or with one MIR in each tandem dimer modified to contain the entire sequence of GFP or of a camelid nanobody. Both inserted proteins are correctly folded and the nanobody fused to the surface of the tandem core particle (which we name tandibody) retains the ability to bind to its cognate antigen. This technology paves the way for the display of natively folded proteins on the surface of HBc particles either through direct fusion or through non-covalent attachment via a nanobody. PMID:25830365
Xu, Dong; Zhang, Yang
2012-01-01
Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field. PMID:22411565
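Replica-exchange Monte Carlo runs several copies of the system at different temperatures and periodically attempts to swap neighboring copies. The sketch below shows the generic Metropolis swap criterion, min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). It is a textbook form and not QUARK's actual implementation, whose details are not given in the abstract; the function name and the reduced units (k_B = 1) are assumptions.

```python
import math
import random

def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Acceptance probability for exchanging the conformations of two replicas."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the swap is accepted under the Metropolis rule."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)

# Cold replica (T=1) stuck at higher energy than the hot replica (T=2): always swap.
print(swap_probability(-5.0, -10.0, 1.0, 2.0))
# Cold replica already holds the lower energy: swap accepted only occasionally.
print(swap_probability(-10.0, -5.0, 1.0, 2.0))
```

Swaps of this kind let low-energy conformations found at high temperature migrate to the low-temperature replicas, which helps the search escape local minima.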
ERIC Educational Resources Information Center
Fernandez-Reche, Andres; Cobos, Eva S.; Luque, Irene; Ruiz-Sanz, Javier; Martinez, Jose C.
2018-01-01
In 1972 Christian B. Anfinsen received the Nobel Prize in Chemistry for "…his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation." The understanding of this principle is crucial for physical biochemistry students, since protein folding studies, bio-computing…
Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini
2015-01-01
The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangrove areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; < 5%). Detailed analysis of the genes predicted to encode proteins of the abH08 superfamily revealed a high proportion related to epoxide hydrolases and haloalkane dehalogenases in polluted mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. PMID:25171437
A Survey of Protein Structures from Archaeal Viruses
Dellas, Nikki; Lawrence, C. Martin; Young, Mark J.
2013-01-01
Viruses that infect the third domain of life, Archaea, are a newly emerging field of interest. To date, all characterized archaeal viruses infect archaea that thrive in extreme conditions, such as halophilic, hyperthermophilic, and methanogenic environments. Viruses in general, especially those replicating in extreme environments, contain highly mosaic genomes with open reading frames (ORFs) whose sequences are often dissimilar to all other known ORFs. It has been estimated that approximately 85% of virally encoded ORFs do not match known sequences in the nucleic acid databases, and this percentage is even higher for archaeal viruses (typically 90%–100%). This statistic suggests that either virus genomes represent a larger segment of sequence space and/or that viruses encode genes of novel fold and/or function. Because the overall three-dimensional fold of a protein evolves more slowly than its sequence, efforts have been geared toward structural characterization of proteins encoded by archaeal viruses in order to gain insight into their potential functions. In this short review, we provide multiple examples where structural characterization of archaeal viral proteins has indeed provided significant functional and evolutionary insight. PMID:25371334
Direct Observation of Parallel Folding Pathways Revealed Using a Symmetric Repeat Protein System
Aksel, Tural; Barrick, Doug
2014-01-01
Although progress has been made to determine the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion is not dependent on extensive curve-fitting or structural perturbation of protein structure. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat-proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule. PMID:24988356
Ramirez-Sarmiento, Cesar A; Komives, Elizabeth A
2018-04-06
Hydrogen-deuterium exchange mass spectrometry (HDXMS) has emerged as a powerful approach for revealing folding and allostery in protein-protein interactions. The advent of higher resolution mass spectrometers combined with ion mobility separation and ultra performance liquid chromatographic separations has allowed the complete coverage of large protein sequences and multi-protein complexes. Liquid-handling robots have improved the reproducibility and accurate temperature control of the sample preparation. Many researchers are also appreciating the power of combining biophysical approaches such as stopped-flow fluorescence, single molecule FRET, and molecular dynamics simulations with HDXMS. In this review, we focus on studies that have used a combination of approaches to reveal (re)folding of proteins as well as on long-distance allosteric changes upon interaction. Copyright © 2018 Elsevier Inc. All rights reserved.
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Characterization of the Structural Gene Promoter of Aedes aegypti Densovirus
Ward, Todd W.; Kimmick, Michael W.; Afanasiev, Boris N.; Carlson, Jonathan O.
2001-01-01
Aedes aegypti densonucleosis virus (AeDNV) has two promoters that have been shown to be active by reporter gene expression analysis (B. N. Afanasiev, Y. V. Koslov, J. O. Carlson, and B. J. Beaty, Exp. Parasitol. 79:322–339, 1994). Northern blot analysis of cells infected with AeDNV revealed two transcripts 1,200 and 3,500 nucleotides in length that are assumed to express the structural protein (VP) gene and nonstructural protein genes, respectively. Primer extension was used to map the transcriptional start site of the structural protein gene. Surprisingly, the structural protein gene transcript began at an initiator consensus sequence, CAGT, 60 nucleotides upstream from the map unit 61 TATAA sequence previously thought to define the promoter. Constructs with the β-galactosidase gene fused to the structural protein gene were used to determine elements necessary for promoter function. Deletion or mutation of the initiator sequence, CAGT, reduced protein expression by 93%, whereas mutation of the TATAA sequence at map unit 61 had little effect. An additional open reading frame was observed upstream of the structural protein gene that can express β-galactosidase at a low level (20% of that of VP fusions). Expression of the AeDNV structural protein gene was shown to be stimulated by the major nonstructural protein NS1 (Afanasiev et al., Exp. parasitol., 1994). To determine the sequences required for transactivation, expression of structural protein gene–β-galactosidase gene fusion constructs differing in AeDNV genome content was measured with and without NS1. The presence of NS1 led to an 8- to 10-fold increase in expression when either genomic end was present, compared to a 2-fold increase with a construct lacking the genomic ends. An even higher (37-fold) increase in expression occurred with both genomic ends present; however, this was in part due to template replication as shown by Southern blot analysis. 
These data indicate the location and importance of various elements necessary for efficient protein expression and transactivation from the structural protein gene promoter of AeDNV. PMID:11152505
Fast computational methods for predicting protein structure from primary amino acid sequence
Agarwal, Pratul Kumar [Knoxville, TN
2011-07-19
The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.
High Pressure ZZ-Exchange NMR Reveals Key Features of Protein Folding Transition States.
Zhang, Yi; Kitazawa, Soichiro; Peran, Ivan; Stenzoski, Natalie; McCallum, Scott A; Raleigh, Daniel P; Royer, Catherine A
2016-11-23
Understanding protein folding mechanisms and their sequence dependence requires the determination of residue-specific apparent kinetic rate constants for the folding and unfolding reactions. Conventional two-dimensional NMR, such as HSQC experiments, can provide residue-specific information for proteins. However, folding is generally too fast for such experiments. ZZ-exchange NMR spectroscopy allows determination of folding and unfolding rates on much faster time scales, yet even this regime is not fast enough for many protein folding reactions. The application of high hydrostatic pressure slows folding by orders of magnitude due to positive activation volumes for the folding reaction. We combined high pressure perturbation with ZZ-exchange spectroscopy on two autonomously folding protein domains derived from the ribosomal protein, L9. We obtained residue-specific apparent rates at 2500 bar for the N-terminal domain of L9 (NTL9), and rates at atmospheric pressure for a mutant of the C-terminal domain (CTL9) from pressure dependent ZZ-exchange measurements. Our results revealed that NTL9 folding is almost perfectly two-state, while small deviations from two-state behavior were observed for CTL9. Both domains exhibited large positive activation volumes for folding. The volumetric properties of these domains reveal that their transition states contain most of the internal solvent excluded voids that are found in the hydrophobic cores of the respective native states. These results demonstrate that by coupling it with high pressure, ZZ-exchange can be extended to investigate a large number of protein conformational transitions.
Xue, Li C; Jordan, Rafael A; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2014-02-01
Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) Variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving success rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. Copyright © 2013 Wiley Periodicals, Inc.
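The two evaluation measures defined in the abstract above can be illustrated with a minimal sketch. The function names and the input layout (one best-first list of booleans per docking case, True meaning near-native) are assumptions made for illustration; they are not taken from DockRank itself, and how Hit Rate is aggregated across cases is not stated in the abstract, so it is computed here for a single case only.

```python
def success_rate(cases, m):
    """Percentage of cases with at least one near-native conformation in the top m.

    cases: list of ranked lists of booleans, best-scored conformation first
    (hypothetical input layout, assumed for this sketch).
    """
    if not cases:
        return 0.0
    successes = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * successes / len(cases)


def hit_rate(ranked, m):
    """Percentage of a case's near-native conformations that fall in the top m."""
    total_near_native = sum(ranked)
    if total_near_native == 0:
        return 0.0
    return 100.0 * sum(ranked[:m]) / total_near_native


if __name__ == "__main__":
    cases = [
        [False, True, False],
        [False, False, True],
        [True, True, False],
    ]
    print(success_rate(cases, m=2))
    print(hit_rate([True, False, True, True], m=2))
```

Both functions take the ranking as given, so the same code can compare any two scoring functions by re-sorting the conformations before the call.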
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pasek, Marta; Boeggeman, Elizabeth; Ramakrishnan, Boopathy
The expression of recombinant proteins in Escherichia coli often leads to inactive aggregated proteins known as the inclusion bodies. To date, the best available tool has been the use of fusion tags, including the carbohydrate-binding protein; e.g., the maltose-binding protein (MBP) that enhances the solubility of recombinant proteins. However, none of these fusion tags work universally with every partner protein. We hypothesized that galectins, which are also carbohydrate-binding proteins, may help as fusion partners in folding the mammalian proteins in E. coli. Here we show for the first time that a small soluble lectin, human galectin-1, one member of a large galectin family, can function as a fusion partner to produce soluble folded recombinant human glycosyltransferase, β-1,4-galactosyltransferase-7 (β4Gal-T7), in E. coli. The enzyme β4Gal-T7 transfers galactose to xylose during the synthesis of the tetrasaccharide linker sequence attached to a Ser residue of proteoglycans. Without a fusion partner, β4Gal-T7 is expressed in E. coli as inclusion bodies. We have designed a new vector construct, pLgals1, from pET-23a that includes the sequence for human galectin-1, followed by the Tev protease cleavage site, a 6x His-coding sequence, and a multi-cloning site where a cloned gene is inserted. After lactose affinity column purification of galectin-1-β4Gal-T7 fusion protein, the unique protease cleavage site allows the protein β4Gal-T7 to be cleaved from galectin-1 that binds and elutes from UDP-agarose column. The eluted protein is enzymatically active, and shows CD spectra comparable to the folded β4Gal-T1. The engineered galectin-1 vector could prove to be a valuable tool for expressing other proteins in E. coli.
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are reports in the literature that recombinant proteins show some differences in structure and function as compared with the native ones. More than 100,000 structures (from both recombinant and native sources) are now publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate whether any proteins in the RCSB PDB archive have identical sequences but differ in structure. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated 517 pairs of native and recombinant proteins that have identical sequences. There were no pairs of proteins that had the same sequence and a significantly different structural fold, which supports the hypothesis that proteins expressed in a heterologous host usually fold correctly into their native forms. PMID:27517583
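Of the similarity measures listed in the abstract above, the root-mean-square distance is the simplest to show as a worked example. The sketch below assumes the two structures have already been superimposed (for instance by one of the alignment methods named in the abstract) and that atoms are paired one-to-one; the function name and coordinate layout are illustrative assumptions, not the authors' pipeline.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    recombinant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(native, recombinant))
```

A value near zero for a native/recombinant pair would be consistent with the abstract's conclusion that identical sequences share the same fold.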
TMFoldWeb: a web server for predicting transmembrane protein fold class.
Kozma, Dániel; Tusnády, Gábor E
2015-09-17
Here we present TMFoldWeb, the web server implementation of TMFoldRec, a transmembrane protein fold recognition algorithm. TMFoldRec uses statistical potentials and utilizes topology filtering and a gapless threading algorithm. It ranks template structures and selects the most likely candidates and estimates the reliability of the obtained lowest energy model. The statistical potential was developed in a maximum likelihood framework on a representative set of the PDBTM database. According to the benchmark test the performance of TMFoldRec is about 77 % in correctly predicting fold class for a given transmembrane protein sequence. An intuitive web interface has been developed for the recently published TMFoldRec algorithm. The query sequence goes through a pipeline of topology prediction and a systematic sequence to structure alignment (threading). Resulting templates are ordered by energy and reliability values and are colored according to their significance level. Besides the graphical interface, a programmatic access is available as well, via a direct interface for developers or for submitting genome-wide data sets. The TMFoldWeb web server is unique and currently the only web server that is able to predict the fold class of transmembrane proteins while assigning reliability scores for the prediction. This method is prepared for genome-wide analysis with its easy-to-use interface, informative result page and programmatic access. Considering the info-communication evolution in the last few years, the developed web server, as well as the molecule viewer, is responsive and fully compatible with the prevalent tablets and mobile devices.
Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim
2006-01-01
The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of ∼2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler–type reaction and represents a novel six-bladed β-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed β-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family. PMID:16531499
Ma, Xueyan; Panjikar, Santosh; Koepke, Juergen; Loris, Elke; Stöckigt, Joachim
2006-04-01
The enzyme strictosidine synthase (STR1) from the Indian medicinal plant Rauvolfia serpentina is of primary importance for the biosynthetic pathway of the indole alkaloid ajmaline. Moreover, STR1 initiates all biosynthetic pathways leading to the entire monoterpenoid indole alkaloid family representing an enormous structural variety of approximately 2000 compounds in higher plants. The crystal structures of STR1 in complex with its natural substrates tryptamine and secologanin provide structural understanding of the observed substrate preference and identify residues lining the active site surface that contact the substrates. STR1 catalyzes a Pictet-Spengler-type reaction and represents a novel six-bladed beta-propeller fold in plant proteins. Structure-based sequence alignment revealed a common repetitive sequence motif (three hydrophobic residues are followed by a small residue and a hydrophilic residue), indicating a possible evolutionary relationship between STR1 and several sequence-unrelated six-bladed beta-propeller structures. Structural analysis and site-directed mutagenesis experiments demonstrate the essential role of Glu-309 in catalysis. The data will aid in deciphering the details of the reaction mechanism of STR1 as well as other members of this enzyme family.
Mapping the Geometric Evolution of Protein Folding Motor.
Jerath, Gaurav; Hazam, Prakash Kishore; Shekhar, Shashi; Ramakrishnan, Vibin
2016-01-01
A polypeptide chain has an invariant main chain and a variant side-chain sequence. How the side-chain sequence determines the fold in terms of its chemical constitution has been scrutinized extensively and verified periodically. However, a focused investigation of the directive effect of side-chain geometry may provide important insights, supplementing existing algorithms in mapping the geometrical evolution of protein chains and their structural preferences. Geometrically, folding of a protein structure may be envisaged as the evolution of its geometric variables: the ϕ and ψ dihedral angles of the polypeptide main chain, directed by χ1 and χ2 of the side chain. In this work, the protein molecule is metaphorically modelled as a machine with four rotors, ϕ, ψ, χ1 and χ2, whose evolution to the functional fold is directed by combinations of its rotor directions. We observe that differential rotor motions lead to different secondary structure formations and that the combinatorial pattern is unique and consistent for a particular secondary structure type. Further, we found that the combination of rotor geometries of each amino acid is unique, which partly explains how different amino acid sequence combinations have unique structural evolution and functional adaptation. Quantification of these amino acid rotor preferences resulted in the generation of three substitution matrices, which were later plugged into the BLAST tool to evaluate their efficiency in aligning sequences. We employed BLOSUM62 and PAM30 as standards for primary evaluation. Generation of substitution matrices is a logical extension of the conceptual framework we attempted to build during the development of this work. Optimization of matrices following the conventional routines and possible application with biologically relevant data sets are beyond the scope of this manuscript, though it is a part of the larger project design.
Donald, L. J.; Chernushevich, I. V.; Zhou, J.; Verentchikov, A.; Poppe-Schriemer, N.; Hosfield, D. J.; Westmore, J. B.; Ens, W.; Duckworth, H. W.; Standing, K. G.
1996-01-01
IclR protein, the repressor of the aceBAK operon of Escherichia coli, has been examined by time-of-flight mass spectrometry, with ionization by matrix assisted laser desorption or by electrospray. The purified protein was found to have a smaller mass than that predicted from the base sequence of the cloned iclR gene. Additional measurements were made on mixtures of peptides derived from IclR by treatment with trypsin and cyanogen bromide. They showed that the amino acid sequence is that predicted from the gene sequence, except that the protein has suffered truncation by removal of the N-terminal eight or, in some cases, nine amino acid residues. The peptide bond whose hydrolysis would remove eight residues is a typical target for the E. coli protease OmpT. We find that, by taking precautions to minimize OmpT proteolysis, or by eliminating it through mutation of the host strain, we can isolate full-length IclR protein (lacking only the N-terminal methionine residue). Full-length IclR is a much better DNA-binding protein than the truncated versions: it binds the aceBAK operator sequence 44-fold more tightly, presumably because of additional contacts that the N-terminal residues make with the DNA. Our experience thus demonstrates the advantages of using mass spectrometry to characterize newly purified proteins produced from cloned genes, especially where proteolysis or other covalent modification is a concern. This technique gives mass spectra from complex peptide mixtures that can be analyzed completely, without any fractionation of the mixtures, by reference to the amino acid sequence inferred from the base sequence of the cloned gene. PMID:8844850
NASA Astrophysics Data System (ADS)
Huang, Zhao
2011-12-01
Compared to 'conventional' materials made from metal, glass, or ceramics, protein-based materials have unique mechanical properties. Furthermore, the morphology, mechanical properties, and functionality of protein-based materials may be optimized via sequence engineering for use in a variety of applications, including textile materials, biosensors, and tissue engineering scaffolds. The development of recombinant DNA technology has enabled the production and engineering of protein-based materials ex vivo. However, harsh production conditions can compromise the mechanical properties of protein-based materials and diminish their ability to incorporate functional proteins. Developing a new generation of protein-based materials is crucial to (i) improve materials assembly conditions, (ii) create novel mechanical properties, and (iii) expand the capacity to carry functional protein/peptide sequences. This thesis describes development of novel protein-based materials using Ultrabithorax, a member of the Hox family of proteins that regulate developmental pathways in Drosophila melanogaster. The experiments presented (i) establish the conditions required for the assembly of Ubx-based materials, (ii) generate a wide range of Ubx morphologies, (iii) examine the mechanical properties of Ubx fibers, (iv) incorporate protein functions to Ubx-based materials via gene fusion, (v) pattern protein functions within the Ubx materials, and (vi) examine the biocompatibility of Ubx materials in vitro. Ubx-based materials assemble at mild conditions compatible with protein folding and activity, which enables Ubx chimeric materials to retain the function of appended proteins in spatial patterns determined by materials assembly. Ubx-based materials also display mechanical properties comparable to existing protein-based materials and demonstrate good biocompatibility with living cells in vitro. 
Taken together, this research demonstrates the unique features and future potential of novel Ubx-based materials.
Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N
2014-03-01
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N.
2014-01-01
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea. PMID:24371267
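The abstract above refers to guanine and cytosine load in the third codon position. As a rough illustration of that quantity, the following sketch counts G and C at every third base of an in-frame coding sequence. The function name is hypothetical, and the assumption that the input starts in frame (with any trailing partial codon ignored) belongs to this example, not to the authors' analysis.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C.

    cds: coding DNA sequence, assumed to start in frame; a trailing
    partial codon, if any, is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    third_bases = seq[2:usable:3]         # bases at codon position 3
    if not third_bases:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    # Codons: ATG GCC AAA TTC, third bases G, C, A, C
    print(gc3_fraction("ATGGCCAAATTC"))
```

Comparing this fraction across genomes grouped by lifestyle or growth temperature is the kind of compositional statistic the abstract describes.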
Co-evolutionary constraints of globular proteins correlate with their folding rates.
Mallik, Saurav; Kundu, Sudip
2015-08-04
Folding rates (lnkf) of globular proteins correlate with their biophysical properties, but relationship between lnkf and patterns of sequence evolution remains elusive. We introduce 'relative co-evolution order' (rCEO) as length-normalized average primary chain separation of co-evolving pairs (CEPs), which negatively correlates with lnkf. In addition to pairs in native 3D contact, indirectly connected and structurally remote CEPs probably also play critical roles in protein folding. Correlation between rCEO and lnkf is stronger in multi-state proteins than two-state proteins, contrasting the case of contact order (co), where stronger correlation is found in two-state proteins. Finally, rCEO, co and lnkf are fitted into a 3D linear correlation. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
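The definition of rCEO given above (length-normalized average primary chain separation of co-evolving pairs) can be sketched in a few lines. The exact normalization is an assumption here, chosen by analogy with relative contact order: the mean sequence separation |i - j| over all pairs, divided by the chain length. The function name and the pair representation are likewise illustrative only.

```python
def relative_coevolution_order(pairs, chain_length):
    """Mean sequence separation of co-evolving residue pairs, divided by chain length.

    pairs: iterable of (i, j) residue indices for co-evolving pairs (CEPs).
    chain_length: number of residues in the protein.
    The normalization is assumed by analogy with relative contact order.
    """
    pairs = list(pairs)
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    if not pairs:
        return 0.0
    mean_separation = sum(abs(i - j) for i, j in pairs) / len(pairs)
    return mean_separation / chain_length


if __name__ == "__main__":
    ceps = [(1, 11), (5, 25), (10, 40)]   # separations 10, 20, 30
    print(relative_coevolution_order(ceps, chain_length=100))
```

Passing native contact pairs instead of co-evolving pairs yields the analogous relative contact order, which makes the two quantities easy to compare on the same protein.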
Self-assembled bionanostructures: proteins following the lead of DNA nanostructures
2014-01-01
Natural polymers are able to self-assemble into versatile nanostructures based on the information encoded into their primary structure. The structural richness of biopolymer-based nanostructures depends on the information content of building blocks and the available biological machinery to assemble and decode polymers with a defined sequence. Natural polypeptides comprise 20 amino acids with very different properties in comparison to only 4 structurally similar nucleotides, building elements of nucleic acids. Nevertheless the ease of synthesizing polynucleotides with selected sequence and the ability to encode the nanostructural assembly based on the two specific nucleotide pairs underlay the development of techniques to self-assemble almost any selected three-dimensional nanostructure from polynucleotides. Despite more complex design rules, peptides were successfully used to assemble symmetric nanostructures, such as fibrils and spheres. While earlier designed protein-based nanostructures used linked natural oligomerizing domains, recent design of new oligomerizing interaction surfaces and introduction of the platform for topologically designed protein fold may enable polypeptide-based design to follow the track of DNA nanostructures. The advantages of protein-based nanostructures, such as the functional versatility and cost effective and sustainable production methods provide strong incentive for further development in this direction. PMID:24491139
Mocz, G.
1995-01-01
Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction. PMID:7549882
NASA Astrophysics Data System (ADS)
Shea, Joan-Emma; Brooks, Charles L., III
2001-10-01
Beginning with simplified lattice and continuum "minimalist" models and progressing to detailed atomic models, simulation studies have augmented and directed development of the modern landscape perspective of protein folding. In this review we discuss aspects of detailed atomic simulation methods applied to studies of protein folding free energy surfaces, using biased-sampling free energy methods and temperature-induced protein unfolding. We review studies from each on systems of particular experimental interest and assess the strengths and weaknesses of each approach in the context of "exact" results for both free energies and kinetics of a minimalist model for a beta-barrel protein. We illustrate in detail how each approach is implemented and discuss analysis methods that have been developed as components of these studies. We describe key insights into the relationship between protein topology and the folding mechanism emerging from folding free energy surface calculations. We further describe the determination of detailed "pathways" and models of folding transition states that have resulted from unfolding studies. Our assessment of the two methods suggests that both can provide, often complementary, details of folding mechanism and thermodynamics, but this success relies on (a) adequate sampling of diverse conformational regions for the biased-sampling free energy approach and (b) many trajectories at multiple temperatures for unfolding studies. Furthermore, we find that temperature-induced unfolding provides representatives of folding trajectories only when the topology and sequence (energy) provide a relatively funneled landscape and "off-pathway" intermediates do not exist.
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
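The quantities in this abstract can be illustrated with a short sketch. It assumes a simplified RWCO (for each residue, the sum of sequence separations to its contacting residues, divided by chain length) and uses plain textbook definitions of the Pearson correlation coefficient and RMSE; the contact list and the predicted values are made up for illustration and do not come from the study.

```python
import math


def rwco(contacts, length):
    """Per-residue sum of |i - j| over contacting residues, divided by chain length."""
    values = [0.0] * length
    for i, j in contacts:
        separation = abs(i - j)
        values[i] += separation / length
        values[j] += separation / length
    return values


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical 5-residue chain with two contacts (0-based residue indices).
observed = rwco([(0, 4), (1, 3)], 5)
predicted = [0.7, 0.5, 0.1, 0.3, 0.9]  # made-up predictions
print(observed, pearson_cc(observed, predicted), rmse(observed, predicted))
```

Reporting both metrics is useful because the correlation captures whether the predicted profile has the right shape, while the RMSE captures how far the predicted values are from the observed ones.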
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
NoFold: RNA structure clustering without folding or alignment.
Middleton, Sarah A; Kim, Junhyong
2014-11-01
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example, recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
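The idea of comparing two objects through their scores against a shared collection of reference models can be sketched as follows. This is only an illustration of the general principle, assuming each sequence has already been scored against the same ordered set of models; the three-model score vectors are invented, and the Euclidean distance is an assumed choice rather than the metric used by NoFold.

```python
import math


def model_space_distance(scores_a, scores_b):
    """Euclidean distance between two vectors of scores against the same reference models."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both sequences must be scored against the same models")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)))


# Hypothetical scores of three sequences against three reference models.
scores = {
    "seqA": [12.0, -3.0, 0.5],
    "seqB": [11.0, -2.5, 1.0],
    "seqC": [-4.0, 9.0, 7.5],
}

d_ab = model_space_distance(scores["seqA"], scores["seqB"])
d_ac = model_space_distance(scores["seqA"], scores["seqC"])
print(d_ab, d_ac)
```

Because every sequence is scored once against a fixed set of models, the cost grows linearly with the number of sequences, and no sequence-to-sequence alignment is needed before clustering.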
Comparative Protein Structure Modeling Using MODELLER.
Webb, Benjamin; Sali, Andrej
2014-09-08
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.
Single helically folded aromatic oligoamides that mimic the charge surface of double-stranded B-DNA
NASA Astrophysics Data System (ADS)
Ziach, Krzysztof; Chollet, Céline; Parissi, Vincent; Prabhakaran, Panchami; Marchivie, Mathieu; Corvaglia, Valentina; Bose, Partha Pratim; Laxmi-Reddy, Katta; Godde, Frédéric; Schmitter, Jean-Marie; Chaignepain, Stéphane; Pourquier, Philippe; Huc, Ivan
2018-05-01
Numerous essential biomolecular processes require the recognition of DNA surface features by proteins. Molecules mimicking these features could potentially act as decoys and interfere with pharmacologically or therapeutically relevant protein-DNA interactions. Although naturally occurring DNA-mimicking proteins have been described, synthetic tunable molecules that mimic the charge surface of double-stranded DNA are not known. Here, we report the design, synthesis and structural characterization of aromatic oligoamides that fold into single helical conformations and display a double helical array of negatively charged residues in positions that match the phosphate moieties in B-DNA. These molecules were able to inhibit several enzymes possessing non-sequence-selective DNA-binding properties, including topoisomerase 1 and HIV-1 integrase, presumably through specific foldamer-protein interactions, whereas sequence-selective enzymes were not inhibited. Such modular and synthetically accessible DNA mimics provide a versatile platform to design novel inhibitors of protein-DNA interactions.
The intracellular region of ClC-3 chloride channel is in a partially folded state and a monomer.
Li, Shu Jie; Kawazaki, Masanobu; Ogasahara, Kyoko; Nakagawa, Atsushi
2006-05-01
In contrast to bacterial ClC chloride channels, all eukaryotic ClC chloride channels have a conserved long intracellular region that makes up the carboxyl terminus of the protein and is necessary for channel function as a channel gate. Little is known so far, however, about the molecular structure of the intracellular region of ClC chloride channels. Here, for the first time, we have expressed and purified the intracellular region of the rat ClC-3 chloride channel (C-ClC-3) as a water-soluble protein under physiological conditions, and investigated its structural characteristics and assembly behavior by means of circular dichroism (CD) spectroscopy, differential scanning calorimetry (DSC), size exclusion chromatography and analytical ultracentrifugation. The far-UV CD spectra of C-ClC-3 in the native state and in the presence of urea clearly show that the protein has a significantly folded secondary structure consisting of alpha-helices and beta-sheets, while the near-UV CD spectra and DSC experiments indicate the protein is deficient in well-defined tertiary packing. Its Stokes radius, as determined by size exclusion chromatography, is larger than expected for a folded globular protein of its size. Furthermore, the DisEMBL program, a useful computational tool for the prediction of disordered/unstructured regions within a protein sequence, predicts that the protein is in a partially folded state. Based on these results, we conclude that C-ClC-3 is partially folded. On the other hand, both size exclusion chromatography and sedimentation equilibrium analysis show that C-ClC-3 exists as a monomer in solution, not a dimer like the whole ClC-3 molecule.
Circular permutant GFP insertion folding reporters
Waldo, Geoffrey S [Santa Fe, NM]; Cabantous, Stephanie [Los Alamos, NM]
2008-06-24
Provided are methods of assaying and improving protein folding using circular permutants of fluorescent proteins, including circular permutants of GFP variants and combinations thereof. The invention further provides various nucleic acid molecules and vectors incorporating such nucleic acid molecules, comprising polynucleotides encoding fluorescent protein circular permutants derived from superfolder GFP, which polynucleotides include an internal cloning site into which a heterologous polynucleotide may be inserted in-frame with the circular permutant coding sequence, and which when expressed are capable of reporting on the degree to which a polypeptide encoded by such an inserted heterologous polynucleotide is correctly folded by correlation with the degree of fluorescence exhibited.
Circular permutant GFP insertion folding reporters
Waldo, Geoffrey S; Cabantous, Stephanie
2013-02-12
Provided are methods of assaying and improving protein folding using circular permutants of fluorescent proteins, including circular permutants of GFP variants and combinations thereof. The invention further provides various nucleic acid molecules and vectors incorporating such nucleic acid molecules, comprising polynucleotides encoding fluorescent protein circular permutants derived from superfolder GFP, which polynucleotides include an internal cloning site into which a heterologous polynucleotide may be inserted in-frame with the circular permutant coding sequence, and which when expressed are capable of reporting on the degree to which a polypeptide encoded by such an inserted heterologous polynucleotide is correctly folded by correlation with the degree of fluorescence exhibited.
Circular permutant GFP insertion folding reporters
Waldo, Geoffrey S [Santa Fe, NM]; Cabantous, Stephanie [Los Alamos, NM]
2011-06-14
Provided are methods of assaying and improving protein folding using circular permutants of fluorescent proteins, including circular permutants of GFP variants and combinations thereof. The invention further provides various nucleic acid molecules and vectors incorporating such nucleic acid molecules, comprising polynucleotides encoding fluorescent protein circular permutants derived from superfolder GFP, which polynucleotides include an internal cloning site into which a heterologous polynucleotide may be inserted in-frame with the circular permutant coding sequence, and which when expressed are capable of reporting on the degree to which a polypeptide encoded by such an inserted heterologous polynucleotide is correctly folded by correlation with the degree of fluorescence exhibited.
Circular permutant GFP insertion folding reporters
Waldo, Geoffrey S.; Cabantous, Stephanie
2013-04-16
Provided are methods of assaying and improving protein folding using circular permutants of fluorescent proteins, including circular permutants of GFP variants and combinations thereof. The invention further provides various nucleic acid molecules and vectors incorporating such nucleic acid molecules, comprising polynucleotides encoding fluorescent protein circular permutants derived from superfolder GFP, which polynucleotides include an internal cloning site into which a heterologous polynucleotide may be inserted in-frame with the circular permutant coding sequence, and which when expressed are capable of reporting on the degree to which a polypeptide encoded by such an inserted heterologous polynucleotide is correctly folded by correlation with the degree of fluorescence exhibited.
Investigating homology between proteins using energetic profiles.
Wrabl, James O; Hilser, Vincent J
2010-03-26
Accumulated experimental observations demonstrate that protein stability is often preserved upon conservative point mutation. In contrast, less is known about the effects of large sequence or structure changes on the stability of a particular fold. Almost completely unknown is the degree to which stability of different regions of a protein is generally preserved throughout evolution. In this work, these questions are addressed through thermodynamic analysis of a large representative sample of protein fold space based on remote, yet accepted, homology. More than 3,000 proteins were computationally analyzed using the structural-thermodynamic algorithm COREX/BEST. Estimated position-specific stability (i.e., local Gibbs free energy of folding) and its component enthalpy and entropy were quantitatively compared between all proteins in the sample according to all-vs.-all pairwise structural alignment. It was discovered that the local stabilities of homologous pairs were significantly more correlated than those of non-homologous pairs, indicating that local stability was indeed generally conserved throughout evolution. However, the position-specific enthalpy and entropy underlying stability were less correlated, suggesting that the overall regional stability of a protein was more important than the thermodynamic mechanism utilized to achieve that stability. Finally, two different types of statistically exceptional evolutionary structure-thermodynamic relationships were noted. First, many homologous proteins contained regions of similar thermodynamics despite localized structure change, suggesting a thermodynamic mechanism enabling evolutionary fold change. Second, some homologous proteins with extremely similar structures nonetheless exhibited different local stabilities, a phenomenon previously observed experimentally in this laboratory. 
These two observations, in conjunction with the principal conclusion that homologous proteins generally conserved local stability, may provide guidance for a future thermodynamically informed classification of protein homology.
Reinert, Zachary E; Horne, W Seth
2014-11-28
A variety of non-biological structural motifs have been incorporated into the backbone of natural protein sequences. In parallel work, diverse unnatural oligomers of de novo design (termed "foldamers") have been developed that fold in defined ways. In this Perspective article, we survey foundational studies on protein backbone engineering, with a focus on alterations made in the context of complex tertiary folds. We go on to summarize recent work illustrating the potential promise of these methods to provide a general framework for the construction of foldamer mimics of protein tertiary structures.
Codon influence on protein expression in E. coli correlates with mRNA levels
Boël, Grégory; Wong, Kam-Ho; Su, Min; Luff, Jon; Valecha, Mayank; Everett, John K.; Acton, Thomas B.; Xiao, Rong; Montelione, Gaetano T.; Aalberts, Daniel P.; Hunt, John F.
2016-01-01
Degeneracy in the genetic code, which enables a single protein to be encoded by a multitude of synonymous gene sequences, has an important role in regulating protein expression, but substantial uncertainty exists concerning the details of this phenomenon. Here we analyze the sequence features influencing protein expression levels in 6,348 experiments using bacteriophage T7 polymerase to synthesize messenger RNA in Escherichia coli. Logistic regression yields a new codon-influence metric that correlates only weakly with genomic codon-usage frequency, but strongly with global physiological protein concentrations and also mRNA concentrations and lifetimes in vivo. Overall, the codon content influences protein expression more strongly than mRNA-folding parameters, although the latter dominate in the initial ~16 codons. Genes redesigned based on our analyses are transcribed with unaltered efficiency but translated with higher efficiency in vitro. The less efficiently translated native sequences show greatly reduced mRNA levels in vivo. Our results suggest that codon content modulates a kinetic competition between protein elongation and mRNA degradation that is a central feature of the physiology and also possibly the regulation of translation in E. coli. PMID:26760206
Improvisation in evolution of genes and genomes: whose structure is it anyway?
Shakhnovich, Boris E; Shakhnovich, Eugene I
2008-06-01
Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.
Functional Tat transport of unstructured, small, hydrophilic proteins.
Richter, Silke; Lindenstrauss, Ute; Lücke, Christian; Bayliss, Richard; Brüser, Thomas
2007-11-16
The twin-arginine translocation (Tat) system is a protein translocation system that is adapted to the translocation of folded proteins across biological membranes. An understanding of the folding requirements for Tat substrates is of fundamental importance for the elucidation of the transport mechanism. We now demonstrate for the first time Tat transport for fully unstructured proteins, using signal sequence fusions to naturally unfolded FG repeats from the yeast Nsp1p nuclear pore protein. The transport of unfolded proteins becomes less efficient with increasing size, consistent with only a single interaction between the system and the substrate. Strikingly, the introduction of six residues from the hydrophobic core of a globular protein completely blocked translocation. Physiological data suggest that hydrophobic surface patches abort transport at a late stage, most likely by membrane interactions during transport. This study thus explains the observed restriction of the Tat system to folded globular proteins on a molecular level.
Janecek, S.
1996-01-01
The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins. PMID:8762144
Janecek, S
1996-06-01
The question of parallel (alpha/beta)8-barrel fold evolution remains unclear, owing mainly to the lack of sequence homology throughout the amino acid sequences of (alpha/beta)8-barrel enzymes. The "classical" approaches used in the search for homologies among (alpha/beta)8-barrels (e.g., production of structurally based alignments) have yielded alignments perfect from the structural point of view, but the approaches have been unable to reveal the homologies. These are proposed to be "hidden" in (alpha/beta)8-barrel enzymes. The term "hidden homology" means that the alignment of sequence stretches proposed to be homologous need not be structurally fully satisfactory. This is due to the very long evolutionary history of all (alpha/beta)8-barrels. This work identifies so-called hidden homology around the strand beta 2 that is flanked by loops containing invariant glycines and prolines in 17 different (alpha/beta)8-barrel enzymes, i.e., roughly in half of all currently known (alpha/beta)8-barrel proteins. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif, given their mutual evolutionary relatedness. For this purpose, the sequence region around the well-conserved second beta-strand of alpha-amylase flanked by the invariant glycine and proline (56_GFTAIWITP, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The proposal that the second beta-strand of (alpha/beta)8-barrel fold is important from the evolutionary point of view is strongly supported by the increasing trend of the observed beta 2-strand structural similarity for the pairs of (alpha/beta)8-barrel enzymes: alpha-amylase and the alpha-subunit of tryptophan synthase, alpha-amylase and mandelate racemase, and alpha-amylase and cyclodextrin glycosyltransferase. 
This trend is also in agreement with the existing evolutionary division of the entire family of (alpha/beta)8-barrel proteins.
Broglia, Ricardo A; Tiana, Guido; Sutto, Ludovico; Provasi, Davide; Simona, Fabio
2005-10-01
The main problems found in designing drugs are those of optimizing the drug-target interaction and of avoiding the emergence of resistance. We suggest a scheme for the design of inhibitors that can be used as leads for the development of a drug and that do not face either of these problems, and then apply it to the case of HIV-1-PR. It is based on the knowledge that the folding of single-domain proteins, such as each of the monomers forming the HIV-1-PR homodimer, is controlled by local elementary structures (LES), stabilized by local contacts among hydrophobic, strongly interacting, and highly conserved amino acids that play a central role in the folding process. Because LES have evolved over many generations to recognize and strongly interact with each other so as to make the protein fold fast and avoid aggregation with other proteins, highly specific (and thus minimally toxic) as well as effective folding-inhibitor molecules suggest themselves: short peptides (or eventually their mimetic molecules) displaying the same amino acid sequence as the LES (p-LES). Aside from being specific and efficient, these inhibitors are expected not to induce resistance; in fact, mutations in HIV-1-PR that successfully avoid the action of p-LES imply the destabilization of one or more LES and thus should lead to protein denaturation. Making use of Monte Carlo simulations, we first identify the LES of the HIV-1-PR and then show that the corresponding p-LES peptides act as effective inhibitors of the folding of the protease.
Defining and predicting structurally conserved regions in protein superfamilies
Huang, Ivan K.; Grishin, Nick V.
2013-01-01
Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. 
Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics Online PMID:23193223
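The two figures of merit quoted in the abstract above (accuracy and Matthews correlation coefficient) can be computed from a per-residue confusion matrix. The following is a minimal sketch using the standard definitions; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) for a binary SCR / non-SCR confusion matrix.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives (hypothetical inputs).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Standard MCC denominator; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(accuracy_and_mcc(tp=40, tn=35, fp=15, fn=10))
```

Unlike accuracy, MCC stays near zero for a predictor that guesses at random, which is presumably why both values are reported.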
Richard, François D; Kajava, Andrey V
2014-06-01
The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
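The abstract above describes the filter only in outline (comparison of short strings in adjacent sequence motifs), and the program itself is available only on request. The sketch below is therefore not the authors' algorithm. It is a hypothetical illustration of the general idea, assuming a fixed candidate repeat length, k-mer sets as the "short strings", and a Jaccard similarity threshold; all names and default values are assumptions.

```python
def kmer_set(segment, k):
    """Set of all overlapping k-mers in a sequence segment."""
    return {segment[i:i + k] for i in range(len(segment) - k + 1)}


def may_contain_tandem_repeat(seq, unit_len, k=2, threshold=0.5):
    """Cheap pre-filter: True if any two adjacent windows of length
    unit_len share at least `threshold` of their k-mers (Jaccard index).

    Sequences returning False would be discarded before running a
    slower, more sensitive tandem-repeat detector.
    """
    for start in range(len(seq) - 2 * unit_len + 1):
        left = kmer_set(seq[start:start + unit_len], k)
        right = kmer_set(seq[start + unit_len:start + 2 * unit_len], k)
        union = left | right
        if union and len(left & right) / len(union) >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(may_contain_tandem_repeat("GPAGPPGPAGPP", unit_len=6))  # adjacent copies
    print(may_contain_tandem_repeat("ACDEFGHIKLMN", unit_len=6))  # no shared k-mers
```

Using k-mers with k of 2 or more captures some local residue order as well as composition, which loosely mirrors the "composition and order" criterion mentioned in the abstract.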
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Papanikolopoulou, Katerina; Forge, Vincent; Goeltz, Pierrette; Mitraki, Anna
2004-03-05
The folding of beta-structured, fibrous proteins is a largely unexplored area. A class of such proteins is used by viruses as adhesins, and recent studies revealed novel beta-structured motifs for them. We have been studying the folding and assembly of adenovirus fibers that consist of a globular C-terminal domain, a central fibrous shaft, and an N-terminal part that attaches to the viral capsid. The globular C-terminal, or "head" domain, has been postulated to be necessary for the trimerization of the fiber and might act as a registration signal that directs its correct folding and assembly. In this work, we replaced the head of the fiber by the trimerization domain of the bacteriophage T4 fibritin, termed "foldon." Two chimeric proteins, comprising the foldon domain connected at the C-terminal end of four fiber shaft repeats with or without the use of a natural linker sequence, fold into highly stable, SDS-resistant trimers. The structural signatures of the chimeric proteins as seen by CD and infrared spectroscopy are reported. The results suggest that the foldon domain can successfully replace the fiber head domain in ensuring correct trimerization of the shaft sequences. Biological implications and implications for engineering highly stable, beta-structured nanorods are discussed.
Morgan, Alexander A.; Rubenstein, Edward
2013-01-01
Proline is an anomalous amino acid. Its nitrogen atom is covalently locked within a ring, thus it is the only proteinogenic amino acid with a constrained phi angle. Sequences of three consecutive prolines can fold into polyproline helices, structures that join alpha helices and beta pleats as architectural motifs in protein configuration. Triproline helices are participants in protein-protein signaling interactions. Longer spans of repeat prolines also occur, containing as many as 27 consecutive proline residues. Little is known about the frequency, positioning, and functional significance of these proline sequences. Therefore we have undertaken a systematic bioinformatics study of proline residues in proteins. We analyzed the distribution and frequency of 687,434 proline residues among 18,666 human proteins, identifying single residues, dimers, trimers, and longer repeats. Proline accounts for 6.3% of the 10,882,808 protein amino acids. Of all proline residues, 4.4% are in trimers or longer spans. We detected patterns that influence function based on proline location, spacing, and concentration. We propose a classification based on proline-rich, polyproline-rich, and proline-poor status. Whereas singlet proline residues are often found in proteins that display recurring architectural patterns, trimers or longer proline sequences tend to be associated with the absence of repetitive structural motifs. Spans of 6 or more are associated with DNA/RNA processing, actin, and developmental processes. We also suggest a role for proline in Kruppel-type zinc finger protein control of DNA expression, and in the nucleation and translocation of actin by the formin complex. PMID:23372670
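The kind of tally reported in the abstract above (singlet prolines, dimers, trimers and longer spans) can be reproduced on any one-letter protein sequence with a short script. This is a minimal sketch, assuming plain uppercase sequences; the function names and the toy sequence are illustrative and not taken from the study.

```python
import re
from collections import Counter


def proline_run_lengths(seq):
    """Count maximal runs of consecutive prolines, keyed by run length."""
    return Counter(len(m.group()) for m in re.finditer(r"P+", seq))


def fraction_in_long_runs(seq, min_len=3):
    """Fraction of all proline residues that sit in runs of at least
    min_len consecutive prolines (trimers or longer by default)."""
    runs = proline_run_lengths(seq)
    total = sum(length * n for length, n in runs.items())
    in_long = sum(length * n for length, n in runs.items() if length >= min_len)
    return in_long / total if total else 0.0


if __name__ == "__main__":
    toy = "APAPPAPPPGPPPPA"  # one singlet, one dimer, one trimer, one tetramer
    print(proline_run_lengths(toy))
    print(fraction_in_long_runs(toy))
```

As a consistency check on the quoted numbers, 687,434 prolines out of 10,882,808 residues is about 6.3%, in agreement with the abstract.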
Do Viruses Exchange Genes across Superkingdoms of Life?
Malik, Shahana S; Azem-E-Zahra, Syeda; Kim, Kyung Mo; Caetano-Anollés, Gustavo; Nasir, Arshan
2017-01-01
Viruses can be classified into archaeoviruses, bacterioviruses, and eukaryoviruses according to the taxonomy of the infected host. The host-constrained perception of viruses implies preference of genetic exchange between viruses and cellular organisms of their host superkingdoms and viral origins from host cells either via escape or reduction. However, viruses frequently establish non-lytic interactions with organisms and endogenize into the genomes of bacterial endosymbionts that reside in eukaryotic cells. Such interactions create opportunities for genetic exchange between viruses and organisms of non-host superkingdoms. Here, we take an atypical approach to revisit virus-cell interactions by first identifying protein fold structures in the proteomes of archaeoviruses, bacterioviruses, and eukaryoviruses and second by tracing their spread in the proteomes of superkingdoms Archaea, Bacteria, and Eukarya. The exercise quantified protein structural homologies between viruses and organisms of their host and non-host superkingdoms and revealed likely candidates for virus-to-cell and cell-to-virus gene transfers. Unexpected lifestyle-driven genetic affiliations between bacterioviruses and Eukarya and eukaryoviruses and Bacteria were also predicted in addition to a large cohort of protein folds that were universally shared by viral and cellular proteomes and virus-specific protein folds not detected in cellular proteomes. These protein folds provide unique insights into viral origins and evolution that are generally difficult to recover with traditional sequence alignment-dependent evolutionary analyses owing to the fast mutation rates of viral gene sequences.
Kathuria, Sagar V; Chan, Yvonne H; Nobrega, R Paul; Özen, Ayşegül; Matthews, C Robert
2016-03-01
Measurements of protection against exchange of main chain amide hydrogens (NH) with solvent hydrogens in globular proteins have provided remarkable insights into the structures of rare high-energy states that populate their folding free-energy surfaces. Lacking, however, has been a unifying theory that rationalizes these high-energy states in terms of the structures and sequences of their resident proteins. The Branched Aliphatic Side Chain (BASiC) hypothesis has been developed to explain the observed patterns of protection in a pair of TIM barrel proteins. This hypothesis supposes that the side chains of isoleucine, leucine, and valine (ILV) residues often form large hydrophobic clusters that very effectively impede the penetration of water to their underlying hydrogen bond networks and, thereby, enhance the protection against solvent exchange. The linkage between the secondary and tertiary structures enables these ILV clusters to serve as cores of stability in high-energy partially folded states. Statistically significant correlations between the locations of large ILV clusters in native conformations and strong protection against exchange for a variety of motifs reported in the literature support the generality of the BASiC hypothesis. The results also illustrate the necessity to elaborate this simple hypothesis to account for the roles of adjacent hydrocarbon moieties in defining stability cores of partially folded states along folding reaction coordinates. © 2015 The Protein Society.
Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini
2015-05-01
The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangrove areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; < 5%). Detailed analysis of the genes predicted to encode proteins of the abH08 superfamily revealed a high proportion related to epoxide hydrolases and haloalkane dehalogenases in polluted mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. © 2014 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
Fearnley, I M; Finel, M; Skehel, J M; Walker, J E
1991-01-01
The 39 kDa and 42 kDa subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria are nuclear-coded components of the hydrophobic protein fraction of the enzyme. Their amino acid sequences have been deduced from the sequences of overlapping cDNA clones. These clones were amplified from total bovine heart cDNA by means of the polymerase chain reaction, with the use of complex mixtures of oligonucleotide primers based upon fragments of protein sequence determined at the N-terminals of the proteins and at internal sites. The protein sequences of the 39 kDa and 42 kDa subunits are 345 and 320 amino acid residues long respectively, and their calculated molecular masses are 39,115 Da and 36,693 Da. Both proteins are predominantly hydrophilic, but each contains one or two hydrophobic segments that could possibly be folded into transmembrane alpha-helices. The bovine 39 kDa protein sequence is related to that of a 40 kDa subunit from complex I from Neurospora crassa mitochondria; otherwise, it is not related significantly to any known sequence, including redox proteins and two polypeptides involved in import of proteins into mitochondria, known as the mitochondrial processing peptidase and the processing-enhancing protein. Therefore the functions of the 39 kDa and 42 kDa subunits of complex I are unknown. The mitochondrial gene product, ND4, a hydrophobic component of complex I with an apparent molecular mass of about 39 kDa, has been identified in preparations of the enzyme. This subunit stains faintly with Coomassie Blue dye, and in many gel systems it is not resolved from the nuclear-coded 36 kDa subunit. PMID:1832859
Bastolla, Ugo
2014-01-01
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on the statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we will integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We will start by discussing positive design, which strengthens the stability of the folded against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. On the contrary, negative design, which consists in destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement contrasts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change. PMID:24970217
Foldability of a Natural De Novo Evolved Protein.
Bungard, Dixie; Copple, Jacob S; Yan, Jing; Chhun, Jimmy J; Kumirov, Vlad K; Foy, Scott G; Masel, Joanna; Wysocki, Vicki H; Cordes, Matthew H J
2017-11-07
The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo. Copyright © 2017 Elsevier Ltd. All rights reserved.
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
Similar folds with different stabilization mechanisms: the cases of prion and doppel proteins
Colacino, Stefano; Tiana, Guido; Colombo, Giorgio
2006-01-01
Background Protein misfolding is the main cause of a group of fatal neurodegenerative diseases in humans and animals. In particular, in Prion-related diseases the normal cellular form of the Prion Protein PrP (PrPC) is converted into the infectious PrPSc through a conformational process during which it acquires a high β-sheet content. Doppel is a protein that shares a similar native fold, but lacks the scrapie isoform. Understanding the molecular determinants of these different behaviours is important both for biomedical and biophysical research. Results In this paper, the dynamical and energetic properties of the two proteins in solution are comparatively analyzed by means of long time scale explicit solvent, all-atom molecular dynamics in different temperature conditions. The trajectories are analyzed by means of a recently introduced energy decomposition approach (Tiana et al, Prot. Sci. 2004) aimed at identifying the key residues for the stabilization and folding of the protein. Our analysis shows that Prion and Doppel have two different cores stabilizing the native state and that the relative contribution of the nucleus to the global stability of the protein for Doppel is considerably higher than for PrP. Moreover, under misfolding conditions the Doppel core is conserved, while the energy stabilization network of PrP is disrupted. Conclusion These observations suggest that different sequences can share similar native topology with different stabilizing interactions and that the sequences of the Prion and Doppel proteins may have diverged under different evolutionary constraints resulting in different folding and stabilization mechanisms. PMID:16857062
The alphabet of intrinsic disorder
Uversky, Vladimir N
2013-01-01
The ability of a protein to fold into a unique functional state or to stay intrinsically disordered is encoded in its amino acid sequence. Both ordered and intrinsically disordered proteins (IDPs) are natural polypeptides that use the same arsenal of 20 proteinogenic amino acid residues as their major building blocks. The exceptional structural plasticity of IDPs, their capability to exist as heterogeneous structural ensembles and their wide array of important disorder-based biological functions that complement the functional repertoire of ordered proteins are all rooted within the peculiar differential usage of these building blocks by ordered proteins and IDPs. In fact, some residues (so-called disorder-promoting residues) are noticeably more common in IDPs than in sequences of ordered proteins, which, in their turn, are enriched in several order-promoting residues. Furthermore, residues can be arranged according to their “disorder promoting potencies,” which are evaluated based on the relative abundances of various amino acids in ordered and disordered proteins. This review continues a series of publications on the roles of different amino acids in defining the phenomenon of protein intrinsic disorder and concerns glutamic acid, which is the second most disorder-promoting residue. PMID:28516010
Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert
2017-01-01
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665
NASA Astrophysics Data System (ADS)
Agarwal, Sonya; Döring, Kristina; Gierusz, Leszek A.; Iyer, Pooja; Lane, Fiona M.; Graham, James F.; Goldmann, Wilfred; Pinheiro, Teresa J. T.; Gill, Andrew C.
2015-10-01
The β2-α2 loop of PrPC is a key modulator of disease-associated prion protein misfolding. Amino acids that differentiate mouse (Ser169, Asn173) and deer (Asn169, Thr173) PrPC appear to confer dramatically different structural properties in this region and it has been suggested that amino acid sequences associated with structural rigidity of the loop also confer susceptibility to prion disease. Using mouse recombinant PrP, we show that mutating residue 173 from Asn to Thr alters protein stability and misfolding only subtly, whilst changing Ser to Asn at codon 169 causes instability in the protein, promotes oligomer formation and dramatically potentiates fibril formation. The doubly mutated protein exhibits more complex folding and misfolding behaviour than either single mutant, suggestive of differential effects of the β2-α2 loop sequence on both protein stability and on specific misfolding pathways. Molecular dynamics simulation of protein structure suggests a key role for the solvent accessibility of Tyr168 in promoting molecular interactions that may lead to prion protein misfolding. Thus, we conclude that ‘rigidity’ in the β2-α2 loop region of the normal conformer of PrP has less effect on misfolding than other sequence-related effects in this region.
Protein folding and misfolding: mechanism and principles
Englander, S. Walter; Mayne, Leland; Krishna, Mallela M. G.
2012-01-01
Two fundamentally different views of how proteins fold are now being debated. Do proteins fold through multiple unpredictable routes directed only by the energetically downhill nature of the folding landscape or do they fold through specific intermediates in a defined pathway that systematically puts predetermined pieces of the target native protein into place? It has now become possible to determine the structure of protein folding intermediates, evaluate their equilibrium and kinetic parameters, and establish their pathway relationships. Results obtained for many proteins have serendipitously revealed a new dimension of protein structure. Cooperative structural units of the native protein, called foldons, unfold and refold repeatedly even under native conditions. Much evidence obtained by hydrogen exchange and other methods now indicates that cooperative foldon units and not individual amino acids account for the unit steps in protein folding pathways. The formation of foldons and their ordered pathway assembly systematically puts native-like foldon building blocks into place, guided by a sequential stabilization mechanism in which prior native-like structure templates the formation of incoming foldons with complementary structure. Thus the same propensities and interactions that specify the final native state, encoded in the amino-acid sequence of every protein, determine the pathway for getting there. Experimental observations that have been interpreted differently, in terms of multiple independent pathways, appear to be due to chance misfolding errors that cause different population fractions to block at different pathway points, populate different pathway intermediates, and fold at different rates. This paper summarizes the experimental basis for these three determining principles and their consequences. Cooperative native-like foldon units and the sequential stabilization process together generate predetermined stepwise pathways. 
Optional misfolding errors are responsible for 3-state and heterogeneous kinetic folding. PMID:18405419
Initial assembly steps of a translocase for folded proteins
Blümmel, Anne-Sophie; Haag, Laura A.; Eimer, Ekaterina; Müller, Matthias; Fröbel, Julia
2015-01-01
The so-called Tat (twin-arginine translocation) system transports completely folded proteins across cellular membranes of archaea, prokaryotes and plant chloroplasts. Tat-directed proteins are distinguished by a conserved twin-arginine (RR-) motif in their signal sequences. Many Tat systems are based on the membrane proteins TatA, TatB and TatC, of which TatB and TatC are known to cooperate in binding RR-signal peptides and to form higher-order oligomeric structures. We have now elucidated the fine architecture of TatBC oligomers assembled to form closed intramembrane substrate-binding cavities. The identification of distinct homonymous and heteronymous contacts between TatB and TatC suggests that TatB monomers coalesce into dome-like TatB structures that are surrounded by outer rings of TatC monomers. We also show that these TatBC complexes are approached by TatA protomers through their N-termini, which thereby establish contacts with TatB and membrane-inserted RR-precursors. PMID:26068441
DOE Office of Scientific and Technical Information (OSTI.GOV)
T Wang; K Heran Darwin; H Li
2011-12-31
Mycobacterium tuberculosis uses a proteasome system that is analogous to the eukaryotic ubiquitin-proteasome pathway and is required for pathogenesis. However, the bacterial analog of ubiquitin, prokaryotic ubiquitin-like protein (Pup), is an intrinsically disordered protein that bears little sequence or structural resemblance to the highly structured ubiquitin. Thus, it was unknown how pupylated proteins were recruited to the proteasome. Here, we show that the Mycobacterium proteasomal ATPase (Mpa) has three pairs of tentacle-like coiled coils that recognize Pup. Mpa bound unstructured Pup through hydrophobic interactions and a network of hydrogen bonds, leading to the formation of an α-helix in Pup. Our work describes a binding-induced folding recognition mechanism in the Pup-proteasome system that differs mechanistically from substrate recognition in the ubiquitin-proteasome system. This key difference between the prokaryotic and eukaryotic systems could be exploited for the development of a small molecule-based treatment for tuberculosis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, T.; Li, H.; Darwin, K. H.
2010-11-01
Mycobacterium tuberculosis uses a proteasome system that is analogous to the eukaryotic ubiquitin-proteasome pathway and is required for pathogenesis. However, the bacterial analog of ubiquitin, prokaryotic ubiquitin-like protein (Pup), is an intrinsically disordered protein that bears little sequence or structural resemblance to the highly structured ubiquitin. Thus, it was unknown how pupylated proteins were recruited to the proteasome. Here, we show that the Mycobacterium proteasomal ATPase (Mpa) has three pairs of tentacle-like coiled coils that recognize Pup. Mpa bound unstructured Pup through hydrophobic interactions and a network of hydrogen bonds, leading to the formation of an α-helix in Pup. Our work describes a binding-induced folding recognition mechanism in the Pup-proteasome system that differs mechanistically from substrate recognition in the ubiquitin-proteasome system. This key difference between the prokaryotic and eukaryotic systems could be exploited for the development of a small molecule-based treatment for tuberculosis.
Xue, Li C.; Jordan, Rafael A.; EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2015-01-01
Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein–protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving Success Rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. PMID:23873600
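The two evaluation measures defined in the abstract above can be written down directly. The sketch below follows those verbal definitions, assuming each docking case is represented as a list of booleans in ranked order (True for a near-native conformation). This data layout, the function names, and the choice to pool near-native conformations across all cases when computing Hit Rate are assumptions, not details confirmed by the paper.

```python
def success_rate(cases, m):
    """Percentage of cases with at least one near-native conformation
    among the top m ranked conformations."""
    successes = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * successes / len(cases)


def hit_rate(cases, m):
    """Percentage of all near-native conformations (pooled over cases,
    an assumption) that appear among the top m of their own case."""
    total = sum(sum(ranked) for ranked in cases)
    in_top = sum(sum(ranked[:m]) for ranked in cases)
    return 100.0 * in_top / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical rankings for four docking cases; True = near-native.
    cases = [
        [True, False, True, False],
        [False, False, True, False],
        [False, True, False, False],
        [False, False, False, True],
    ]
    print(success_rate(cases, m=2), hit_rate(cases, m=2))
```

A scoring function that moves near-native conformations up the ranking raises both numbers for a fixed m, which is how the re-ranking comparisons in the abstract can be read.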
Walker, M D; Park, C W; Rosen, A; Aronheim, A
1990-01-01
Cell specific expression of the insulin gene is achieved through transcriptional mechanisms operating on multiple DNA sequence elements located in the 5' flanking region of the gene. Of particular importance in the rat insulin I gene are two closely similar 9 bp sequences (IEB1 and IEB2): mutation of either of these leads to 5-10 fold reduction in transcriptional activity. We have screened an expression cDNA library derived from mouse pancreatic endocrine beta cells with a radioactive DNA probe containing multiple copies of the IEB1 sequence. A cDNA clone (A1) isolated by this procedure encodes a protein which shows efficient binding to the IEB1 probe, but much weaker binding to either an unrelated DNA probe or to a probe bearing a single base pair insertion within the recognition sequence. DNA sequence analysis indicates a protein belonging to the helix-loop-helix family of DNA-binding proteins. The ability of the protein encoded by clone A1 to recognize a number of wild type and mutant DNA sequences correlates closely with the ability of each sequence element to support transcription in vivo in the context of the insulin 5' flanking DNA. We conclude that the isolated cDNA may encode a transcription factor that participates in control of insulin gene expression. PMID:2181401
Kozic, Mara; Fox, Stephen J; Thomas, Jens M; Verma, Chandra S; Rigden, Daniel J
2018-05-01
Antimicrobial resistance within a wide range of infectious agents is a severe and growing public health threat. Antimicrobial peptides (AMPs) are among the leading alternatives to current antibiotics, exhibiting broad spectrum activity. Their activity is determined by numerous properties such as cationic charge, amphipathicity, size, and amino acid composition. Currently, only around 10% of known AMP sequences have experimentally solved structures. To improve our understanding of the AMP structural universe we have carried out large scale ab initio 3D modeling of structurally uncharacterized AMPs that revealed similarities between predicted folds of the modeled sequences and structures of characterized AMPs. Two of the peptides whose models matched known folds are Lebocin Peptide 1A (LP1A) and Odorranain M, predicted to form β-hairpins but, interestingly, to lack the intramolecular disulfide bonds, cation-π or aromatic interactions that generally stabilize such AMP structures. Other examples include Ponericin Q42, Latarcin 4a, Kassinatuerin 1, Ceratotoxin D, and CPF-B1 peptide, which have α-helical folds, as well as mixed αβ folds of human Histatin 2 peptide and Garvicin A which are, to the best of our knowledge, the first linear αββ fold AMPs lacking intramolecular disulfide bonds. In addition to fold matches to experimentally derived structures, unique folds were also obtained, namely for Microcin M and Ipomicin. These results help in understanding the range of protein scaffolds that naturally bear antimicrobial activity and may facilitate protein design efforts towards better AMPs. © 2018 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
A limited universe of membrane protein families and folds
Oberai, Amit; Ihm, Yungok; Kim, Sanguk; Bowie, James U.
2006-01-01
One of the goals of structural genomics is to obtain a structural representative of almost every fold in nature. A recent estimate suggests that 70%–80% of soluble protein domains identified in the first 1000 genome sequences should be covered by about 25,000 structures—a reasonably achievable goal. As no current estimates exist for the number of membrane protein families, however, it is not possible to know whether family coverage is a realistic goal for membrane proteins. Here we find that virtually all polytopic helical membrane protein families are present in the already known sequences so we can make an estimate of the total number of families. We find that only ∼700 polytopic membrane protein families account for 80% of structured residues and ∼1700 cover 90% of structured residues. While apparently a finite and reachable goal, we estimate that it will likely take more than three decades to obtain the structures needed for 90% residue coverage, if current trends continue. PMID:16815920
NASA Astrophysics Data System (ADS)
Beedle, Amy E. M.; Lezamiz, Ainhoa; Stirnemann, Guillaume; Garcia-Manyes, Sergi
2015-08-01
Understanding the directionality and sequence of protein unfolding is crucial to elucidate the underlying folding free energy landscape. An extra layer of complexity is added in metalloproteins, where a metal cofactor participates in the correct, functional fold of the protein. However, the precise mechanisms by which organometallic interactions are dynamically broken and reformed on (un)folding are largely unknown. Here we use single molecule force spectroscopy AFM combined with protein engineering and MD simulations to study the individual unfolding pathways of the blue-copper proteins azurin and plastocyanin. Using the nanomechanical properties of the native copper centre as a structurally embedded molecular reporter, we demonstrate that both proteins unfold via two independent, competing pathways. Our results provide experimental evidence of a novel kinetic partitioning scenario whereby the protein can stochastically unfold through two distinct main transition states placed at the N and C termini that dictate the direction in which unfolding occurs.
Nakano, Shogo; Asano, Yasuhisa
2015-02-03
Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
NASA Astrophysics Data System (ADS)
Nakano, Shogo; Asano, Yasuhisa
2015-02-01
Development of software and methods for design of complete sequences of functional proteins could contribute to studies of protein engineering and protein evolution. To this end, we developed the INTMSAlign software, and used it to design functional proteins and evaluate their usefulness. The software could assign both consensus and correlation residues of target proteins. We generated three protein sequences with S-selective hydroxynitrile lyase (S-HNL) activity, which we call designed S-HNLs; these proteins folded as efficiently as the native S-HNL. Sequence and biochemical analysis of the designed S-HNLs suggested that accumulation of neutral mutations occurs during the process of S-HNLs evolution from a low-activity form to a high-activity (native) form. Taken together, our results demonstrate that our software and the associated methods could be applied not only to design of complete sequences, but also to predictions of protein evolution, especially within families such as esterases and S-HNLs.
Cooperative Subunit Refolding of a Light-Harvesting Protein through a Self-Chaperone Mechanism.
Laos, Alistair J; Dean, Jacob C; Toa, Zi S D; Wilk, Krystyna E; Scholes, Gregory D; Curmi, Paul M G; Thordarson, Pall
2017-07-10
The fold of a protein is encoded by its amino acid sequence, but how complex multimeric proteins fold and assemble into functional quaternary structures remains unclear. Here we show that two structurally different phycobiliproteins refold and reassemble in a cooperative manner from their unfolded polypeptide subunits, without biological chaperones. Refolding was confirmed by ultrafast broadband transient absorption and two-dimensional electronic spectroscopy to probe internal chromophores as a marker of quaternary structure. Our results demonstrate a cooperative, self-chaperone refolding mechanism, whereby the β-subunits independently refold, thereby templating the folding of the α-subunits, which then chaperone the assembly of the native complex, quantitatively returning all coherences. Our results indicate that subunit self-chaperoning is a robust mechanism for heteromeric protein folding and assembly that could also be applied in self-assembled synthetic hierarchical systems. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kinact: a computational approach for predicting activating missense mutations in protein kinases.
Rodrigues, Carlos H M; Ascher, David B; Pires, Douglas E V
2018-05-21
Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain of function mutations leading to constitutive activation of protein kinases are known to be driver events of many cancers, the identification of these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show the combination of structural and sequence features significantly improved the overall accuracy compared to considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and 94% and Area Under ROC Curve of 0.89 and 0.92 on 10-fold cross-validation, and on blind tests, respectively, outperforming well established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
Inverse statistical physics of protein sequences: a key issues review.
Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin
2018-03-01
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
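For orientation, a Boltzmann distribution over aligned sequences of length L is often written in this field as a pairwise (Potts-type) model. The abstract itself does not give a formula, so the form below and the symbols h_i, J_ij, and Z are assumptions used only for illustration:

```latex
P(a_1,\dots,a_L) \;=\; \frac{1}{Z}\,
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big),
\qquad
Z \;=\; \sum_{a_1,\dots,a_L}
  \exp\!\Big( \sum_{i} h_i(a_i) + \sum_{i<j} J_{ij}(a_i,a_j) \Big)
```

Here a_i is the amino acid (or gap) at alignment position i, the fields h_i capture position-specific preferences, and the couplings J_ij capture pairwise covariation. The inverse problem is to infer h and J from the observed homologous sequences.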
Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.
2010-01-01
A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin label pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
Echigoya, Yusuke; Lim, Kenji Rowel Q; Trieu, Nhu; Bao, Bo; Miskew Nichols, Bailey; Vila, Maria Candida; Novak, James S; Hara, Yuko; Lee, Joshua; Touznik, Aleksander; Mamchaoui, Kamel; Aoki, Yoshitsugu; Takeda, Shin'ichi; Nagaraju, Kanneboyina; Mouly, Vincent; Maruyama, Rika; Duddy, William; Yokota, Toshifumi
2017-11-01
Duchenne muscular dystrophy (DMD), the most common lethal genetic disorder, is caused by mutations in the dystrophin (DMD) gene. Exon skipping is a therapeutic approach that uses antisense oligonucleotides (AOs) to modulate splicing and restore the reading frame, leading to truncated, yet functional protein expression. In 2016, the US Food and Drug Administration (FDA) conditionally approved the first phosphorodiamidate morpholino oligomer (morpholino)-based AO drug, eteplirsen, developed for DMD exon 51 skipping. Eteplirsen remains controversial with insufficient evidence of its therapeutic effect in patients. We recently developed an in silico tool to design antisense morpholino sequences for exon skipping. Here, we designed morpholino AOs targeting DMD exon 51 using the in silico tool and quantitatively evaluated the effects in immortalized DMD muscle cells in vitro. To our surprise, most of the newly designed morpholinos induced exon 51 skipping more efficiently compared with the eteplirsen sequence. The efficacy of exon 51 skipping and rescue of dystrophin protein expression were increased by up to more than 12-fold and 7-fold, respectively, compared with the eteplirsen sequence. Significant in vivo efficacy of the most effective morpholino, determined in vitro, was confirmed in mice carrying the human DMD gene. These findings underscore the importance of AO sequence optimization for exon skipping. Copyright © 2017 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.
Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins
Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie
2005-01-01
Background Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486
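The idea of scanning a sequence with a short reference segment and recording, for each window, a pattern-similarity value and an average property value can be sketched as follows. This is an illustrative sketch only: the Kyte-Doolittle hydropathy scale, the use of Pearson correlation as the similarity measure, and all function names are assumptions, since the abstract does not specify how its similarity constants are computed.

```python
from math import sqrt
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names no scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either pattern is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def scan(seq: str, reference: str) -> List[Tuple[int, float, float]]:
    """For each window of len(reference) residues, return
    (start index, pattern similarity to the reference, average property value)."""
    k = len(reference)
    ref = [KD[aa] for aa in reference]
    values = [KD[aa] for aa in seq]
    out = []
    for i in range(len(values) - k + 1):
        window = values[i:i + k]
        out.append((i, pearson(window, ref), mean(window)))
    return out


# Hypothetical toy sequence and reference segment.
for start, similarity, average in scan("GAVLIKAVLI", "AVLI"):
    print(start, round(similarity, 3), round(average, 3))
```

Plotting both values against position gives profiles of the kind the abstract describes; windows with high similarity and a matching average value would be candidate functional sites.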
From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins
Forman-Kay, Julie D.; Mittag, Tanja
2015-01-01
Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708
From sequence and forces to structure, function, and evolution of intrinsically disordered proteins.
Forman-Kay, Julie D; Mittag, Tanja
2013-09-03
Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics, and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales, and compactness are shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional, and evolutionary properties. Copyright © 2013 Elsevier Ltd. All rights reserved.
Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences
NASA Astrophysics Data System (ADS)
Adhikari, Aashish N.; Freed, Karl F.; Sosnick, Tobin R.
2013-07-01
We demonstrate the ability of simultaneously determining a protein’s folding pathway and structure using a properly formulated model without prior knowledge of the native structure. Our model employs a natural coordinate system for describing proteins and a search strategy inspired by the observation that real proteins fold in a sequential fashion by incrementally stabilizing nativelike substructures or “foldons.” Comparable folding pathways and structures are obtained for the twelve proteins recently studied using atomistic molecular dynamics simulations [K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, Science 334, 517 (2011)], with our calculations running several orders of magnitude faster. We find that nativelike propensities in the unfolded state do not necessarily determine the order of structure formation, a departure from a major conclusion of the molecular dynamics study. Instead, our results support a more expansive view wherein intrinsic local structural propensities may be enhanced or overridden in the folding process by environmental context. The success of our search strategy validates it as an expedient mechanism for folding both in silico and in vivo.
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1994-12-31
This conference was held December 4-8, 1994 in Asilomar, California. The purpose of this meeting was to provide a forum for exchange of state-of-the-art information concerning the prediction of protein structure. Attention is focused on the following: comparative modeling; sequence to fold assignment; and ab initio folding.
Domain atrophy creates rare cases of functional partial protein domains.
Prakash, Ananth; Bateman, Alex
2015-04-30
Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.
Viroporins, Examples of the Two-Stage Membrane Protein Folding Model.
Martinez-Gil, Luis; Mingarro, Ismael
2015-06-26
Viroporins are small, α-helical, hydrophobic virus encoded proteins, engineered to form homo-oligomeric hydrophilic pores in the host membrane. Viroporins participate in multiple steps of the viral life cycle, from entry to budding. Like any other membrane protein, viroporins have to find a way to bury their hydrophobic regions in the lipid bilayer. Once within the membrane, the hydrophobic helices of viroporins interact with each other to form the higher-order structures required to correctly perform their porating activities. This two-step process resembles the two-stage model proposed for membrane protein folding by Engelman and Popot. In this review we use the membrane protein folding model as a leading thread to analyze the mechanism and forces behind the membrane insertion and folding of viroporins. We start by describing the transmembrane segment architecture of viroporins, including the number and sequence characteristics of their membrane-spanning domains. Next, we connect the differences found among viroporin families to their viral genome organization, and conclude by focusing on the pathways used by viroporins on their way to the membrane and on the transmembrane helix-helix interactions required to achieve proper folding and assembly.
Streptococcus pneumonia YlxR at 1.35 A shows a putative new fold.
Osipiuk, J; Górnicki, P; Maj, L; Dementieva, I; Laskowski, R; Joachimiak, A
2001-11-01
The structure of the YlxR protein of unknown function from Streptococcus pneumoniae was determined to 1.35 Å. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Å from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer alpha/beta sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the alpha-beta plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.
Acid extraction and purification of recombinant spider silk proteins.
Mello, Charlene M; Soares, Jason W; Arcidiacono, Steven; Butler, Michelle M
2004-01-01
A procedure has been developed for the isolation of recombinant spider silk proteins based upon their unique stability and solubilization characteristics. Three recombinant silk proteins, (SpI)7, NcDS, and [(SpI)4/(SpII)1]4, were purified by extraction with organic acids followed by affinity or ion exchange chromatography resulting in 90-95% pure silk solutions. The protein yield of NcDS (15 mg/L culture) and (SpI)7 (35 mg/L) increased 4- and 5-fold, respectively, from previously reported values presumably due to a more complete solubilization of the expressed recombinant protein. [(SpI)4/(SpII)1]4, a hybrid protein based on the repeat sequences of spidroin I and spidroin II, had a yield of 12.4 mg/L. This method is an effective, reproducible technique that has broad applicability for a variety of silk proteins as well as other acid stable biopolymers.
Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities
Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert
2014-01-01
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327
Functional evolution of PLP-dependent enzymes based on active-site structural similarities.
Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert
2014-10-01
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional-fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.
Hom, Geoffrey K.; Lassila, J. Kyle; Thomas, Leonard M.; Mayo, Stephen L.
2005-01-01
Our goal was to compute a stable, full-sequence design of the Drosophila melanogaster engrailed homeodomain. Thermal and chemical denaturation data indicated the design was significantly more stable than was the wild-type protein. The data were also nearly identical to those for a similar, later full-sequence design, which was shown by NMR to adopt the homeodomain fold: a three-helix, globular monomer. However, a 1.65 Å crystal structure of the design described here turned out to be of a completely different fold: a four-helix, rodlike tetramer. The crystallization conditions included ~25% dioxane, and subsequent experiments by circular dichroism and sedimentation velocity analytical ultracentrifugation indicated that dioxane increases the helicity and oligomerization state of the designed protein. We attribute at least part of the discrepancy between the target fold and the crystal structure to the presence of a high concentration of dioxane. PMID:15741348
Chen, Peng; Li, Jinyan
2010-05-17
Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein folding, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Classifiers that use sequence profile centers may advance long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.
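The two data-handling ideas in this abstract, class centers computed as average profiles and balanced training sets built by randomly undersampling the negatives, can be sketched as below. Note the substitution: a plain nearest-center vote stands in for the genetic algorithm classifiers, which the abstract does not describe in enough detail to reproduce. All names and the toy vectors are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def profile_center(profiles: Sequence[Vector]) -> Vector:
    """Average profile of one class (a stand-in for a sequence profile center)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def balanced_subsets(positives: Sequence[Vector], negatives: Sequence[Vector],
                     n_subsets: int, seed: int = 0) -> List[Tuple[List[Vector], List[Vector]]]:
    """Pair the positives with several random negative samples of equal size."""
    rng = random.Random(seed)
    size = min(len(positives), len(negatives))
    return [(list(positives), rng.sample(list(negatives), size))
            for _ in range(n_subsets)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ensemble_predict(x: Vector, subsets) -> bool:
    """Majority vote of nearest-center classifiers, one per balanced subset."""
    votes = 0
    for pos, neg in subsets:
        if sq_dist(x, profile_center(pos)) < sq_dist(x, profile_center(neg)):
            votes += 1
    return votes * 2 > len(subsets)


# Hypothetical toy profiles: few contacts (positives), many non-contacts (negatives).
positives = [[1.0, 1.0], [0.8, 1.2]]
negatives = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.2], [0.0, 0.3], [0.2, 0.1]]
subsets = balanced_subsets(positives, negatives, n_subsets=5)
print(ensemble_predict([0.9, 1.0], subsets))  # True
print(ensemble_predict([0.0, 0.1], subsets))  # False
```

Each ensemble member sees all positives but a different negative sample, so the members differ while every training set stays balanced.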
The computational linguistics of biological sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Searls, D.
1995-12-31
This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology which was held in the United Kingdom from July 16 to 19, 1995. Protein sequences are analogous to nucleic acid sequences in many respects, particularly in their folding behavior. Proteins have a much richer variety of interactions, but in theory the same linguistic principles could come to bear in describing dependencies between distant residues that arise by virtue of three-dimensional structure. This tutorial will concentrate on nucleic acid sequences.
Blank-Landeshammer, Bernhard; Kollipara, Laxmikanth; Biß, Karsten; Pfenninger, Markus; Malchow, Sebastian; Shuvaev, Konstantin; Zahedi, René P; Sickmann, Albert
2017-09-01
Complex mass spectrometry based proteomics data sets are mostly analyzed by protein database searches. While this approach performs considerably well for sequenced organisms, direct inference of peptide sequences from tandem mass spectra, i.e., de novo peptide sequencing, oftentimes is the only way to obtain information when protein databases are absent. However, available algorithms suffer from drawbacks such as lack of validation and often high rates of false positive hits (FP). Here we present a simple method of combining results from commonly available de novo peptide sequencing algorithms, which in conjunction with minor tweaks in data acquisition yields a lower empirical FDR compared to the analysis using single algorithms. Results were validated using state-of-the-art database search algorithms as well as specifically synthesized reference peptides. Thus, we could increase the number of PSMs meeting a stringent FDR of 5% more than 3-fold compared to the single best de novo sequencing algorithm alone, accounting for an average of 11 120 PSMs (combined) instead of 3476 PSMs (alone) in triplicate 2 h LC-MS runs of tryptic HeLa digestion.
Statistical analysis of native contact formation in the folding of designed model proteins
NASA Astrophysics Data System (ADS)
Tiana, Guido; Broglia, Ricardo A.
2001-02-01
The time evolution of the formation probability of native bonds has been studied for designed sequences which fold fast into the native conformation. From this analysis a clear hierarchy of bonds emerges: (a) local, fast forming highly stable native bonds built by some of the most strongly interacting amino acids of the protein; (b) nonlocal bonds formed late in the folding process, in coincidence with the folding nucleus, and involving essentially the same strongly interacting amino acids already participating in the fast bonds; (c) the rest of the native bonds whose behavior is subordinated, to a large extent, to that of the strong local and nonlocal native contacts.
Simkovic, Felix; Thomas, Jens M H; Keegan, Ronan M; Winn, Martyn D; Mayans, Olga; Rigden, Daniel J
2016-07-01
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.
Simkovic, Felix; Thomas, Jens M. H.; Keegan, Ronan M.; Winn, Martyn D.; Mayans, Olga; Rigden, Daniel J.
2016-01-01
For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions (‘decoys’), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing. PMID:27437113
Neutrality and evolvability of designed protein sequences
NASA Astrophysics Data System (ADS)
Bhattacherjee, Arnab; Biswas, Parbati
2010-07-01
The effect of foldability on a protein's evolvability is analyzed by a two-pronged approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral conditions. The robustness, a protein's ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory-designed sequences, which are found to be more robust than Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively than the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.
Nabuurs, Sanne M; Westphal, Adrie H; aan den Toorn, Marije; Lindhoud, Simon; van Mierlo, Carlo P M
2009-06-17
Partially folded protein species transiently exist during folding of most proteins. Often these species are molten globules, which may be on- or off-pathway to native protein. Molten globules have a substantial amount of secondary structure but lack virtually all the tertiary side-chain packing characteristic of natively folded proteins. These ensembles of interconverting conformers are prone to aggregation and potentially play a role in numerous devastating pathologies, and thus attract considerable attention. The molten globule that is observed during folding of apoflavodoxin from Azotobacter vinelandii is off-pathway, as it has to unfold before native protein can be formed. Here we report that this species can be trapped under nativelike conditions by substituting amino acid residue F44 by Y44, allowing spectroscopic characterization of its conformation. Whereas native apoflavodoxin contains a parallel beta-sheet surrounded by alpha-helices (i.e., the flavodoxin-like or alpha-beta parallel topology), it is shown that the molten globule has a totally different topology: it is helical and contains no beta-sheet. The presence of this remarkably nonnative species shows that single polypeptide sequences can code for distinct folds that swap upon changing conditions. Topological switching between unrelated protein structures is likely a general phenomenon in the protein structure universe.
Structure Prediction and Analysis of Neuraminidase Sequence Variants
ERIC Educational Resources Information Center
Thayer, Kelly M.
2016-01-01
Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…
Kumwenda, Benjamin; Litthauer, Derek; Bishop, Özlem Tastan; Reva, Oleg
2013-01-01
Elucidation of evolutionary factors that enhance protein thermostability is a critical problem and was the focus of this work on Thermus species. Pairs of orthologous sequences of T. scotoductus SA-01 and T. thermophilus HB27, with the largest negative minimum folding energy (MFE) as predicted by the UNAFold algorithm, were statistically analyzed. Favored substitutions of amino acid residues and their properties were determined. Substitutions were analyzed in modeled protein structures to determine their locations and contribution to energy differences using the PyMOL and FoldX programs, respectively. Dominant trends in amino acid substitutions consistent with differences in thermostability between orthologous sequences were observed. T. thermophilus thermophilic proteins showed an increase in non-polar, tiny, and charged amino acids. An abundance of alanine substituted by serine and threonine, as well as arginine substituted by glutamine and lysine was observed in T. thermophilus HB27. Structural comparison showed that stabilizing mutations occurred on surfaces and loops in protein structures. PMID:24023508
antaRNA: ant colony-based RNA sequence design.
Kleinkauf, Robert; Mann, Martin; Backofen, Rolf
2015-10-01
RNA sequence design has been studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components. In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on biological datasets. http://www.bioinf.uni-freiburg.de/Software/antaRNA CONTACT: backofen@informatik.uni-freiburg.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riback, Joshua A.; Bowman, Micayla A.; Zmyslowski, Adam M.
A substantial fraction of the proteome is intrinsically disordered, and even well-folded proteins adopt non-native geometries during synthesis, folding, transport, and turnover. Characterization of intrinsically disordered proteins (IDPs) is challenging, in part because of a lack of accurate physical models and the difficulty of interpreting experimental results. We have developed a general method to extract the dimensions and solvent quality (self-interactions) of IDPs from a single small-angle x-ray scattering measurement. We applied this procedure to a variety of IDPs and found that even IDPs with low net charge and high hydrophobicity remain highly expanded in water, contrary to the general expectation that protein-like sequences collapse in water. Our results suggest that the unfolded state of most foldable sequences is expanded; we conjecture that this property was selected by evolution to minimize misfolding and aggregation.
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is a software tool, available to the scientific community as a free web service, designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in large but not homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.
Pons, Jean-Luc; Labesse, Gilles
2009-07-01
@TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/
Prolonged fasting increases purine recycling in post-weaned northern elephant seals.
Soñanez-Organis, José Guadalupe; Vázquez-Medina, José Pablo; Zenteno-Savín, Tania; Aguilar, Andres; Crocker, Daniel E; Ortiz, Rudy M
2012-05-01
Northern elephant seals are naturally adapted to prolonged periods (1-2 months) of absolute food and water deprivation (fasting). In terrestrial mammals, food deprivation stimulates ATP degradation and decreases ATP synthesis, resulting in the accumulation of purines (ATP degradation byproducts). Hypoxanthine-guanine phosphoribosyl transferase (HGPRT) salvages ATP by recycling the purine degradation products derived from xanthine oxidase (XO) metabolism, which also promotes oxidant production. The contributions of HGPRT to purine recycling during prolonged food deprivation in marine mammals are not well defined. In the present study we cloned and characterized the complete and partial cDNA sequences that encode for HGPRT and xanthine oxidoreductase (XOR) in northern elephant seals. We also measured XO protein expression and circulating activity, along with xanthine and hypoxanthine plasma content in fasting northern elephant seal pups. Blood, adipose and muscle tissue samples were collected from animals after 1, 3, 5 and 7 weeks of their natural post-weaning fast. The complete HGPRT and partial XOR cDNA sequences are 771 and 345 bp long and encode proteins of 218 and 115 amino acids, respectively, with conserved domains important for their function and regulation. XOR mRNA and XO protein expression increased 3-fold and 1.7-fold with fasting, respectively, whereas HGPRT mRNA (4-fold) and protein (2-fold) expression increased after 7 weeks in adipose tissue and muscle. Plasma xanthine (3-fold) and hypoxanthine (2.5-fold) levels, and XO (1.7- to 20-fold) and HGPRT (1.5- to 1.7-fold) activities increased during the last 2 weeks of fasting. Results suggest that prolonged fasting in elephant seal pups is associated with increased capacity to recycle purines, which may contribute to ameliorating oxidant production and enhancing the supply of ATP, both of which would be beneficial during prolonged food deprivation and appear to be adaptive in this species.
Prolonged fasting increases purine recycling in post-weaned northern elephant seals
Soñanez-Organis, José Guadalupe; Vázquez-Medina, José Pablo; Zenteno-Savín, Tania; Aguilar, Andres; Crocker, Daniel E.; Ortiz, Rudy M.
2012-01-01
SUMMARY Northern elephant seals are naturally adapted to prolonged periods (1–2 months) of absolute food and water deprivation (fasting). In terrestrial mammals, food deprivation stimulates ATP degradation and decreases ATP synthesis, resulting in the accumulation of purines (ATP degradation byproducts). Hypoxanthine-guanine phosphoribosyl transferase (HGPRT) salvages ATP by recycling the purine degradation products derived from xanthine oxidase (XO) metabolism, which also promotes oxidant production. The contributions of HGPRT to purine recycling during prolonged food deprivation in marine mammals are not well defined. In the present study we cloned and characterized the complete and partial cDNA sequences that encode for HGPRT and xanthine oxidoreductase (XOR) in northern elephant seals. We also measured XO protein expression and circulating activity, along with xanthine and hypoxanthine plasma content in fasting northern elephant seal pups. Blood, adipose and muscle tissue samples were collected from animals after 1, 3, 5 and 7 weeks of their natural post-weaning fast. The complete HGPRT and partial XOR cDNA sequences are 771 and 345 bp long and encode proteins of 218 and 115 amino acids, respectively, with conserved domains important for their function and regulation. XOR mRNA and XO protein expression increased 3-fold and 1.7-fold with fasting, respectively, whereas HGPRT mRNA (4-fold) and protein (2-fold) expression increased after 7 weeks in adipose tissue and muscle. Plasma xanthine (3-fold) and hypoxanthine (2.5-fold) levels, and XO (1.7- to 20-fold) and HGPRT (1.5- to 1.7-fold) activities increased during the last 2 weeks of fasting. Results suggest that prolonged fasting in elephant seal pups is associated with increased capacity to recycle purines, which may contribute to ameliorating oxidant production and enhancing the supply of ATP, both of which would be beneficial during prolonged food deprivation and appear to be adaptive in this species. 
PMID:22496280
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borziak, Kirill; Jouline, Igor B
2007-01-01
Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of common signals detected by living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initiated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similarity to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.
Phillips, J J; Javadi, Y; Millership, C; Main, E R G
2012-01-01
Tetratricopeptide repeats (TPRs) are a class of all alpha-helical repeat proteins that are comprised of 34-aa helix-turn-helix motifs. These stack together to form nonglobular structures that are stabilized by short-range interactions from residues close in primary sequence. Unlike globular proteins, they have few, if any, long-range nonlocal stabilizing interactions. Several studies on designed TPR proteins have shown that this modular structure is reflected in their folding, that is, modular multistate folding is observed as opposed to two-state folding. Here we show that TPR multistate folding can be suppressed to approximate two-state folding through modulation of intrinsic stability or extrinsic environmental variables. This modulation was investigated by comparing the thermodynamic unfolding under differing buffer regimes of two distinct series of consensus-designed TPR proteins, which possess different intrinsic stabilities. A total of nine proteins of differing sizes and differing consensus TPR motifs were each thermally and chemically denatured and their unfolding monitored using differential scanning calorimetry (DSC) and CD/fluorescence, respectively. Analyses of both the DSC and chemical denaturation data show that reducing the total stability of each protein and repeat units leads to observable two-state unfolding. These data highlight the intimate link between global and intrinsic repeat stability that governs whether folding proceeds by an observably two-state mechanism, or whether partial unfolding yields stable intermediate structures which retain sufficient stability to be populated at equilibrium. PMID:22170589
Modulation of frustration in folding by sequence permutation.
Nobrega, R Paul; Arora, Karunesh; Kathuria, Sagar V; Graceffa, Rita; Barrea, Raul A; Guo, Liang; Chakravarthy, Srinivas; Bilsel, Osman; Irving, Thomas C; Brooks, Charles L; Matthews, C Robert
2014-07-22
Folding of globular proteins can be envisioned as the contraction of a random coil unfolded state toward the native state on an energy surface rough with local minima trapping frustrated species. These substructures impede productive folding and can serve as nucleation sites for aggregation reactions. However, little is known about the relationship between frustration and its underlying sequence determinants. Chemotaxis response regulator Y (CheY), a 129-amino acid bacterial protein, has been shown previously to populate an off-pathway kinetic trap in the microsecond time range. The frustration has been ascribed to premature docking of the N- and C-terminal subdomains or, alternatively, to the formation of an unproductive local-in-sequence cluster of branched aliphatic side chains, isoleucine, leucine, and valine (ILV). The roles of the subdomains and ILV clusters in frustration were tested by altering the sequence connectivity using circular permutations. Surprisingly, the stability and buried surface area of the intermediate could be increased or decreased depending on the location of the termini. Comparison with the results of small-angle X-ray-scattering experiments and simulations points to the accelerated formation of a more compact, on-pathway species for the more stable intermediate. The effect of chain connectivity in modulating the structures and stabilities of the early kinetic traps in CheY is better understood in terms of the ILV cluster model. However, the subdomain model captures the requirement for an intact N-terminal domain to access the native conformation. Chain entropy and aliphatic-rich sequences play crucial roles in biasing the early events leading to frustration in the folding of CheY.
Sumbalova, Lenka; Stourac, Jan; Martinek, Tomas; Bednar, David; Damborsky, Jiri
2018-05-23
HotSpot Wizard is a web server used for the automated identification of hotspots in semi-rational protein design to give improved protein stability, catalytic activity, substrate specificity and enantioselectivity. Since there are three orders of magnitude fewer protein structures than sequences in bioinformatic databases, the major limitation to the usability of previous versions was the requirement for the protein structure to be a compulsory input for the calculation. HotSpot Wizard 3.0 now accepts the protein sequence as input data. The protein structure for the query sequence is obtained either from eight repositories of homology models or is modeled using Modeller and I-Tasser. The quality of the models is then evaluated using three quality assessment tools: WHAT_CHECK, PROCHECK and MolProbity. During follow-up analyses, the system automatically warns the users whenever they attempt to redesign poorly predicted parts of their homology models. The second main limitation of HotSpot Wizard's predictions is that it identifies suitable positions for mutagenesis, but does not provide any reliable advice on particular substitutions. A new module for the estimation of thermodynamic stabilities using the Rosetta and FoldX suites has been introduced which prevents destabilizing mutations among pre-selected variants entering experimental testing. HotSpot Wizard is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.
Evolution of I-SceI Homing Endonucleases with Increased DNA Recognition Site Specificity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joshi, Rakesh; Ho, Kwok Ki; Tenney, Kristen
2013-09-18
Elucidating how homing endonucleases undergo changes in recognition site specificity will facilitate efforts to engineer proteins for gene therapy applications. I-SceI is a monomeric homing endonuclease that recognizes and cleaves within an 18-bp target. It tolerates limited degeneracy in its target sequence, including substitution of a C:G+4 base pair for the wild-type A:T+4 base pair. Libraries encoding randomized amino acids at I-SceI residue positions that contact or are proximal to A:T+4 were used in conjunction with a bacterial one-hybrid system to select I-SceI derivatives that bind to recognition sites containing either the A:T+4 or the C:G+4 base pairs. As expected, isolates encoding wild-type residues at the randomized positions were selected using either target sequence. All I-SceI proteins isolated using the C:G+4 recognition site included small side-chain substitutions at G100 and either contained (K86R/G100T, K86R/G100S and K86R/G100C) or lacked (G100A, G100T) a K86R substitution. Interestingly, the binding affinities of the selected variants for the wild-type A:T+4 target are 4- to 11-fold lower than that of wild-type I-SceI, whereas those for the C:G+4 target are similar. The increased specificity of the mutant proteins is also evident in binding experiments in vivo. These differences in binding affinities account for the observed ~36-fold difference in target preference between the K86R/G100T and wild-type proteins in DNA cleavage assays. An X-ray crystal structure of the K86R/G100T mutant protein bound to a DNA duplex containing the C:G+4 substitution suggests how sequence specificity of a homing enzyme can increase. This biochemical and structural analysis defines one pathway by which site specificity is augmented for a homing endonuclease.
Raman, Rajeev; Rajanikanth, V; Palaniappan, Raghavan U M; Lin, Yi-Pin; He, Hongxuan; McDonough, Sean P; Sharma, Yogendra; Chang, Yung-Fu
2010-12-29
Many bacterial surface exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca²+. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface exposed proteins containing Bacterial immunoglobulin like (Big) domains. The function of proteins which contain the Big fold is not known. Based on the possible similarities of immunoglobulin and βγ-crystallin folds, we here explore the important question whether Ca²+ binds to Big domains, which would provide a novel functional role of the proteins containing the Big fold. We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., 9th (Lig A9) and 10th repeats (Lig A10); and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca²+ with dissociation constants of 2-4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. We demonstrate that the Lig are Ca²+-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca²+. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca²+-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function of these proteins via Ca²+ binding.
Palaniappan, Raghavan U. M.; Lin, Yi-Pin; He, Hongxuan; McDonough, Sean P.; Sharma, Yogendra; Chang, Yung-Fu
2010-01-01
Background Many bacterial surface exposed proteins mediate the host-pathogen interaction more effectively in the presence of Ca2+. Leptospiral immunoglobulin-like (Lig) proteins, LigA and LigB, are surface exposed proteins containing Bacterial immunoglobulin like (Big) domains. The function of proteins which contain the Big fold is not known. Based on the possible similarities of immunoglobulin and βγ-crystallin folds, we here explore the important question whether Ca2+ binds to Big domains, which would provide a novel functional role of the proteins containing the Big fold. Principal Findings We selected six individual Big domains for this study (three from the conserved part of LigA and LigB, denoted as Lig A3, Lig A4, and LigBCon5; two from the variable region of LigA, i.e., 9th (Lig A9) and 10th repeats (Lig A10); and one from the variable region of LigB, i.e., LigBCen2). We have also studied the conserved region covering the three and six repeats (LigBCon1-3 and LigCon). All these proteins bind the calcium-mimic dye Stains-all. All the selected four domains bind Ca2+ with dissociation constants of 2–4 µM. Lig A9 and Lig A10 domains fold well with moderate thermal stability, have β-sheet conformation and form homodimers. Fluorescence spectra of Big domains show a specific doublet (at 317 and 330 nm), probably due to Trp interaction with a Phe residue. Equilibrium unfolding of selected Big domains is similar and follows a two-state model, suggesting the similarity in their fold. Conclusions We demonstrate that the Lig are Ca2+-binding proteins, with Big domains harbouring the binding motif. We conclude that despite differences in sequence, a Big motif binds Ca2+. This work thus sets up a strong possibility for classifying the proteins containing Big domains as a novel family of Ca2+-binding proteins. Since the Big domain is a part of many proteins in the bacterial kingdom, we suggest a possible function of these proteins via Ca2+ binding. PMID:21206924
Characterizing protein conformations by correlation analysis of coarse-grained contact matrices.
Lindsay, Richard J; Siess, Jan; Lohry, David P; McGee, Trevor S; Ritchie, Jordan S; Johnson, Quentin R; Shen, Tongye
2018-01-14
We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.
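The coarse-graining and covariance steps described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 8 Å distance cutoff, fixed-length segments of consecutive residues, block averaging as the function that maps residue contacts to segment contacts, and the function names themselves.

```python
import numpy as np

def residue_contacts(coords, cutoff=8.0):
    """Binary residue-residue contact map for one frame (coords: n_res x 3).
    The cutoff value is an illustrative assumption."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return (d < cutoff).astype(float)

def coarse_grain(contacts, seg_len):
    """Segment-segment contact strength as the mean of the residue contacts
    in each block of seg_len consecutive residues (one possible choice)."""
    n_seg = contacts.shape[0] // seg_len
    c = contacts[:n_seg * seg_len, :n_seg * seg_len]
    return c.reshape(n_seg, seg_len, n_seg, seg_len).mean(axis=(1, 3))

def contact_pca(frames, seg_len=5, cutoff=8.0):
    """Eigen-decompose the covariance of coarse-grained contacts over frames.
    Returns eigenvalues (descending) and the matching eigenvectors."""
    rows = []
    for xyz in frames:
        cg = coarse_grain(residue_contacts(xyz, cutoff), seg_len)
        iu = np.triu_indices(cg.shape[0], k=1)  # unique segment pairs only
        rows.append(cg[iu])
    cov = np.cov(np.asarray(rows), rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

With 30 residues and segments of 5, the feature space shrinks from 435 residue pairs to 15 segment pairs, which is the kind of reduction in degrees of freedom the abstract refers to.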
Characterizing protein conformations by correlation analysis of coarse-grained contact matrices
NASA Astrophysics Data System (ADS)
Lindsay, Richard J.; Siess, Jan; Lohry, David P.; McGee, Trevor S.; Ritchie, Jordan S.; Johnson, Quentin R.; Shen, Tongye
2018-01-01
We have developed a method to capture the essential conformational dynamics of folded biopolymers using statistical analysis of coarse-grained segment-segment contacts. Previously, the residue-residue contact analysis of simulation trajectories was successfully applied to the detection of conformational switching motions in biomolecular complexes. However, the application to large protein systems (larger than 1000 amino acid residues) is challenging using the description of residue contacts. Also, the residue-based method cannot be used to compare proteins with different sequences. To expand the scope of the method, we have tested several coarse-graining schemes that group a collection of consecutive residues into a segment. The definition of these segments may be derived from structural and sequence information, while the interaction strength of the coarse-grained segment-segment contacts is a function of the residue-residue contacts. We then perform covariance calculations on these coarse-grained contact matrices. We monitored how well the principal components of the contact matrices are preserved using various rendering functions. The new method was demonstrated to assist the reduction of the degrees of freedom for describing the conformation space, and it potentially allows for the analysis of a system that is approximately tenfold larger compared with the corresponding residue contact-based method. This method can also render a family of similar proteins into the same conformational space, and thus can be used to compare the structures of proteins with different sequences.
Bar, Maya; Bar-Ziv, Roy; Scherf, Tali; Fass, Deborah
2006-08-01
The Tenebrio molitor thermal hysteresis protein has a cysteine content of 19%. This 84-residue protein folds as a compact beta-helix, with eight disulfide bonds buried in its core. Exposed on one face of the protein is an array of threonine residues, which constitutes the ice-binding face. Previous protocols for expression of this protein in recombinant expression systems resulted in inclusion bodies or soluble but largely inactive material. A long and laborious refolding procedure was performed to increase the fraction of active protein and isolate it from inactive fractions. We present a new protocol for production of fully folded and active T. molitor thermal hysteresis protein in bacteria, without the need for in vitro refolding. The protein coding sequence was fused to those of various carrier proteins and expressed at low temperature in a bacterial strain specially suited for production of disulfide-bonded proteins. The product, after a simple and robust purification procedure, was analyzed spectroscopically and functionally and was found to compare favorably to previously published data on refolded protein and protein obtained from its native source.
Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M
2016-01-01
The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on biological function can help us discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to predict the three-dimensional arrangement of the atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. In this review we present past and recent trends in protein folding simulations from both perspectives: hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used for running these kinds of soft computing techniques.
Yadav, Saurabh; Kumari, Pragati; Kushwaha, Hemant Ritturaj
2013-01-01
Glutaredoxins are small, ubiquitous, glutathione-dependent enzymatic antioxidants classified under the thioredoxin-fold superfamily. Glutaredoxins are classified into two types: dithiol and monothiol. Monothiol glutaredoxins, which carry the signature "CGFS" as a redox-active motif, are known for their role in oxidative stress inside the cell. In the present analysis, the 138-amino-acid monothiol glutaredoxin AgGRX1 from Ashbya gossypii was identified and used for the analysis. The multiple sequence alignment of the AgGRX1 protein sequence revealed the characteristic motif of a typical monothiol glutaredoxin, as observed in various other organisms. The proposed structure of the AgGRX1 protein was used to analyze signature folds related to the thioredoxin superfamily. Further, the study highlighted the structural features pertaining to the complex mechanism of glutathione docking and the interacting residues.
Rea, Anita M; Simpson, Emma R; Meldrum, Jill K; Williams, Huw E L; Searle, Mark S
2008-12-02
The fast folding of small proteins is likely to be the product of evolutionary pressures that balance the search for native-like contacts in the transition state with the minimum number of stable non-native interactions that could lead to partially folded states prone to aggregation and amyloid formation. We have investigated the effects of non-native interactions on the folding landscape of yeast ubiquitin by introducing aromatic substitutions into the beta-turn region of the N-terminal beta-hairpin, using both the native G-bulged type I turn sequence (TXTGK) as well as an engineered 2:2 XNGK type I' turn sequence. The N-terminal beta-hairpin is a recognized folding nucleation site in ubiquitin. The folding kinetics for wt-Ub (TLTGK) and the type I' turn mutant (TNGK) reveal only a weakly populated intermediate; however, substitution with X = Phe or Trp in either context results in a high propensity to form a stable compact intermediate where the initial U-->I collapse is visible as a distinct kinetic phase. The introduction of Trp into either of the two host turn sequences results in either complex multiphase kinetics with the possibility of parallel folding pathways, or formation of a highly compact I-state stabilized by non-native interactions that must unfold before refolding. Sequence substitutions with aromatic residues within a localized beta-turn capable of forming non-native hydrophobic contacts in both the native state and partially folded states have the undesirable consequence that folding is frustrated by the formation of stable compact intermediates that evolutionary pressures at the sequence level may have largely eliminated.
Structure and Sequence Search on Aptamer-Protein Docking
NASA Astrophysics Data System (ADS)
Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie
2015-03-01
Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in living systems, especially through gene regulation. However, short nucleic acid sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer binding, we use GPU-enabled molecular dynamics simulations both to examine the conformational flexibility of the aptamer in the absence of binding to thrombin and to determine our ability to fold an aptamer. This study should help further de novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.
Yellow fluorescent protein phiYFPv (Phialidium): structure and structure-based mutagenesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pletneva, Nadya V.; Pletnev, Vladimir Z., E-mail: vzpletnev@gmail.com; Souslova, Ekaterina
The yellow fluorescent protein phiYFPv (λ_em^max ≃ 537 nm) with improved folding has been developed from the spectrally identical wild-type phiYFP found in the marine jellyfish Phialidium. The latter fluorescent protein is one of only two known cases of naturally occurring proteins that exhibit emission spectra in the yellow–orange range (535–555 nm). Here, the crystal structure of phiYFPv has been determined at 2.05 Å resolution. The ‘yellow’ chromophore formed from the sequence triad Thr65-Tyr66-Gly67 adopts the bicyclic structure typical of fluorophores emitting in the green spectral range. It was demonstrated that perfect antiparallel π-stacking of chromophore Tyr66 and the proximal Tyr203, as well as Val205, facing the chromophore phenolic ring are chiefly responsible for the observed yellow emission of phiYFPv at 537 nm. Structure-based site-directed mutagenesis has been used to identify the key functional residues in the chromophore environment. The obtained results have been utilized to improve the properties of phiYFPv and its homologous monomeric biomarker tagYFP.
Asymmetric scoring functions for proteins
NASA Astrophysics Data System (ADS)
Lezon, Timothy; Holter, Neal; Maritan, Amos; Banavar, Jayanth
2003-03-01
The protein folding problem entails the prediction of the native state structure of a protein given the sequence of amino acids. In a coarse-grained description of a protein, an important ingredient for attempting this task is the determination of the effective energies of interaction between amino acids. We will discuss a simple approach for determining such interaction potentials from a training set of protein sequences and their experimentally determined native state structures. The key new ingredient in our study is the incorporation of the lack of symmetry in the effective interactions between amino acids. Our results, obtained using a set of 513 proteins, and their implications will be discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, S.; Park, S.; Makowski, L.
Small angle X-ray scattering (SAXS) is an increasingly powerful technique to characterize the structure of biomolecules in solution. We present a computational method for accurately and efficiently computing the solution scattering curve from a protein with dynamical fluctuations. The method is built upon a coarse-grained (CG) representation of the protein. This CG approach takes advantage of the low-resolution character of solution scattering. It allows rapid determination of the scattering pattern from conformations extracted from CG simulations to obtain scattering characterization of the protein conformational landscapes. Important elements incorporated in the method include an effective residue-based structure factor for each amino acid, an explicit treatment of the hydration layer at the surface of the protein, and an ensemble average of scattering from all accessible conformations to account for macromolecular flexibility. The CG model is calibrated and illustrated to accurately reproduce the experimental scattering curve of hen egg white lysozyme. We then illustrate the computational method by calculating the solution scattering pattern of several representative protein folds and multiple conformational states. The results suggest that solution scattering data, when combined with a reliable computational method, have great potential for a better structural description of multi-domain complexes in different functional states, and for recognizing structural folds when sequence similarity to a protein of known structure is low. Possible applications of the method are discussed.
Davey, James A; Chica, Roberto A
2015-04-01
Computational protein design (CPD) predictions are highly dependent on the structure of the input template used. However, it is unclear how small differences in template geometry translate to large differences in stability prediction accuracy. Herein, we explored how structural changes to the input template affect the outcome of stability predictions by CPD. To do this, we prepared alternate templates by Rotamer Optimization followed by energy Minimization (ROM) and used them to recapitulate the stability of 84 protein G domain β1 mutant sequences. In the ROM process, side-chain rotamers for wild-type (WT) or mutant sequences are optimized on crystal or nuclear magnetic resonance (NMR) structures prior to template minimization, resulting in alternate structures termed ROM templates. We show that use of ROM templates prepared from sequences known to be stable results predominantly in improved prediction accuracy compared to using the minimized crystal or NMR structures. Conversely, ROM templates prepared from sequences that are less stable than the WT reduce prediction accuracy by increasing the number of false positives. These observed changes in prediction outcomes are attributed to differences in side-chain contacts made by rotamers in ROM templates. Finally, we show that ROM templates prepared from sequences that are unfolded or that adopt a nonnative fold result in the selective enrichment of sequences that are also unfolded or that adopt a nonnative fold, respectively. Our results demonstrate the existence of a rotamer bias caused by the input template that can be harnessed to skew predictions toward sequences displaying desired characteristics. © 2014 The Protein Society.
Pek, Han Bin; Klement, Maximilian; Ang, Kok Siong; Chung, Bevan Kai-Sheng; Ow, Dave Siak-Wei; Lee, Dong-Yup
2015-01-01
Various isoforms of invertases from prokaryotes, fungi, and higher plants have been expressed in Escherichia coli, and codon optimisation is a widely adopted strategy for improvement of heterologous enzyme expression. Successful synthetic gene design for recombinant protein expression can be done by matching its translational elongation rate against heterologous host organisms via codon optimization. Amongst the various design parameters considered for the gene synthesis, codon context bias has been relatively overlooked compared to individual codon usage, which is commonly adopted in most codon optimization tools. In addition, matching the rates of transcription and translation based on secondary structure may lead to enhanced protein folding. In this study, we evaluated codon context fitness as a design criterion for improving the expression of thermostable invertase from Thermotoga maritima in Escherichia coli and explored the relevance of secondary structure regions for folding and expression. We designed three coding sequences by using (1) a commercial vendor optimized gene algorithm, (2) codon context for the whole gene, and (3) codon context based on the secondary structure regions. Then, the codon optimized sequences were transformed and expressed in E. coli. From the resultant enzyme activities and protein yield data, codon context fitness proved to have the highest activity as compared to the wild-type control and other criteria, while the secondary structure-based strategy is comparable to the control. Codon context bias was shown to be a relevant parameter for enhancing enzyme production in Escherichia coli by codon optimization. Thus, we can effectively design synthetic genes within heterologous host organisms using this criterion. Copyright © 2015 Elsevier Inc. All rights reserved.
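The distinction this abstract draws between individual codon usage and codon context can be made concrete with a small counting sketch. It assumes, for illustration only, that codon context means the frequency of adjacent in-frame codon pairs; the fitness function actually used in the study is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter

def codons(cds):
    """Split a coding sequence into in-frame codons, dropping any trailing
    partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def codon_usage(cds):
    """Counts of individual codons (the commonly used criterion)."""
    return Counter(codons(cds))

def codon_context(cds):
    """Counts of adjacent codon pairs (assumed meaning of codon context)."""
    c = codons(cds)
    return Counter(zip(c, c[1:]))
```

Two sequences can share identical codon usage counts while differing in their codon pair counts, which is why the two criteria can lead to different designs.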
Barrick, Doug
2011-01-01
Mapping the stability distributions of proteins in their native folded states provides a critical link between structure, thermodynamics, and function. Linear repeat proteins have proven more amenable to this kind of mapping than globular proteins. C-terminal deletion studies of YopM, a large, linear leucine-rich repeat (LRR) protein, show that stability is distributed quite heterogeneously, yet a high level of cooperativity is maintained [1]. Key components of this distribution are three interfaces that strongly stabilize adjacent sequences, thereby maintaining structural integrity and promoting cooperativity. To better understand the distribution of interaction energy around these critical interfaces, we studied internal (rather than terminal) deletions of three LRRs in this region, including one of these stabilizing interfaces. Contrary to our expectation that deletion of structured repeats should be destabilizing, we find that internal deletion of folded repeats can actually stabilize the native state, suggesting that these repeats are destabilizing, although paradoxically, they are folded in the native state. We identified two residues within this destabilizing segment that deviate from the consensus sequence at a position that normally forms a stacked leucine ladder in the hydrophobic core. Replacement of these nonconsensus residues with leucine is stabilizing. This stability enhancement can be reproduced in the context of nonnative interfaces, but it requires an extended hydrophobic core. Our results demonstrate that different LRRs vary widely in their contribution to stability, and that this variation is context-dependent. These two factors are likely to determine the types of rearrangements that lead to folded, functional proteins, and in turn, are likely to restrict the pathways available for the evolution of linear repeat proteins. PMID:21764506
Mathematical methods for protein science
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hart, W.; Istrail, S.; Atkins, J.
1997-12-31
Understanding the structure and function of proteins is a fundamental endeavor in molecular biology. Currently, over 100,000 protein sequences have been determined by experimental methods. The three dimensional structure of the protein determines its function, but there are currently fewer than 4,000 structures known to atomic resolution. Accordingly, techniques to predict protein structure from sequence have an important role in aiding the understanding of the Genome and the effects of mutations in genetic disease. The authors describe current efforts at Sandia to better understand the structure of proteins through rigorous mathematical analyses of simple lattice models. The efforts have focused on two aspects of protein science: mathematical structure prediction, and inverse protein folding.
Diversity in the origins of proteostasis networks- a driver for protein function in evolution
Powers, Evan T.; Balch, William E.
2013-01-01
Although a protein’s primary sequence largely determines its function, proteins can adopt different folding states in response to changes in the environment, some of which may be deleterious to the organism. All organisms, including Bacteria, Archaea and Eukarya, have evolved a protein homeostasis network, or proteostasis network, that consists of chaperones and folding factors, degradation components, signalling pathways and specialized compartmentalized modules that manage protein folding in response to environmental stimuli and variation. Surveying the origins of proteostasis networks reveals that they have co-evolved with the proteome to regulate the physiological state of the cell, reflecting the unique stresses that different cells or organisms experience, and that they have a key role in driving evolution by closely managing the link between the phenotype and the genotype. PMID:23463216
Pineda-Lucena, Antonio; Liao, Jack C C; Cort, John R; Yee, Adelinda; Kennedy, Michael A; Edwards, Aled M; Arrowsmith, Cheryl H
2003-05-01
As part of the Northeast Structural Genomics Consortium pilot project focused on small eukaryotic proteins and protein domains, we have determined the NMR structure of the protein encoded by ORF YML108W from Saccharomyces cerevisiae. YML108W belongs to one of the numerous structural proteomics targets whose biological function is unknown. Moreover, this protein does not have sequence similarity to any other protein. The NMR structure of YML108W consists of a four-stranded beta-sheet with strand order 2143 and two alpha-helices, with an overall topology of betabetaalphabetabetaalpha. Strand beta1 runs parallel to beta4, and beta2:beta1 and beta4:beta3 pairs are arranged in an antiparallel fashion. Although this fold belongs to the split betaalphabeta family, it appears to be unique among this family; it is a novel arrangement of secondary structure, thereby expanding the universe of protein folds.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hsu, P. J.; Lai, S. K., E-mail: sklai@coll.phy.ncu.edu.tw; Molecular Science and Technology Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan
Folded conformations of proteins in thermodynamically stable states have long lifetimes. Before it folds into a stable conformation, or after unfolding from a stable conformation, the protein will generally stray from one random conformation to another leading thus to rapid fluctuations. Brief structural changes therefore occur before folding and unfolding events. These short-lived movements are easily overlooked in studies of folding/unfolding for they represent momentary excursions of the protein to explore conformations in the neighborhood of the stable conformation. The present study looks for precursory signatures of protein folding/unfolding within these rapid fluctuations through a combination of three techniques: (1) ultrafast shape recognition, (2) time series segmentation, and (3) time series correlation analysis. The first procedure measures the differences between statistical distance distributions of atoms in different conformations by calculating shape similarity indices from molecular dynamics simulation trajectories. The second procedure is used to discover the times at which the protein makes transitions from one conformation to another. Finally, we employ the third technique to exploit spatial fingerprints of the stable conformations; this procedure is to map out the sequences of changes preceding the actual folding and unfolding events, since strongly correlated atoms in different conformations are different due to bond and steric constraints. The aforementioned high-frequency fluctuations are therefore characterized by distinct correlational and structural changes that are associated with rate-limiting precursors that translate into brief segments.
Guided by these technical procedures, we choose a model system, a fragment of the protein transthyretin, for identifying in this system not only the precursory signatures of transitions associated with α helix and β hairpin, but also the important role played by weaker correlations in such protein folding dynamics.
NASA Astrophysics Data System (ADS)
Hsu, P. J.; Cheong, S. A.; Lai, S. K.
2014-05-01
Folded conformations of proteins in thermodynamically stable states have long lifetimes. Before it folds into a stable conformation, or after unfolding from a stable conformation, the protein will generally stray from one random conformation to another leading thus to rapid fluctuations. Brief structural changes therefore occur before folding and unfolding events. These short-lived movements are easily overlooked in studies of folding/unfolding for they represent momentary excursions of the protein to explore conformations in the neighborhood of the stable conformation. The present study looks for precursory signatures of protein folding/unfolding within these rapid fluctuations through a combination of three techniques: (1) ultrafast shape recognition, (2) time series segmentation, and (3) time series correlation analysis. The first procedure measures the differences between statistical distance distributions of atoms in different conformations by calculating shape similarity indices from molecular dynamics simulation trajectories. The second procedure is used to discover the times at which the protein makes transitions from one conformation to another. Finally, we employ the third technique to exploit spatial fingerprints of the stable conformations; this procedure is to map out the sequences of changes preceding the actual folding and unfolding events, since strongly correlated atoms in different conformations are different due to bond and steric constraints. The aforementioned high-frequency fluctuations are therefore characterized by distinct correlational and structural changes that are associated with rate-limiting precursors that translate into brief segments. 
Guided by these technical procedures, we choose a model system, a fragment of the protein transthyretin, for identifying in this system not only the precursory signatures of transitions associated with α helix and β hairpin, but also the important role played by weaker correlations in such protein folding dynamics.
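The first of the three techniques, ultrafast shape recognition, compares conformations through moments of atomic distance distributions. The sketch below follows the commonly described form of the method (four reference points, three moments each, and a similarity of 1 / (1 + mean absolute difference)); the exact moment definitions are an assumption here and may differ from those used in the study.

```python
import numpy as np

def _moments(d):
    """Mean, standard deviation, and cube root of the third central moment.
    These particular moment definitions are an illustrative assumption."""
    mean = d.mean()
    centered = d - mean
    return [mean, np.sqrt((centered ** 2).mean()), np.cbrt((centered ** 3).mean())]

def usr_descriptor(coords):
    """12-number shape descriptor from distances to four reference points:
    centroid, atom closest to it, atom farthest from it, and the atom
    farthest from that farthest atom."""
    coords = np.asarray(coords, dtype=float)
    ctd = coords.mean(axis=0)
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]
    fct = coords[d_ctd.argmax()]
    ftf = coords[np.linalg.norm(coords - fct, axis=1).argmax()]
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments(np.linalg.norm(coords - ref, axis=1)))
    return np.array(desc)

def usr_similarity(a, b):
    """Shape similarity in (0, 1]; 1 means identical descriptors."""
    diff = np.abs(usr_descriptor(a) - usr_descriptor(b)).mean()
    return 1.0 / (1.0 + diff)
```

Because the descriptor depends only on interatomic distances, it is unchanged by rigid rotation and translation, so frames of a trajectory can be compared without structural superposition.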
Polymer Uncrossing and Knotting in Protein Folding, and Their Role in Minimal Folding Pathways
Mohazab, Ali R.; Plotkin, Steven S.
2013-01-01
We introduce a method for calculating the extent to which chain non-crossing is important in the most efficient, optimal trajectories or pathways for a protein to fold. This involves recording all unphysical crossing events of a ghost chain, and calculating the minimal uncrossing cost that would have been required to avoid such events. A depth-first tree search algorithm is applied to find minimal transformations to fold α, β, α/β, and knotted proteins. In all cases, the extra uncrossing/non-crossing distance is a small fraction of the total distance travelled by a ghost chain. Different structural classes may be distinguished by the amount of extra uncrossing distance, and the effectiveness of such discrimination is compared with other order parameters. It was seen that non-crossing distance over chain length provided the best discrimination between structural and kinetic classes. The scaling of non-crossing distance with chain length implies an inevitable crossover to entanglement-dominated folding mechanisms for sufficiently long chains. We further quantify the minimal folding pathways by collecting the sequence of uncrossing moves, which generally involve leg, loop, and elbow-like uncrossing moves, and rendering the collection of these moves over the unfolded ensemble as a multiple-transformation “alignment”. The consensus minimal pathway is constructed and shown schematically for representative cases of an α, β, and knotted protein. An overlap parameter is defined between pathways; we find that α proteins have minimal overlap indicating diverse folding pathways, knotted proteins are highly constrained to follow a dominant pathway, and β proteins are somewhere in between. Thus we have shown how topological chain constraints can induce dominant pathway mechanisms in protein folding. PMID:23365638
Goldenzweig, Adi; Goldsmith, Moshe; Hill, Shannon E; Gertman, Or; Laurino, Paola; Ashani, Yacov; Dym, Orly; Unger, Tamar; Albeck, Shira; Prilusky, Jaime; Lieberman, Raquel L; Aharoni, Amir; Silman, Israel; Sussman, Joel L; Tawfik, Dan S; Fleishman, Sarel J
2016-07-21
Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymatic properties or in the active-site configuration as determined by crystallography. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of soluble and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
MINS2: revisiting the molecular code for transmembrane-helix recognition by the Sec61 translocon.
Park, Yungki; Helms, Volkhard
2008-08-15
To be fully functional, membrane proteins should not only fold, but also get inserted into the membrane, which is mediated by the Sec61 translocon. Recent experimental studies have attempted to elucidate how the Sec61 translocon accomplishes this delicate task by measuring the translocon-mediated membrane insertion free energies of 357 systematically designed peptides. On the basis of this data set, we have developed MINS2, a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark analysis of MINS2 shows that MINS2 significantly outperforms previously proposed methods. Importantly, the application of MINS2 to known membrane protein structures shows that a better prediction of membrane insertion free energies does not lead to a better prediction of transmembrane segments of polytopic membrane proteins. A web server for MINS2 is publicly available at http://service.bioinformatik.uni-saarland.de/mins. Supplementary data are available at Bioinformatics online.
Structural protein descriptors in 1-dimension and their sequence-based predictions.
Kurgan, Lukasz; Disfani, Fatemeh Miri
2011-09-01
The last few decades observed an increasing interest in development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide-range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from the protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that the secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
Dijk, J; van den Broek, R; Nasiulas, G; Beck, A; Reinhardt, R; Wittmann-Liebold, B
1987-08-01
The amino-terminal sequence of ribosomal protein L10 from Halobacterium marismortui has been determined up to residue 54, using both a liquid- and a gas-phase sequenator. The two sequences are in good agreement. The protein is clearly homologous to protein HcuL10 from the related strain Halobacterium cutirubrum. Furthermore, a weaker but distinct homology to ribosomal protein L6 from Escherichia coli and Bacillus stearothermophilus can be detected. In addition to 7 identical amino acids in the first 36 residues in all four sequences a number of conservative replacements occurs, of mainly hydrophobic amino acids. In this common region the pattern of conserved amino acids suggests the presence of a beta-alpha fold as it occurs in ribosomal proteins L12 and L30. Furthermore, several potential cases of homology to other ribosomal components of the three ur-kingdoms have been found.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mushegian, Arcady R., E-mail: mushegian2@gmail.com; Elena, Santiago F., E-mail: sfelena@ibmcp.upv.es; The Santa Fe Institute, Santa Fe, NM 87501
Homologs of Tobacco mosaic virus 30K cell-to-cell movement protein are encoded by diverse plant viruses. Mechanisms of action and evolutionary origins of these proteins remain obscure. We expand the picture of conservation and evolution of the 30K proteins, producing sequence alignment of the 30K superfamily with the broadest phylogenetic coverage thus far and illuminating structural features of the core all-beta fold of these proteins. Integrated copies of pararetrovirus 30K movement genes are prevalent in euphyllophytes, with at least one copy intact in nearly every examined species, and mRNAs detected for most of them. Sequence analysis suggests repeated integrations, pseudogenizations, and positive selection in those provirus genes. An unannotated 30K-superfamily gene in Arabidopsis thaliana genome is likely expressed as a fusion with the At1g37113 transcript. This molecular background of endopararetrovirus gene products in plants may change our view of virus infection and pathogenesis, and perhaps of cellular homeostasis in the hosts. - Highlights: • Sequence region shared by plant virus “30K” movement proteins has an all-beta fold. • Most euphyllophyte genomes contain integrated copies of pararetroviruses. • These integrated virus genomes often include intact movement protein genes. • Molecular evidence suggests that these “30K” genes may be selected for function.
Analysis of ER Resident Proteins in S. cerevisiae: Implementation of H/KDEL Retrieval Sequences
Young, Carissa L.; Raden, David L.; Robinson, Anne S.
2013-01-01
An elaborate quality control system regulates ER homeostasis by ensuring the fidelity of protein synthesis and maturation. In budding yeast, genomic analyses and high-throughput proteomic studies have identified ER resident proteins that restore homeostasis following local perturbations. Yet, how these folding factors modulate stress has been largely unexplored. In this study, we designed a series of PCR-based modules including codon-optimized epitopes and FP variants complete with C-terminal H/KDEL retrieval motifs. These conserved sequences are inherent to most soluble ER resident proteins. To monitor multiple proteins simultaneously, H/KDEL cassettes are available with six different selection markers, providing optimal flexibility for live-cell imaging and multicolor labeling in vivo. A single pair of PCR primers can be used for the amplification of these 26 modules, enabling numerous combinations of tags and selection markers. The versatility of pCY H/KDEL cassettes was demonstrated by labeling BiP/Kar2p, Pdi1p, and Scj1p with all novel tags, thus providing a direct comparison among FP variants. Furthermore, to advance in vitro studies of yeast ER proteins, Strep-tag II was engineered with a C-terminal retrieval sequence. Here, an efficient purification strategy was established for BiP under physiological conditions. PMID:23324027
NASA Astrophysics Data System (ADS)
Sethaphong, Latsavongsakda
This work examines smart material properties of rational self-assembly and molecular recognition found in nano-biosystems. Exploiting the sequence and structural information encoded within nucleic acids and proteins will permit programmed synthesis of nanomaterials and help create molecular machines that may carry out new roles involving chemical catalysis and bioenergy. Responsive to different ionic environments through self-reorganization, nucleic acids (NAs) are nature's signature smart material; organisms such as viruses and bacteria use features of NAs to react to their environment and orchestrate their lifecycle. Furthermore, nucleic acid systems (both RNA and DNA) are currently exploited as scaffolds; recent applications have been showcased to build bioelectronics and biotemplated nanostructures via directed assembly of multidimensional nanoelectronic devices 1. Since the most stable and rudimentary structure of nucleic acids is the helical duplex, these were modeled in order to examine the influence of the microenvironment, sequence, and cation-dependent perturbations of their canonical forms. Due to their negatively charged phosphate backbone, NAs rely on counterions to overcome the inherent repulsive forces that arise from the assembly of two complementary strands. As a realistic model system, we chose the HIV-TAR helix (PDB ID: 397D) to study the effect of specific sequence motifs on cation sequestration. At physiologically relevant concentrations of sodium and potassium ions, we observed sequence-based effects where purine stretches were adept in retaining high-residency cations. The transitional space between adenine and guanosine nucleotides (ApG step) in a sequence proved the most favorable. This work was the first to directly show these subtle interactions of sequence-based cationic sequestration and may be useful for controlling metallization of nucleic acids in conductive nanowires. 
Extending the study further, we explored the degree to which the structure of NA duplexes alone interacted with cations distinct from a specific sequence. Under physiologically relevant conditions, a duplex of RNA polyguanine-polycytidine was highly responsive and able to sequester cations to the middle of the purine stretches. The least responsive structure was a DNA polyadenine-polythymine duplex. A random sequence DNA duplex contorted into an RNA-like helix resulted in cationic dynamics similar to RNA systems. These studies showed that cation diffusive binding events in nucleic acid duplex structures are sequence specific and heavily influenced by structural aspects of helical forms, which account for much of the differences observed. Although structural information in nucleic acids is encoded within their sequence, linking amino acid sequence to protein structure is murkier; the structural information within proteins is encoded by the folding process itself: a complex phenomenon driven toward the equilibrium state of the active conformation. Upwards of two thirds of a protein's sequence can be substituted with similar amino acids without significantly perturbing its function; conserved residues of about 10% seem to be vital; since evolutionary selection pressure in proteins operates 3-dimensionally, a linear sequence is only partially informative. We explored this problem by folding de novo the cytosolic portion of the membrane protein, cellulose synthase, CESA1 from upland cotton, Gossypium hirsutum (Ghcesa1). The cytoplasmic region was generated by homology modeling and refined with molecular dynamics. These mutations impair local structural flexibility which likely results in cellulose that is produced at a lower rate and is less crystalline. Additional modeling of fragments of cellulose synthases from the model plant, Arabidopsis thaliana, offered novel insights into the function of conserved cytosolic domains within plant cellulose synthases. 
Transport mechanisms related to the transmembrane region revealed significant differences between plants and a bacterial complex. These studies generated possible mutations that may allow for the creation of new synthases and identified other avenues of research in order to develop technologies that may alter the crystallinity and other useful properties of cellulose. 1. Karplus, K., SAM-T08, HMM-based protein structure prediction. Nucleic Acids Research, 2009. 37: p. W492-W497.
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that 'junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold in hairpins and loops.
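The base-to-step mapping described in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated mapping (A left, T right, C down, G up) with unit steps; the function name, the handling of ambiguous bases, and the unit step length are assumptions, and the sketch does not reproduce the Levy-flight construction or the multifractal analysis.

```python
# Step vectors follow the mapping stated in the abstract:
# A = left, T = right, C = down, G = up.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue  # assumption: ambiguous bases such as N are skipped
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(dna_walk("AAG"))
```

A sequence containing equal numbers of A and T and of C and G returns to the origin, which is one simple way to see how base composition shapes the walk.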
Prions, From Structure to Epigenetics and Neuronal Functions
NASA Astrophysics Data System (ADS)
Lindquist, Susan
2012-02-01
Prions are a unique type of protein that can misfold and convert other proteins to the same shape. The well-characterized yeast prion [PSI+] is formed from an inactive amyloid fiber conformation of the translation-termination factor, Sup35. This altered conformation is passed from mother cells to daughters, acting as a template to perpetuate the prion state and providing a mechanism of protein-based inheritance. We employed a variety of methods to determine the structure of Sup35 amyloid fibrils. First, using fluorescent tags and cross-linking we identified specific segments of the protein monomer that form intermolecular contacts in a "Head-to-Head," "Tail-to-Tail" fashion while a central region forms intramolecular contacts. Then, using peptide arrays we mapped the region responsible for the prion transmission barrier between two different yeast species. We have also used optical tweezers to reveal that the non-covalent intermolecular contacts between monomers are unusually strong, and maintain fibril integrity even under forces that partially unfold individual monomers and extend fibril length. Based on the handful of known yeast prion proteins we predicted sequences that could be responsible for prion-like amyloid folding. Our screen identified 19 new candidate prions, whose protein-folding properties and diverse cellular functions we have characterized using a combination of genetic and biochemical techniques. Prion-driven phenotypic diversity increases under stress, and can be amplified by the dynamic maturation of prion-initiating states. These qualities allow prions to act as "bet-hedging" devices that facilitate the adaptation of yeast to stressful environments, and might speed the evolution of new traits. Together with Kandel and Si, we have also found that a regulatory protein that plays an important role in synaptic plasticity behaves as a prion in yeast. 
Cytoplasmic polyadenylation element binding protein, CPEB, maintains synapses by promoting the local translation of mRNAs. We postulate that the self-perpetuating folding of the prion domain acts as a molecular memory. Thus yeast prions have provided evidence for the surprising possibility that amyloid protein folds can serve as the basis for memory and inheritance.
Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya
2012-01-01
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
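The mean absolute error reported for torsion angles can be illustrated with a short Python sketch. Because Phi and Psi are periodic, a prediction of 170° against a true value of -170° is 20° off rather than 340°; whether TANGLE applies exactly this wrap-around convention is an assumption here, and the function names are hypothetical.

```python
def angular_abs_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(predicted, observed):
    """Mean absolute error over paired angle lists, respecting periodicity."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(angular_abs_error(p, t) for p, t in zip(predicted, observed))
    return total / len(predicted)


if __name__ == "__main__":
    print(angular_mae([-60.0, 120.0], [-50.0, 140.0]))
```

Computing the error this way keeps residues near the ±180° boundary from inflating the average.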
Pandurangan, Sudhakar; Diapari, Marwan; Yin, Fuqiang; Munholland, Seth; Perry, Gregory E.; Chapman, B. Patrick; Huang, Shangzhi; Sparvoli, Francesca; Bollini, Roberto; Crosby, William L.; Pauls, Karl P.; Marsolais, Frédéric
2016-01-01
A series of genetically related lines of common bean (Phaseolus vulgaris L.) integrate a progressive deficiency in major storage proteins, the 7S globulin phaseolin and lectins. SARC1 integrates a lectin-like protein, arcelin-1 from a wild common bean accession. SMARC1N-PN1 is deficient in major lectins, including erythroagglutinating phytohemagglutinin (PHA-E) but not α-amylase inhibitor, and incorporates also a deficiency in phaseolin. SMARC1-PN1 is intermediate and shares the phaseolin deficiency. Sanilac is the parental background. To understand the genomic basis for variations in protein profiles previously determined by proteomics, the genotypes were submitted to short-fragment genome sequencing using an Illumina HiSeq 2000/2500 platform. Reads were aligned to reference sequences and subjected to de novo assembly. The results of the analyses identified polymorphisms responsible for the lack of specific storage proteins, as well as those associated with large differences in storage protein expression. SMARC1N-PN1 lacks the lectin genes pha-E and lec4-B17, and has the pseudogene pdlec1 in place of the functional pha-L gene. While the α-phaseolin gene appears absent, an approximately 20-fold decrease in β-phaseolin accumulation is associated with a single nucleotide polymorphism converting a G-box to an ACGT motif in the proximal promoter. Among residual lectins compensating for storage protein deficiency, mannose lectin FRIL and α-amylase inhibitor 1 genes are uniquely present in SMARC1N-PN1. An approximately 50-fold increase in α-amylase inhibitor like protein accumulation is associated with multiple polymorphisms introducing up to eight potential positive cis-regulatory elements in the proximal promoter specific to SMARC1N-PN1. 
An approximately 7-fold increase in accumulation of 11S globulin legumin is not associated with variation in proximal promoter sequence, suggesting that the identity of individual proteins involved in proteome rebalancing might also be determined at the translational level. PMID:27066039
You, Min Kyoung; Kim, Jin Hwa; Lee, Yeo Jin; Jeong, Ye Sol; Ha, Sun-Hwa
2016-12-22
Plastoglobules (PGs) are thylakoid membrane microdomains within plastids that are known as specialized locations of carotenogenesis. Three rice phytoene synthase proteins (OsPSYs) involved in carotenoid biosynthesis have been identified. Here, the N-terminal 80-amino-acid portion of OsPSY2 (PTp) was demonstrated to be a chloroplast-targeting peptide by displaying cytosolic localization of OsPSY2(ΔPTp):mCherry in rice protoplast, in contrast to chloroplast localization of OsPSY2:mCherry in a punctate pattern. The peptide sequence of PTp was predicted to harbor two transmembrane domains eligible for a putative PG-targeting signal. To assess and enhance the PG-targeting ability of PTp, the original PTp DNA sequence (PTp) was modified to a synthetic DNA sequence (stPTp), which had 84.4% similarity to the original sequence. The motivation of this modification was to reduce the GC ratio from 75% to 65% and to disentangle the hairpin loop structures of PTp. These two DNA sequences were fused to the sequence of the synthetic green fluorescent protein (sGFP) and drove GFP expression with different efficiencies. In particular, the RNA and protein levels of stPTp-sGFP were slightly improved to 1.4-fold and 1.3-fold more than those of sGFP, respectively. The green fluorescent signals of their mature proteins were all observed as speckle-like patterns with slightly blurred stromal signals in chloroplasts. These discrete green speckles of PTp-sGFP and stPTp-sGFP corresponded exactly to the red fluorescent signal displayed by OsPSY2:mCherry in both etiolated and greening protoplasts and are presumed to correspond to distinct PGs. In conclusion, we identified PTp as a transit peptide sequence facilitating preferential translocation of foreign proteins to PGs, and developed an improved PTp sequence, stPTp, which is expected to be very useful for applications in plant biotechnologies requiring precise micro-compartmental localization in plastids.
Exploring Mouse Protein Function via Multiple Approaches.
Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong
2016-01-01
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. 
Therefore, the accuracy of the presented method may be much higher in reality.
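The sequential combination described in this abstract can be pictured as a fallback cascade: try the similarity-based predictor first, then the interaction-based one, and finally the pseudo amino acid composition-based one. The Python sketch below shows only that control flow. The exact combination rule is not given in the abstract, so "use the first predictor that returns anything" is an assumption, and the three stub predictors with their toy lookup tables are hypothetical stand-ins.

```python
def predict_function(protein, predictors):
    """Try each (name, predictor) in order; return the first non-empty result."""
    for name, predictor in predictors:
        result = predictor(protein)
        if result:
            return name, result
    return None, []


# Hypothetical stand-ins for the three approaches named in the abstract.
def similarity_based(protein):
    homolog_annotations = {"P1": ["kinase activity"]}  # toy homology lookup
    return homolog_annotations.get(protein, [])


def interaction_based(protein):
    partner_annotations = {"P2": ["transport"]}  # toy interaction-network lookup
    return partner_annotations.get(protein, [])


def pseaac_based(protein):
    return ["binding"]  # a composition-based model can always produce a guess


PIPELINE = [
    ("similarity", similarity_based),
    ("interaction", interaction_based),
    ("pseaac", pseaac_based),
]

if __name__ == "__main__":
    for protein_id in ("P1", "P2", "P3"):
        print(protein_id, predict_function(protein_id, PIPELINE))
```

Ordering the predictors from most accurate to most widely applicable mirrors the trade-off the abstract reports between the similarity-based and composition-based approaches.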
Exploring Mouse Protein Function via Multiple Approaches
Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning
2016-01-01
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. 
Therefore, the accuracy of the presented method may be much higher in reality. PMID:27846315
Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong
2017-07-05
Knowledge of drug-target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for identifying DTI have unavoidable shortcomings, including being time-consuming and expensive. This motivates us to develop an effective computational method to predict DTI based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). To evaluate the prediction capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. To further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method. This makes it a useful tool suitable for predicting DTI, as well as for other bioinformatics tasks.
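One common way to derive bi-gram features from a PSSM is to sum, over consecutive positions, the product of the score for residue type m at position i and the score for residue type n at position i+1. The Python sketch below follows that formulation as an assumption; the abstract does not spell out the exact definition used in PDTPS, any normalization it applies, or how the result is passed to PCA and the RVM, and the function name is hypothetical.

```python
def bigram_features(pssm):
    """Flattened K x K bi-gram matrix from a PSSM given as L rows of K values.

    Entry (m, n) is the sum over consecutive positions i of
    pssm[i][m] * pssm[i + 1][n]. This is an assumed formulation.
    """
    if len(pssm) < 2:
        raise ValueError("need at least two sequence positions")
    k = len(pssm[0])
    bigram = [[0.0] * k for _ in range(k)]
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(k):
            for n in range(k):
                bigram[m][n] += row[m] * next_row[n]
    # Flatten row by row into a feature vector of length K * K.
    return [value for line in bigram for value in line]


if __name__ == "__main__":
    toy_pssm = [[1, 0], [0, 1], [1, 0]]  # toy 3-position, 2-letter profile
    print(bigram_features(toy_pssm))
```

With the usual 20 amino acid columns this yields a fixed 400-element vector regardless of sequence length, which is what makes it convenient as classifier input.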
Suresh, V; Parthasarathy, S
2014-01-01
We developed a support vector machine based web server called SVM-PB-Pred to predict the Protein Block for any given amino acid sequence. The input features of SVM-PB-Pred include i) sequence profiles (PSSM) and ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Similarly, four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four SVM models were tested using three different benchmarking tests, namely (i) self consistency, (ii) seven-fold cross validation and (iii) independent case test. The maximum prediction accuracy of ~70% was observed in the self consistency test for the SVM models of both the LI1264 and SP1577 datasets, where PSSM+SS(DSSP) input features were used for testing. The prediction accuracies were reduced to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4) in the independent case test for the SVM models of the same two datasets. Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and predicted secondary structure from the NPS@ server are used. The SVM-PB-Pred server can be freely accessed through http://bioinfo.bdu.ac.in/~svmpbpred.
Prediction of Ras-effector interactions using position energy matrices.
Kiel, Christina; Serrano, Luis
2007-09-01
One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structural similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to determine in a fast way large scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
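The scanning step described in this abstract amounts to summing one pre-calculated matrix entry per sequence position and comparing the total with a calibrated threshold. The Python sketch below illustrates that arithmetic only. The matrix values, the threshold, and the function names are invented for illustration; the "above the threshold" rule is taken from the abstract's wording, and real matrices would come from the mutational energy calculations on the template structures.

```python
def energy_sum(sequence, matrix):
    """Sum the per-position matrix entries for an aligned sequence.

    matrix is a list with one dict per position, mapping residue -> energy.
    """
    if len(sequence) != len(matrix):
        raise ValueError("sequence must be aligned to the matrix length")
    return sum(matrix[i][residue] for i, residue in enumerate(sequence))


def is_candidate_binder(sequence, matrix, threshold):
    """Keep sequences whose energy sum is above the calibrated threshold."""
    return energy_sum(sequence, matrix) > threshold


if __name__ == "__main__":
    # Toy two-position matrix with made-up values (not from the ADAN database).
    toy_matrix = [{"A": 0.5, "K": -1.0}, {"A": 0.25, "K": 0.75}]
    for candidate in ("AK", "KA"):
        print(candidate, energy_sum(candidate, toy_matrix),
              is_candidate_binder(candidate, toy_matrix, threshold=0.0))
```

Because each score is a lookup and a sum, scanning is far cheaper than building a homology model per sequence, which is the speed-up the abstract describes.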
Papanikolopoulou, Katerina; Teixeira, Susana; Belrhali, Hassan; Forsyth, V Trevor; Mitraki, Anna; van Raaij, Mark J
2004-09-03
Adenovirus fibres are trimeric proteins that consist of a globular C-terminal domain, a central fibrous shaft and an N-terminal part that attaches to the viral capsid. In the presence of the globular C-terminal domain, which is necessary for correct trimerisation, the shaft segment adopts a triple beta-spiral conformation. We have replaced the head of the fibre by the trimerisation domain of the bacteriophage T4 fibritin, the foldon. Two different fusion constructs were made and crystallised, one with an eight amino acid residue linker and one with a linker of only two residues. X-ray crystallographic studies of both fusion proteins show that residues 319-391 of the adenovirus type 2 fibre shaft fold into a triple beta-spiral fold indistinguishable from the native structure, although this is now resolved at a higher resolution of 1.9 Å. The foldon residues 458-483 also adopt their natural structure. The intervening linkers are not well ordered in the crystal structures. This work shows that the shaft sequences retain their capacity to fold into their native beta-spiral fibrous fold when fused to a foreign C-terminal trimerisation motif. It provides a structural basis to artificially trimerise longer adenovirus shaft segments and segments from other trimeric beta-structured fibre proteins. Such artificial fibrous constructs, amenable to crystallisation and solution studies, can offer tractable model systems for the study of beta-fibrous structure. They can also prove useful for gene therapy and fibre engineering applications.
Protein Folding—How and Why: By Hydrogen Exchange, Fragment Separation, and Mass Spectrometry
Englander, S. Walter; Mayne, Leland; Kan, Zhong-Yuan; Hu, Wenbing
2017-01-01
Advanced hydrogen exchange (HX) methodology can now determine the structure of protein folding intermediates and their progression in folding pathways. Key developments over time include the HX pulse labeling method with nuclear magnetic resonance analysis, development of the fragment separation method, the addition to it of mass spectrometric (MS) analysis, and recent improvements in the HX MS technique and data analysis. Also, the discovery of protein foldons and their role supplies an essential interpretive link. Recent work using HX pulse labeling with HX MS analysis finds that a number of proteins fold by stepping through a reproducible sequence of native-like intermediates in an ordered pathway. The stepwise nature of the pathway is dictated by the cooperative foldon unit construction of the protein. The pathway order is determined by a sequential stabilization principle; prior native-like structure guides the formation of adjacent native-like structure. This view does not match the funneled energy landscape paradigm of a very large number of folding tracks, which was framed before foldons were known. PMID:27145881
Liquid crystalline spinning of spider silk
NASA Astrophysics Data System (ADS)
Vollrath, Fritz; Knight, David P.
2001-03-01
Spider silk has outstanding mechanical properties despite being spun at close to ambient temperatures and pressures using water as the solvent. The spider achieves this feat of benign fibre processing by judiciously controlling the folding and crystallization of the main protein constituents, and by adding auxiliary compounds, to create a composite material of defined hierarchical structure. Because the 'spinning dope' (the material from which silk is spun) is liquid crystalline, spiders can draw it during extrusion into a hardened fibre using minimal forces. This process involves an unusual internal drawdown within the spider's spinneret that is not seen in industrial fibre processing, followed by a conventional external drawdown after the dope has left the spinneret. Successful copying of the spider's internal processing and precise control over protein folding, combined with knowledge of the gene sequences of its spinning dopes, could permit industrial production of silk-based fibres with unique properties under benign conditions.
Ishikawa, Yoshihiro; Bächinger, Hans Peter
2013-11-01
Collagen biosynthesis occurs in the rough endoplasmic reticulum, and many molecular chaperones and folding enzymes are involved in this process. The folding mechanism of type I procollagen has been well characterized, and protein disulfide isomerase (PDI) has been suggested as a key player in the formation of the correct disulfide bonds in the noncollagenous carboxyl-terminal and amino-terminal propeptides. Prolyl 3-hydroxylase 1 (P3H1) forms a hetero-trimeric complex with cartilage-associated protein and cyclophilin B (CypB). This complex is a multifunctional complex acting as a prolyl 3-hydroxylase, a peptidyl prolyl cis-trans isomerase, and a molecular chaperone. Two major domains are predicted from the primary sequence of P3H1: an amino-terminal domain and a carboxyl-terminal domain corresponding to the 2-oxoglutarate- and iron-dependent dioxygenase domains similar to the α-subunit of prolyl 4-hydroxylase and lysyl hydroxylases. The amino-terminal domain contains four CXXXC sequence repeats. The primary sequence of cartilage-associated protein is homologous to the amino-terminal domain of P3H1 and also contains four CXXXC sequence repeats. However, the function of the CXXXC sequence repeats is not known. Several publications have reported that short peptides containing a CXC or a CXXC sequence show oxido-reductase activity similar to PDI in vitro. We hypothesize that CXXXC motifs have oxido-reductase activity similar to the CXXC motif in PDI. We have tested the enzyme activities on model substrates in vitro using a GCRALCG peptide and the P3H1 complex. Our results suggest that this complex could function as a disulfide isomerase in the rough endoplasmic reticulum.
Preliminary X-ray crystallographic studies of mouse UPR responsive protein P58(IPK) TPR fragment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tao, Jiahui; Wu, Yunkun; Ron, David
2008-02-01
To investigate the mechanism by which P58(IPK) functions to promote protein folding within the ER, a P58(IPK) TPR fragment without the C-terminal J-domain has been crystallized. Endoplasmic reticulum (ER) stress induces the unfolded protein response (UPR), which can promote protein folding and misfolded protein degradation and attenuate protein translation and protein translocation into the ER. P58(IPK) has been proposed to function as a molecular chaperone to maintain protein-folding homeostasis in the ER under normal and stressed conditions. P58(IPK) contains nine TPR motifs and a C-terminal J-domain within its primary sequence. To investigate the mechanism by which P58(IPK) functions to promote protein folding within the ER, a P58(IPK) TPR fragment without the C-terminal J-domain was crystallized. The crystals diffract to 2.5 Å resolution using a synchrotron X-ray source. The crystals belong to space group P2₁, with unit-cell parameters a = 83.53, b = 92.75, c = 84.32 Å, α = 90.00, β = 119.36, γ = 90.00°. There are two P58(IPK) molecules in the asymmetric unit, which corresponds to a solvent content of approximately 60%. Structure determination by MAD methods is under way.
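As a worked check on the crystallographic numbers quoted above, the following minimal sketch computes the unit-cell volume from the reported parameters. It assumes only the standard monoclinic formula V = a·b·c·sin β and the fact that space group P2₁ has two asymmetric units per cell; the function name is illustrative and nothing here comes from the authors' own processing software.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume in cubic angstroms of a monoclinic cell (alpha = gamma = 90 degrees)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Unit-cell parameters as reported in the abstract (lengths in angstroms).
volume = monoclinic_cell_volume(83.53, 92.75, 84.32, 119.36)

# P2_1 has two asymmetric units per cell; the abstract reports two
# molecules per asymmetric unit, giving four molecules per cell.
molecules_per_cell = 2 * 2
volume_per_molecule = volume / molecules_per_cell

print(f"Unit-cell volume: {volume:.0f} cubic angstroms")
print(f"Volume per molecule: {volume_per_molecule:.0f} cubic angstroms")
```

Dividing the volume per molecule by the molecular mass of the fragment (not stated in the abstract) would give the Matthews coefficient from which a solvent content such as the reported 60% is normally estimated.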
Delic, Marizela; Göngrich, Rebecca; Mattanovich, Diethard; Gasser, Brigitte
2014-07-20
Recombinant protein production has developed into a huge market with enormous positive implications for human health and for the future direction of a biobased economy. Limitations in the economic and technical feasibility of production processes are often related to bottlenecks of in vivo protein folding. Based on cell biological knowledge, some major bottlenecks have been overcome by the overexpression of molecular chaperones and other folding related proteins, or by the deletion of deleterious pathways that may lead to misfolding, mistargeting, or degradation. While important success could be achieved by this strategy, the list of reported unsuccessful cases is disappointingly long and obviously dependent on the recombinant protein to be produced. Singular engineering of protein folding steps may not lead to desired results if the pathway suffers from several limitations. In particular, the connection between folding quality control and proteolytic degradation needs further attention. Based on recent understanding that multiple steps in the folding and secretion pathways limit productivity, synergistic combinations of the cell engineering approaches mentioned earlier need to be explored. In addition, systems biology-based whole cell analysis that also takes energy and redox metabolism into consideration will broaden the knowledge base for future rational engineering strategies.
Exploring the repeat protein universe through computational protein design
Brunette, TJ; Parmeggiani, Fabio; Huang, Po-Ssu; ...
2015-12-16
A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. In this paper, we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Finally, our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.
Neuroligin Trafficking Deficiencies Arising from Mutations in the α/β-Hydrolase Fold Protein Family
De Jaco, Antonella; Lin, Michael Z.; Dubi, Noga; Comoletti, Davide; Miller, Meghan T.; Camp, Shelley; Ellisman, Mark; Butko, Margaret T.; Tsien, Roger Y.; Taylor, Palmer
2010-01-01
Despite great functional diversity, characterization of the α/β-hydrolase fold proteins that encompass a superfamily of hydrolases, heterophilic adhesion proteins, and chaperone domains reveals a common structural motif. By incorporating the R451C mutation found in neuroligin (NLGN) and associated with autism and the thyroglobulin G2320R (G221R in NLGN) mutation responsible for congenital hypothyroidism into NLGN3, we show that mutations in the α/β-hydrolase fold domain influence folding and biosynthetic processing of neuroligin3 as determined by in vitro susceptibility to proteases, glycosylation processing, turnover, and processing rates. We also show altered interactions of the mutant proteins with chaperones in the endoplasmic reticulum and arrest of transport along the secretory pathway with diversion to the proteasome. Time-controlled expression of a fluorescently tagged neuroligin in hippocampal neurons shows that these mutations compromise neuronal trafficking of the protein, with the R451C mutation reducing and the G221R mutation virtually abolishing the export of NLGN3 from the soma to the dendritic spines. Although the R451C mutation causes a local folding defect, the G221R mutation appears responsible for more global misfolding of the protein, reflecting their sequence positions in the structure of the protein. Our results suggest that disease-related mutations in the α/β-hydrolase fold domain share common trafficking deficiencies yet lead to discrete congenital disorders of differing severity in the endocrine and nervous systems. PMID:20615874
De Jaco, Antonella; Lin, Michael Z; Dubi, Noga; Comoletti, Davide; Miller, Meghan T; Camp, Shelley; Ellisman, Mark; Butko, Margaret T; Tsien, Roger Y; Taylor, Palmer
2010-09-10
Despite great functional diversity, characterization of the alpha/beta-hydrolase fold proteins that encompass a superfamily of hydrolases, heterophilic adhesion proteins, and chaperone domains reveals a common structural motif. By incorporating the R451C mutation found in neuroligin (NLGN) and associated with autism and the thyroglobulin G2320R (G221R in NLGN) mutation responsible for congenital hypothyroidism into NLGN3, we show that mutations in the alpha/beta-hydrolase fold domain influence folding and biosynthetic processing of neuroligin3 as determined by in vitro susceptibility to proteases, glycosylation processing, turnover, and processing rates. We also show altered interactions of the mutant proteins with chaperones in the endoplasmic reticulum and arrest of transport along the secretory pathway with diversion to the proteasome. Time-controlled expression of a fluorescently tagged neuroligin in hippocampal neurons shows that these mutations compromise neuronal trafficking of the protein, with the R451C mutation reducing and the G221R mutation virtually abolishing the export of NLGN3 from the soma to the dendritic spines. Although the R451C mutation causes a local folding defect, the G221R mutation appears responsible for more global misfolding of the protein, reflecting their sequence positions in the structure of the protein. Our results suggest that disease-related mutations in the alpha/beta-hydrolase fold domain share common trafficking deficiencies yet lead to discrete congenital disorders of differing severity in the endocrine and nervous systems.
AlignMe—a membrane protein sequence alignment web server
Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.
2014-01-01
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
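The family-averaged hydropathy profile described for the HP mode can be illustrated with a short sketch. This is not AlignMe's implementation: it assumes the Kyte-Doolittle hydropathy scale, skips gaps and unknown residues, and uses a hypothetical function name, purely to show how a multiple sequence alignment collapses into one averaged profile.

```python
# Kyte-Doolittle hydropathy values (one possible scale; an assumption here).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def averaged_hydropathy_profile(alignment):
    """Average hydropathy per alignment column, ignoring gaps and unknowns."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        values = [KYTE_DOOLITTLE[row[col]] for row in alignment
                  if row[col] in KYTE_DOOLITTLE]
        # A column made only of gaps contributes a neutral value of 0.0.
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


# Toy alignment of three homologs ("-" marks a gap).
toy_alignment = ["MLIV-K", "MLLVGK", "MIIVGR"]
print([round(v, 2) for v in averaged_hydropathy_profile(toy_alignment)])
```

Two such profiles, one per family, would then be aligned against each other; that alignment step is omitted here.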
Identification of distant drug off-targets by direct superposition of binding pocket surfaces.
Schumann, Marcel; Armen, Roger S
2013-01-01
Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target ("distant off-targets"). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target ("distant off-target").
Identification of Distant Drug Off-Targets by Direct Superposition of Binding Pocket Surfaces
Schumann, Marcel; Armen, Roger S.
2013-01-01
Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target (“distant off-targets”). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target (“distant off-target”). PMID:24391782
NASA Astrophysics Data System (ADS)
Basu, Sankar; Söderquist, Fredrik; Wallner, Björn
2017-05-01
The focus of the computational structural biology community has taken a dramatic shift over the past one-and-a-half decades from the classical protein structure prediction problem to the possible understanding of intrinsically disordered proteins (IDP) or proteins containing regions of disorder (IDPR). The current interest lies in the unraveling of a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity which makes them promiscuous in binding to different partners, multi-functional in cellular activity and atypical in folding energy landscapes resembling partially folded molten globules. Also, their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) makes them attractive drug targets, and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues, generally found in short contiguous stretches, and the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. There are a few available methods which predict these protean segments from their amino acid sequences; however, their performance reported in the literature leaves clear room for improvement. With this background, the current study presents 'Proteus', a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated using the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second best method (55 vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible 'disorder-to-order' transitioning consensus, untangled, yet embedded in the amino acid sequence of IDPs. Some guidelines have also been suggested for proceeding with a real-life structural modeling involving an IDPR using Proteus.
Backbone hydration determines the folding signature of amino acid residues.
Bignucolo, Olivier; Leung, Hoi Tik Alvin; Grzesiek, Stephan; Bernèche, Simon
2015-04-08
The relation between the sequence of a protein and its three-dimensional structure remains largely unknown. A lasting dream is to elucidate the side-chain-dependent driving forces that govern the folding process. Different structural data suggest that aromatic amino acids play a particular role in the stabilization of protein structures. To better understand the underlying mechanism, we studied peptides of the sequence EGAAXAASS (X = Gly, Ile, Tyr, Trp) through comparison of molecular dynamics (MD) trajectories and NMR residual dipolar coupling (RDC) measurements. The RDC data for aromatic substitutions provide evidence for a kink in the peptide backbone. Analysis of the MD simulations shows that the formation of internal hydrogen bonds underlying a helical turn is key to reproducing the experimental RDC values. The simulations further reveal that the driving force leading to such helical-turn conformations arises from the lack of hydration of the peptide chain on either side of the bulky aromatic side chain, which can potentially act as a nucleation point initiating the folding process.
Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano
2004-01-01
The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant over long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high structural conservation despite low sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation that correlates significantly (r = 0.70) with the extent of mean hydrophobic contact value of their apolar fraction. PMID:15498941
Tripathi, Pooja; Pandey, Paras N
2017-07-07
The present work employs pseudo amino acid composition (PseAAC) to encode protein sequences in numeric form. The encoded sequences are then arranged in a similarity matrix, which serves as input for a spectral graph clustering method. Spectral methods have previously been used for clustering protein sequences, but they use pairwise alignment scores of the protein sequences in the similarity matrix. The alignment score depends on the length of the sequences, so clustering short and long sequences together may not be a good idea. This motivated the idea of combining PseAAC with a spectral clustering algorithm. We extensively tested our method and compared its performance with other existing machine learning methods. It is consistently observed that the number of clusters obtained for a given set of proteins is close to the number of superfamilies in that set, and that PseAAC combined with spectral graph clustering shows the best classification results. Copyright © 2017 Elsevier Ltd. All rights reserved.
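The point about length independence can be made concrete with a simplified sketch. It uses plain 20-component amino acid composition instead of full PseAAC (which adds sequence-order terms), and a Gaussian kernel with an arbitrary width as the similarity measure; both choices and all names are assumptions for illustration, not the authors' method. The spectral step itself (eigenvectors of a graph Laplacian built from this matrix) is omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each of the 20 standard residues; independent of length."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [c / total for c in counts]


def similarity_matrix(sequences, sigma=0.1):
    """Gaussian similarity between composition vectors (sigma is arbitrary)."""
    vectors = [composition(s) for s in sequences]

    def similarity(u, v):
        squared_distance = sum((x - y) ** 2 for x, y in zip(u, v))
        return math.exp(-squared_distance / (2.0 * sigma ** 2))

    return [[similarity(u, v) for v in vectors] for u in vectors]


# A short and a long sequence with the same composition are maximally similar,
# which an alignment score would not guarantee.
toy_sequences = ["ACDK", "ACDKACDKACDK", "WWWWFFFF"]
for row in similarity_matrix(toy_sequences):
    print([round(value, 3) for value in row])
```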
PROTERAN: animated terrain evolution for visual analysis of patterns in protein folding trajectory.
Zhou, Ruhong; Parida, Laxmi; Kapila, Kush; Mudur, Sudhir
2007-01-01
The mechanism of protein folding remains largely a mystery in molecular biology, despite the enormous effort from many groups in the past decades. Currently, the protein folding mechanism is often characterized by calculating the free energy landscape versus various reaction coordinates such as the fraction of native contacts, the radius of gyration and so on. In this paper, we present an integrated approach towards understanding the folding process via visual analysis of patterns of these reaction coordinates. The three disparate processes (1) protein folding simulation, (2) pattern elicitation and (3) visualization of patterns, work in tandem. Thus as the protein folds, the changing landscape in the pattern space can be viewed via the visualization tool, PROTERAN, a program we developed for this purpose. We first present an incremental (on-line) trie-based pattern discovery algorithm to elicit the patterns and then describe the terrain metaphor based visualization tool. Using two example small proteins, a beta-hairpin and a designed protein Trp-cage, we next demonstrate that this combined pattern discovery and visualization approach extracts crucial information about protein folding intermediates and mechanism.
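The idea of incremental, trie-based pattern discovery over a folding trajectory can be sketched as follows. This is a generic illustration and not the PROTERAN algorithm: the discretization thresholds, the state symbols and the class name are all hypothetical, and the trie simply counts every pattern of up to `max_len` consecutive states as new frames arrive.

```python
class TrieNode:
    def __init__(self):
        self.count = 0
        self.children = {}


class PatternTrie:
    """Counts all contiguous patterns up to max_len symbols, one frame at a time."""

    def __init__(self, max_len=3):
        self.max_len = max_len
        self.root = TrieNode()
        self.window = []  # the most recent max_len symbols

    def add(self, symbol):
        self.window.append(symbol)
        if len(self.window) > self.max_len:
            self.window.pop(0)
        # Each suffix of the window is a pattern that ends at the new symbol.
        for start in range(len(self.window)):
            node = self.root
            for s in self.window[start:]:
                node = node.children.setdefault(s, TrieNode())
            node.count += 1

    def count(self, pattern):
        node = self.root
        for s in pattern:
            node = node.children.get(s)
            if node is None:
                return 0
        return node.count


def state_symbol(native_contact_fraction):
    """Map a reaction coordinate to a coarse state (thresholds are hypothetical)."""
    if native_contact_fraction < 0.3:
        return "U"  # unfolded-like
    if native_contact_fraction < 0.7:
        return "I"  # intermediate-like
    return "N"      # native-like


# Toy trajectory of the fraction of native contacts, processed on-line.
trajectory = [0.1, 0.2, 0.5, 0.6, 0.8, 0.5, 0.9]
trie = PatternTrie(max_len=3)
for q in trajectory:
    trie.add(state_symbol(q))
print(trie.count("UI"), trie.count("IN"), trie.count("UUI"))
```

Because only the last `max_len` symbols are kept, the trie can be updated while the simulation is still running, which is the property the on-line setting needs.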
Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases
Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés
2009-01-01
Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The ensuing approach to investigate this enzyme group is by studying its phylogeny. However, clan AA is a difficult case to study due to the low similarity and different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe an in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden Markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns with correspondence with Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is a work in progress that we have organized in a database within the GyDB project, referred to as the Clan AA Reference Database. Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation but also a reference to evaluate the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow similar protein domain architecture related to the peptidase fold. 
In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network where an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
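Sequence logos such as those mentioned above rest on the per-column information content of an alignment. The sketch below shows that calculation in its simplest form, as log2(20) minus the Shannon entropy of the observed residues; it omits the small-sample correction that logo tools usually apply, the toy alignment is invented, and none of it is taken from the GyDB pipeline.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_information(alignment):
    """Information content for every column of an equal-length alignment."""
    return [column_information(column) for column in zip(*alignment)]


# Invented toy alignment: the first three columns are fully conserved.
toy_alignment = ["DTGA", "DTGS", "DTGA", "DTGT"]
print([round(bits, 2) for bits in logo_information(toy_alignment)])
```

Fully conserved columns reach the maximum of log2(20), about 4.32 bits, while variable columns score lower, which is what makes motif positions stand out in a logo.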
De Novo Proteins with Life-Sustaining Functions Are Structurally Dynamic.
Murphy, Grant S; Greisman, Jack B; Hecht, Michael H
2016-01-29
Designing and producing novel proteins that fold into stable structures and provide essential biological functions are key goals in synthetic biology. In initial steps toward achieving these goals, we constructed a combinatorial library of de novo proteins designed to fold into 4-helix bundles. As described previously, screening this library for sequences that function in vivo to rescue conditionally lethal mutants of Escherichia coli (auxotrophs) yielded several de novo sequences, termed SynRescue proteins, which rescued four different E. coli auxotrophs. In an effort to understand the structural requirements necessary for auxotroph rescue, we investigated the biophysical properties of the SynRescue proteins, using both computational and experimental approaches. Results from circular dichroism, size-exclusion chromatography, and NMR demonstrate that the SynRescue proteins are α-helical and relatively stable. Surprisingly, however, they do not form well-ordered structures. Instead, they form dynamic structures that fluctuate between monomeric and dimeric states. These findings show that a well-ordered structure is not a prerequisite for life-sustaining functions, and suggest that dynamic structures may have been important in the early evolution of protein function. Copyright © 2015 Elsevier Ltd. All rights reserved.
Determination of Membrane-Insertion Free Energies by Molecular Dynamics Simulations
Gumbart, James; Roux, Benoît
2012-01-01
The accurate prediction of membrane-insertion probability for arbitrary protein sequences is a critical challenge to identifying membrane proteins and determining their folded structures. Although algorithms based on sequence statistics have had moderate success, a complete understanding of the energetic factors that drive the insertion of membrane proteins is essential to thoroughly meeting this challenge. In the last few years, numerous attempts to define a free-energy scale for amino-acid insertion have been made, yet disagreement between most experimental and theoretical scales persists. However, for a recently resolved water-to-bilayer scale, it is found that molecular dynamics simulations that carefully mimic the conditions of the experiment can reproduce experimental free energies, even when using the same force field as previous computational studies that were cited as evidence of this disagreement. Therefore, it is suggested that experimental and simulation-based scales can both be accurate and that discrepancies stem from disparities in the microscopic processes being considered rather than methodological errors. Furthermore, these disparities make the development of a single universally applicable membrane-insertion free energy scale difficult. PMID:22385850
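The link between an insertion free energy and an insertion probability can be illustrated with the standard two-state Boltzmann relation, p = 1 / (1 + exp(ΔG / RT)). The sketch below assumes that two-state picture, free energies in kcal/mol and a temperature of 298.15 K; it is a generic illustration, not the model or the simulation protocol used in the study.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def insertion_probability(delta_g, temperature=298.15):
    """Two-state probability of the inserted state for a given free energy.

    delta_g is the insertion free energy in kcal/mol; negative values favor
    insertion. Intended for physically plausible magnitudes (tens of kcal/mol).
    """
    rt = GAS_CONSTANT * temperature
    return 1.0 / (1.0 + math.exp(delta_g / rt))


# Illustrative free energies only; these are not values from the paper.
for delta_g in (-2.0, 0.0, 2.0):
    print(f"dG = {delta_g:+.1f} kcal/mol -> p = {insertion_probability(delta_g):.3f}")
```

Because RT is only about 0.6 kcal/mol at room temperature, differences of 1 to 2 kcal/mol between free-energy scales translate into large differences in predicted insertion probability.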
Campos-Olivas, R; Hörr, I; Bormann, C; Jung, G; Gronenborn, A M
2001-05-11
AFP1 is a recently discovered anti-fungal, chitin-binding protein from Streptomyces tendae Tü901. Mature AFP1 comprises 86 residues and exhibits limited sequence similarity to the cellulose-binding domains of bacterial cellulases and xylanases. No similarity to the Cys and Gly-rich domains of plant chitin-binding proteins (e.g. agglutinins, lectins, hevein) is observed. AFP1 is the first chitin-binding protein from a bacterium for which anti-fungal activity was shown. Here, we report the three-dimensional solution structure of AFP1, determined by nuclear magnetic resonance spectroscopy. The protein contains two antiparallel beta-sheets (five and four beta-strands each), that pack against each other in a parallel beta-sandwich. This type of architecture is conserved in the functionally related family II of cellulose-binding domains, albeit with different connectivity. A similar fold is also observed in other unrelated proteins (spore coat protein from Myxococcus xanthus, beta-B2 and gamma-B crystallins from Bos taurus, canavalin from Jack bean). AFP1 is therefore classified as a new member of the betagamma-crystallin superfamily. The dynamics of the protein was characterized by NMR using amide 15N relaxation and solvent exchange data. We demonstrate that the protein exhibits an axially symmetric (oblate-like) rotational diffusion tensor whose principal axis coincides to within 15 degrees with that of the inertial tensor. After completion of the present structure of AFP1, an identical fold was reported for a Streptomyces killer toxin-like protein. Based on sequence comparisons and clustering of conserved residues on the protein surface for different cellulose and chitin-binding proteins, we postulate a putative sugar-binding site for AFP1. The inability of the protein to bind short chitin fragments suggests that certain particular architectural features of the solid chitin surface are crucial for the interaction. Copyright 2001 Academic Press.
A method for probing the mutational landscape of amyloid structure.
O'Donnell, Charles W; Waldispühl, Jérôme; Lis, Mieszko; Halfmann, Randal; Devadas, Srinivas; Lindquist, Susan; Berger, Bonnie
2011-07-01
Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult with only a handful of NMR-based models proposed, suggesting a need for computational methods. We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly-toxic 'Iowa' mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Our tool is publicly available on the web at http://amyloid.csail.mit.edu/. lindquist_admin@wi.mit.edu; bab@csail.mit.edu.
Using Local States To Drive the Sampling of Global Conformations in Proteins
Pandini, Alessandro; Fornili, Arianna
2016-03-08
Conformational changes associated with protein function often occur beyond the time scale currently accessible to unbiased molecular dynamics (MD) simulations, so that different approaches have been developed to accelerate their sampling. Here we investigate how the knowledge of backbone conformations preferentially adopted by protein fragments, as contained in precalculated libraries known as structural alphabets (SA), can be used to explore the landscape of protein conformations in MD simulations. We find that (a) enhancing the sampling of native local states in both metadynamics and steered MD simulations allows the recovery of global folded states in small proteins; (b) folded states can still be recovered when the amount of information on the native local states is reduced by using a low-resolution version of the SA, where states are clustered into macrostates; and (c) sequences of SA states derived from collections of structural motifs can be used to sample alternative conformations of preselected protein regions. The present findings have potential impact on several applications, ranging from protein model refinement to protein folding and design. PMID:26808351
Didehydrophenylalanine, an abundant modification in the beta subunit of plant polygalacturonases.
Sergeant, Kjell; Printz, Bruno; Gutsch, Annelie; Behr, Marc; Renaut, Jenny; Hausman, Jean-Francois
2017-01-01
The structure and the activity of proteins are often regulated by transient or stable post-translational modifications (PTMs). Unlike well-known, abundant modifications such as phosphorylation and glycosylation, some modifications are limited to one or a few proteins across a broad range of related species. Although few examples of the latter type are known, the evolutionary conservation of these modifications and of the enzymes responsible for their synthesis suggests an important physiological role. Here, the first observation of a new, fold-directing PTM is described. During the analysis of alfalfa cell wall proteins, a -2 Da mass shift was observed on phenylalanine residues in the repeated tetrapeptide FxxY of the beta-subunit of polygalacturonase. This modular protein is known to be involved in developmental and stress-responsive processes. The presence of this modification was confirmed using in-house and external datasets acquired by different techniques commonly used in proteome studies. Based on these analyses, it was found that all identified phenylalanine residues in the sequence FxxY of this protein were modified to α,β-didehydro-Phe (ΔPhe). Besides showing the reproducible identification of ΔPhe in different species, arguments that substantiate the fold-determining role of ΔPhe are given.
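The motif search implied by this abstract can be sketched in a few lines. This is only an illustration, assuming a plain one-letter protein sequence as input; the function names `find_fxxy` and `nominal_shift` are hypothetical, and the -2 Da value is the nominal per-site shift quoted above, not a monoisotopic mass.

```python
import re

# A lookahead is used so that overlapping FxxY tetrapeptides are all reported.
FXXY = re.compile(r"(?=(F..Y))")


def find_fxxy(seq):
    """Return (0-based index of F, tetrapeptide) for every FxxY occurrence."""
    return [(m.start(), m.group(1)) for m in FXXY.finditer(seq.upper())]


def nominal_shift(seq, shift_per_site=-2.0):
    """Total nominal mass shift (Da) if every FxxY phenylalanine were modified.

    Assumption: each Phe -> didehydro-Phe conversion contributes the -2 Da
    shift reported in the abstract.
    """
    return shift_per_site * len(find_fxxy(seq))


if __name__ == "__main__":
    example = "AFGTYKFSAYFF"  # made-up sequence, not taken from the paper
    print(find_fxxy(example))
    print(nominal_shift(example))
```

A lookahead pattern is chosen over a plain `F..Y` match because repeated tetrapeptides can overlap, and a non-overlapping scan would silently skip some sites.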
ECOD: an evolutionary classification of protein domains.
Cheng, Hua; Schaeffer, R Dustin; Liao, Yuxing; Kinch, Lisa N; Pei, Jimin; Shi, Shuoyong; Kim, Bong-Hyun; Grishin, Nick V
2014-12-01
Understanding the evolution of a protein, including both close and distant relationships, often reveals insight into its structure and function. Fast and easy access to such up-to-date information facilitates research. We have developed a hierarchical evolutionary classification of all proteins with experimentally determined spatial structures, and presented it as an interactive and updatable online database. ECOD (Evolutionary Classification of protein Domains) is distinct from other structural classifications in that it groups domains primarily by evolutionary relationships (homology), rather than topology (or "fold"). This distinction highlights cases of homology between domains of differing topology to aid in understanding of protein structure evolution. ECOD uniquely emphasizes distantly related homologs that are difficult to detect, and thus catalogs the largest number of evolutionary links among structural domain classifications. Placing distant homologs together underscores the ancestral similarities of these proteins and draws attention to the most important regions of sequence and structure, as well as conserved functional sites. ECOD also recognizes closer sequence-based relationships between protein domains. Currently, approximately 100,000 protein structures are classified in ECOD into 9,000 sequence families clustered into close to 2,000 evolutionary groups. The classification is assisted by an automated pipeline that quickly and consistently classifies weekly releases of PDB structures and allows for continual updates. This synchronization with PDB uniquely distinguishes ECOD among all protein classifications. Finally, we present several case studies of homologous proteins not recorded in other classifications, illustrating the potential of how ECOD can be used to further biological and evolutionary studies.
Cohen, P T; Cohen, P
1989-06-15
Infection of Escherichia coli with phage lambda gt10 resulted in the appearance of a protein phosphatase with activity towards 32P-labelled casein. Activity reached a maximum near the point of cell lysis and declined thereafter. The phosphatase was stimulated 30-fold by Mn2+, while Mg2+ and Ca2+ were much less effective. Activity was unaffected by inhibitors 1 and 2, okadaic acid, calmodulin and trifluoperazine, distinguishing it from the major serine/threonine-specific protein phosphatases of eukaryotic cells. The lambda phosphatase was also capable of dephosphorylating other substrates in the presence of Mn2+, although activity towards 32P-labelled phosphorylase was 10-fold lower, and activity towards phosphorylase kinase and glycogen synthase 25-50-fold lower than with casein. No casein phosphatase activity was present in either uninfected cells, or in E. coli infected with phage lambda gt11. Since lambda gt11 lacks part of the open reading frame (orf) 221, previously shown to encode a protein with sequence similarity to protein phosphatase-1 and protein phosphatase-2A of mammalian cells [Cohen, Collins, Coulson, Berndt & da Cruz e Silva (1988) Gene 69, 131-134], the results indicate that ORF221 is the protein phosphatase detected in cells infected with lambda gt10. Comparison of the sequence of ORF221 with other mammalian protein phosphatases defines three highly conserved regions which are likely to be essential for function. The first of these is deleted in lambda gt11.
Li, Wei; Jiang, Wei; Wang, Lei
2016-10-12
In this work, a novel self-locked aptamer probe mediated cascade amplification strategy has been constructed for highly sensitive and specific detection of protein. First, the self-locked aptamer probe was designed with three functions: specific molecular recognition, attributed to the aptamer sequence; signal transduction, owing to the transduction sequence; and self-locking, through the hybridization of the transduction sequence with part of the aptamer sequence. Then, the aptamer sequence specifically recognized the target and folded into a three-way helix junction, leading to the release of the transduction sequence. Next, the 3'-end of this three-way junction acted as a primer to trigger strand displacement amplification (SDA), yielding a large amount of primers. Finally, the primers initiated dual-exponential rolling circle amplification (DE-RCA) and generated numerous G-quadruplex sequences. By inserting the fluorescent dye N-methyl mesoporphyrin IX (NMM), an enhanced fluorescence signal was achieved. In this strategy, the self-locked aptamer probe was more stable, reducing the interference signals generated by uncontrolled folding in the unbound state. Through the cascade amplification of SDA and DE-RCA, the sensitivity was further improved, with a detection limit of 3.8 × 10⁻¹⁶ mol/L for protein detection. Furthermore, by changing the aptamer sequence of the probe, sensitive and selective detection of adenosine has also been achieved, suggesting that the proposed strategy has good versatility and can be widely used in sensitive and selective detection of biomolecules.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-08-01
RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (O(n^4), i.e. quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics.
Bunka, David H J; Lane, Stephen W; Lane, Claire L; Dykeman, Eric C; Ford, Robert J; Barker, Amy M; Twarock, Reidun; Phillips, Simon E V; Stockley, Peter G
2011-10-14
Using a recombinant, T=1 Satellite Tobacco Necrosis Virus (STNV)-like particle expressed in Escherichia coli, we have established conditions for in vitro disassembly and reassembly of the viral capsid. In vivo assembly is dependent on the presence of the coat protein (CP) N-terminal region, and in vitro assembly requires RNA. Using immobilised CP monomers under reassembly conditions with "free" CP subunits, we have prepared a range of partially assembled CP species for RNA aptamer selection. SELEX directed against the RNA-binding face of the STNV CP resulted in the isolation of several clones, one of which (B3) matches the STNV-1 genome in 16 out of 25 nucleotide positions, including across a statistically significant 10/10 stretch. This 10-base region folds into a stem-loop displaying the motif ACAA and has been shown to bind to STNV CP. Analysis of the other aptamer sequences reveals that the majority can be folded into stem-loops displaying versions of this motif. Using a sequence and secondary structure search motif to analyse the genomic sequence of STNV-1, we identified 30 stem-loops displaying the sequence motif AxxA. The implication is that there are many stem-loops in the genome carrying essential recognition features for binding STNV CP. Secondary structure predictions of the genomic RNA using Mfold showed that only 8 out of 30 of these stem-loops would be formed in the lowest-energy structure. These results are consistent with an assembly mechanism based on kinetically driven folding of the RNA.
Fluorescence studies with DNA probes: dynamic aspects of DNA structure and DNA-protein interactions
NASA Astrophysics Data System (ADS)
Millar, David P.; Carver, Theodore E.
1994-08-01
Time-resolved fluorescence measurements of optical probes incorporated at specific sites in DNA provide a new approach to studies of DNA structure and DNA:protein interactions. This approach can be used to study complex multi-state behavior, such as the folding of DNA into alternative higher order structures or the transfer of DNA between multiple binding sites on a protein. In this study, fluorescence anisotropy decay of an internal dansyl probe attached to 17/27-mer oligonucleotides was used to monitor the distribution of DNA 3' termini bound at either the polymerase or 3' to 5' exonuclease sites of the Klenow fragment of DNA polymerase I. Partitioning of the primer terminus between the two active sites of the enzyme resulted in a heterogeneous probe environment, reflected in the associative behavior of the fluorescence anisotropy decay. Analysis of the anisotropy decay with a two-state model of solvent-exposed and protein-associated dansyl probes was used to determine the fraction of DNA bound at each site. We examined complexes of Klenow fragment with DNAs containing various base mismatches. Single mismatches at the primer terminus caused a 3-fold increase in the equilibrium partitioning of DNA into the exonuclease site, while two or more consecutive G:G mismatches caused the DNA to bind exclusively at the exonuclease site, with a partitioning constant at least 250-fold greater than that of the corresponding matched DNA sequence. Internal single mismatches located up to four bases from the primer terminus produced larger effects than the same mismatch at the primer terminus. These results provide insight into the recognition mechanisms that enable DNA polymerases to proofread misincorporated bases during DNA replication.
Solis, Armando D
2014-01-01
The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of backbone conformation of protein molecules, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established, based on fundamental information-theoretic concepts, and then applied specifically to derive highly resolved phi-psi maps for all 20 single amino acid and all 8000 triplet sequences at an optimal resolution determined by the volume of current data. The paper shows that utilizing the latent information contained in all viable high-resolution crystal structures found in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs by extensive fold recognition experiments and rigorous comparisons with previously published triplet PDFs. Because it automatically optimizes PDFs, IMD results in improved performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
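For orientation, the simplest empirical phi-psi PDF is a normalized two-dimensional histogram, sketched below. This is not the Information Maximization Device described in the abstract, which optimizes the resolution; here the bin width is a fixed, assumed parameter, and `phi_psi_pdf` is a hypothetical name.

```python
def phi_psi_pdf(angles, bin_width=30.0):
    """Estimate a discrete phi-psi probability map.

    angles    : iterable of (phi, psi) pairs in degrees, nominally in [-180, 180]
    bin_width : bin size in degrees (assumed to divide 360 evenly)

    Returns a square list-of-lists pdf[i][j], where i indexes phi bins and
    j indexes psi bins, and all entries sum to 1.
    """
    angles = list(angles)
    if not angles:
        raise ValueError("at least one (phi, psi) pair is required")
    n_bins = int(round(360.0 / bin_width))
    counts = [[0] * n_bins for _ in range(n_bins)]
    for phi, psi in angles:
        # Shift to [0, 360) and wrap, so that +180 lands in the same bin as -180.
        i = int((phi + 180.0) // bin_width) % n_bins
        j = int((psi + 180.0) // bin_width) % n_bins
        counts[i][j] += 1
    total = float(len(angles))
    return [[c / total for c in row] for row in counts]


if __name__ == "__main__":
    # Made-up angles: two helix-like points, one sheet-like point, one at the edge.
    demo = [(-60, -45), (-60, -45), (-120, 130), (180, 180)]
    pdf = phi_psi_pdf(demo)
    print(pdf[4][4], pdf[2][10], pdf[0][0])
```

With a fixed bin width the map is either too coarse or too sparsely populated for rare triplets, which is the trade-off the abstract's resolution optimization is meant to address.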
Shivange, Amol V; Hoeffken, Hans Wolfgang; Haefner, Stefan; Schwaneberg, Ulrich
2016-12-01
Protein consensus-based surface engineering (ProCoS) is a simple and efficient method for directed protein evolution combining computational analysis and molecular biology tools to engineer protein surfaces. ProCoS is based on the hypothesis that conserved residues originated from a common ancestor and that these residues are crucial for the function of a protein, whereas highly variable regions (situated on the surface of a protein) can be targeted for surface engineering to maximize performance. ProCoS comprises four main steps: (i) identification of conserved and highly variable regions; (ii) protein sequence design by substituting residues in the highly variable regions, and gene synthesis; (iii) in vitro DNA recombination of synthetic genes; and (iv) screening for active variants. ProCoS is a simple method for surface mutagenesis in which multiple sequence alignment is used for selection of surface residues based on a structural model. To demonstrate the technique's utility for directed evolution, the surface of a phytase enzyme from Yersinia mollaretii (Ymphytase) was subjected to ProCoS. Screening just 1050 clones from ProCoS engineering-guided mutant libraries yielded an enzyme with 34 amino acid substitutions. The surface-engineered Ymphytase exhibited 3.8-fold higher pH stability (at pH 2.8 for 3 h) and retained 40% of the enzyme's specific activity (400 U/mg) compared with the wild-type Ymphytase. The pH stability might be attributed to a significantly increased (20 percentage points; from 9% to 29%) number of negatively charged amino acids on the surface of the engineered phytase.
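Step (i), separating conserved from highly variable positions, is often approximated with a per-column Shannon entropy over a multiple sequence alignment. The sketch below assumes that scoring scheme; the abstract does not say which measure ProCoS uses, and the function names and the 1.0-bit threshold are illustrative choices only.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (bits) of each alignment column, ignoring gap characters."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            entropies.append(0.0)  # all-gap column: treated as uninformative
            continue
        n = len(residues)
        counts = Counter(residues)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return entropies


def variable_positions(alignment, threshold=1.0):
    """0-based column indices whose entropy is at or above the threshold."""
    return [i for i, h in enumerate(column_entropies(alignment)) if h >= threshold]


if __name__ == "__main__":
    toy = ["MKTA", "MKSA", "MRGA", "MKCA"]  # made-up toy alignment
    print(column_entropies(toy))
    print(variable_positions(toy))
```

In a ProCoS-like workflow the high-entropy columns would then be intersected with surface-exposed positions from a structural model, a step not shown here.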
Characterizing Conformational Dynamics of Proteins Using Evolutionary Couplings.
Feng, Jiangyan; Shukla, Diwakar
2018-01-25
Understanding of protein conformational dynamics is essential for elucidating molecular origins of protein structure-function relationship. Traditionally, reaction coordinates, i.e., some functions of protein atom positions and velocities have been used to interpret the complex dynamics of proteins obtained from experimental and computational approaches such as molecular dynamics simulations. However, it is nontrivial to identify the reaction coordinates a priori even for small proteins. Here, we evaluate the power of evolutionary couplings (ECs) to capture protein dynamics by exploring their use as reaction coordinates, which can efficiently guide the sampling of a conformational free energy landscape. We have analyzed 10 diverse proteins and shown that a few ECs are sufficient to characterize complex conformational dynamics of proteins involved in folding and conformational change processes. With the rapid strides in sequencing technology, we expect that ECs could help identify reaction coordinates a priori and enhance the sampling of the slow dynamical process associated with protein folding and conformational change.
Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence
2008-01-01
Background Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation, and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low among this family, often below 30%, despite a well conserved tertiary structure. Under the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to that family is by experimental determination. However, these procedures are long and costly and cannot always be applied. A way to circumvent the experimental approach is sequence and structure analysis. To further help in that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions for ten lipocalins having a maximum pairwise identity of 28% and various functions. Results By analysing the ten lipocalin structures and sequences, it was determined that two hydrophobic clusters of residues are conserved. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess the assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict function. FTIR data support the lipocalin fold for this protein. Conclusion By sequence and structural analyses, two conserved clusters of interacting hydrophobic residues have been identified in lipocalins. Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Soo-Ik; Hammes, G.G.
1989-11-01
Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.
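The two figures quoted above (percent identical, and percent matched once conservative substitutions are allowed) can be computed from a pairwise alignment as sketched here. The grouping of residues into conservative classes is an assumption made for illustration; the abstract does not state which substitution classes were used.

```python
# Hypothetical conservative-substitution classes (an assumption, not from the paper).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identical, % identical-or-conservative) over gap-free columns.

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # columns containing a gap are not scored
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif any(x in group and y in group for group in CONSERVATIVE_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Made-up aligned fragments, not the fatty acid synthase sequences.
    print(identity_and_similarity("MKV-LDS", "MRVALET"))
```

The denominator here is the number of gap-free columns; other conventions (alignment length, shorter sequence length) give different percentages, so the convention should be stated alongside any reported value.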
Pressurized Pepsin Digestion in Proteomics
López-Ferrer, Daniel; Petritis, Konstantinos; Robinson, Errol W.; Hixson, Kim K.; Tian, Zhixin; Lee, Jung Hwa; Lee, Sang-Won; Tolić, Nikola; Weitz, Karl K.; Belov, Mikhail E.; Smith, Richard D.; Paša-Tolić, Ljiljana
2011-01-01
Integrated top-down bottom-up proteomics combined with on-line digestion has great potential to improve the characterization of protein isoforms in biological systems and is amenable to high-throughput proteomics experiments. Bottom-up proteomics ultimately provides the peptide sequences derived from the tandem MS analyses of peptides after the proteome has been digested. Top-down proteomics conversely entails the MS analyses of intact proteins for more effective characterization of genetic variations and/or post-translational modifications. Herein, we describe recent efforts toward efficient integration of bottom-up and top-down LC-MS-based proteomics strategies. Since most proteomics separations utilize acidic conditions, we exploited the compatibility of pepsin (where the optimal digestion conditions are at low pH) for integration into bottom-up and top-down proteomics workflows. Pressure-enhanced pepsin digestions were successfully performed and characterized with several standard proteins in either an off-line mode using a Barocycler or an on-line mode using a modified high pressure LC system referred to as a fast on-line digestion system (FOLDS). FOLDS was tested using pepsin and a whole microbial proteome, and the results were compared against traditional trypsin digestions on the same platform. Additionally, FOLDS was integrated with a RePlay configuration to demonstrate an ultrarapid integrated bottom-up top-down proteomics strategy using a standard mixture of proteins and a monkey pox virus proteome. PMID:20627868
NASA Astrophysics Data System (ADS)
Bian, Yunqiang; Ren, Weitong; Song, Feng; Yu, Jiafeng; Wang, Jihua
2018-05-01
Structure-based models or Gō-like models, which are built from one or multiple particular experimental structures, have been successfully applied to the folding of proteins and RNAs. Recently, a variant termed the hybrid atomistic model has advanced the description of backbone and side chain interactions of traditional structure-based models by borrowing the description of local interactions from classical force fields. In this study, we assessed the validity of this model in the folding problem of human telomeric DNA G-quadruplex, where local dihedral terms play important roles. A two-state model was developed and a set of molecular dynamics simulations was conducted to study the folding dynamics of sequence Htel24, which was experimentally validated to adopt two different (3 + 1) hybrid G-quadruplex topologies in K+ solution. Consistent with the experimental observations, the hybrid-1 conformation was found to be more stable and the hybrid-2 conformation was kinetically more favored. The simulations revealed that the hybrid-2 conformation folded in a more cooperative manner, which may be the reason why it was kinetically more accessible. Moreover, by building a Markov state model, a two-quartet G-quadruplex state and a misfolded state were identified as competing states that complicate the folding process of Htel24. In addition, the simulations showed that the transition between hybrid-1 and hybrid-2 conformations may proceed through an ensemble of hairpin structures. The hybrid atomistic structure-based model reproduced the kinetic partitioning folding dynamics of Htel24 between two different folds, and thus can be used to study the complex folding processes of other G-quadruplex structures.
Thermosensitivity of growth is determined by chaperone-mediated proteome reallocation
Chen, Ke; Gao, Ye; Mih, Nathan; O’Brien, Edward J.; Yang, Laurence; Palsson, Bernhard O.
2017-01-01
Maintenance of a properly folded proteome is critical for bacterial survival at notably different growth temperatures. Understanding the molecular basis of thermoadaptation has progressed in two main directions, the sequence and structural basis of protein thermostability and the mechanistic principles of protein quality control assisted by chaperones. Yet we do not fully understand how structural integrity of the entire proteome is maintained under stress and how it affects cellular fitness. To address this challenge, we reconstruct a genome-scale protein-folding network for Escherichia coli and formulate a computational model, FoldME, that provides statistical descriptions of multiscale cellular response consistent with many datasets. FoldME simulations show (i) that the chaperones act as a system when they respond to unfolding stress rather than achieving efficient folding of any single component of the proteome, (ii) how the proteome is globally balanced between chaperones for folding and the complex machinery synthesizing the proteins in response to perturbation, (iii) how this balancing determines growth rate dependence on temperature and is achieved through nonspecific regulation, and (iv) how thermal instability of the individual protein affects the overall functional state of the proteome. Overall, these results expand our view of cellular regulation, from targeted specific control mechanisms to global regulation through a web of nonspecific competing interactions that modulate the optimal reallocation of cellular resources. The methodology developed in this study enables genome-scale integration of environment-dependent protein properties and a proteome-wide study of cellular stress responses. PMID:29073085
Structure based re-design of the binding specificity of anti-apoptotic Bcl-xL
Chen, T. Scott; Palacios, Hector; Keating, Amy E.
2012-01-01
Many native proteins are multi-specific and interact with numerous partners, which can confound analysis of their functions. Protein design provides a potential route to generating synthetic variants of native proteins with more selective binding profiles. Re-designed proteins could be used as research tools, diagnostics or therapeutics. In this work, we used a library screening approach to re-engineer the multi-specific anti-apoptotic protein Bcl-xL to remove its interactions with many of its binding partners, making it a high affinity and selective binder of the BH3 region of pro-apoptotic protein Bad. To overcome the enormity of the potential Bcl-xL sequence space, we developed and applied a computational/experimental framework that used protein structure information to generate focused combinatorial libraries. Sequence features were identified using structure-based modeling, and an optimization algorithm based on integer programming was used to select degenerate codons that maximally covered these features. A constraint on library size was used to ensure thorough sampling. Using yeast surface display to screen a designed library of Bcl-xL variants, we successfully identified a protein with ~1,000-fold improvement in binding specificity for the BH3 region of Bad over the BH3 region of Bim. Although negative design was targeted only against the BH3 region of Bim, the best re-designed protein was globally specific against binding to 10 other peptides corresponding to native BH3 motifs. Our design framework demonstrates an efficient route to highly specific protein binders and may readily be adapted for application to other design problems. PMID:23154169
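The library-design step relies on knowing which amino acids a degenerate codon encodes. The sketch below shows only that expansion, using the standard genetic code and IUPAC nucleotide codes; it does not reproduce the integer-programming optimization from the abstract, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate_codon):
    """List every concrete codon represented by a 3-letter degenerate codon."""
    codon = degenerate_codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three positions")
    return ["".join(c) for c in product(*(IUPAC[base] for base in codon))]


def encoded_amino_acids(degenerate_codon):
    """Set of amino acids ('*' marks a stop) encoded by a degenerate codon."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


if __name__ == "__main__":
    print(len(expand("NNK")), sorted(encoded_amino_acids("NNK")))
    print(sorted(encoded_amino_acids("GAK")))
```

An optimizer of the kind described would score each candidate degenerate codon by how many desired residues appear in `encoded_amino_acids` while penalizing library size, which grows with the product of `len(expand(...))` over all randomized positions.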
Folding of polyglutamine chains
NASA Astrophysics Data System (ADS)
Chopra, Manan; Reddy, Allam S.; Abbott, N. L.; de Pablo, J. J.
2008-10-01
Long polyglutamine chains have been associated with a number of neurodegenerative diseases. These include Huntington's disease, where expanded polyglutamine (PolyQ) sequences longer than 36 residues are correlated with the onset of symptoms. In this paper we study the folding pathway of a 54-residue PolyQ chain into a β-helical structure. Transition path sampling Monte Carlo simulations are used to generate unbiased reactive pathways between unfolded configurations and the folded β-helical structure of the polyglutamine chain. The folding process is examined in both explicit water and an implicit solvent. Both models reveal that the formation of a few critical contacts is necessary and sufficient for the molecule to fold. Once the primary contacts are formed, the fate of the protein is sealed and it is largely committed to fold. We find that, consistent with emerging hypotheses about PolyQ aggregation, a stable β-helical structure could serve as the nucleus for subsequent polymerization of amyloid fibrils. Our results indicate that PolyQ sequences shorter than 36 residues cannot form that nucleus, and it is also shown that specific mutations inferred from an analysis of the simulated folding pathway exacerbate its stability.
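As a small aside on the length threshold mentioned above, uninterrupted glutamine runs in a sequence can be located as follows. This is a sequence-scanning sketch only, unrelated to the transition path sampling simulations; the default threshold of 36 is taken from the abstract, and the function names are hypothetical.

```python
import re


def polyq_runs(seq):
    """Return (0-based start, length) for every uninterrupted run of glutamine (Q)."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"Q+", seq.upper())]


def has_expanded_polyq(seq, threshold=36):
    """True if any Q run is strictly longer than the threshold."""
    return any(length > threshold for _, length in polyq_runs(seq))


if __name__ == "__main__":
    demo = "MA" + "Q" * 40 + "PP" + "Q" * 5  # made-up sequence
    print(polyq_runs(demo))
    print(has_expanded_polyq(demo))
```

Real polyQ tracts are sometimes interrupted by single non-Q residues, which this strict pattern would split into separate runs.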
Principles for computational design of binding antibodies
Pszolla, M. Gabriele; Lapidoth, Gideon D.; Norn, Christoffer; Dym, Orly; Unger, Tamar; Albeck, Shira; Tyka, Michael D.; Fleishman, Sarel J.
2017-01-01
Natural proteins must both fold into a stable conformation and exert their molecular function. To date, computational design has successfully produced stable and atomically accurate proteins by using so-called “ideal” folds rich in regular secondary structures and almost devoid of loops and destabilizing elements, such as cavities. Molecular function, such as binding and catalysis, however, often demands nonideal features, including large and irregular loops and buried polar interaction networks, which have remained challenging for fold design. Through five design/experiment cycles, we learned principles for designing stable and functional antibody variable fragments (Fvs). Specifically, we (i) used sequence-design constraints derived from antibody multiple-sequence alignments, and (ii) during backbone design, maintained stabilizing interactions observed in natural antibodies between the framework and loops of complementarity-determining regions (CDRs) 1 and 2. Designed Fvs bound their ligands with midnanomolar affinities and were as stable as natural antibodies, despite having >30 mutations from mammalian antibody germlines. Furthermore, crystallographic analysis demonstrated atomic accuracy throughout the framework and in four of six CDRs in one design and atomic accuracy in the entire Fv in another. The principles we learned are general, and can be implemented to design other nonideal folds, generating stable, specific, and precise antibodies and enzymes. PMID:28973872
Fernandez-Reche, Andres; Cobos, Eva S; Luque, Irene; Ruiz-Sanz, Javier; Martinez, Jose C
2018-01-04
In 1972 Christian B. Anfinsen received the Nobel Prize in Chemistry for "…his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation." The understanding of this principle is crucial for physical biochemistry students, since protein folding studies, bio-computing sciences and protein design approaches are founded on such a well-demonstrated connection. Herein, we describe a detailed and easy-to-follow experiment to reproduce the most relevant assays carried out at Anfinsen's laboratory in the 1960s. This experiment provides students with a platform to interpret by themselves the structural and kinetic experiments conceived to understand the protein folding problem. In addition, this three-day experiment gives students a good opportunity for protein manipulation as well as for the setting up of spectroscopic and chromatographic techniques. © 2018 by The International Union of Biochemistry and Molecular Biology, 2018.
Pastor, Ashutosh; Singh, Amit K.; Fisher, Mark T.; Chaudhuri, Tapan K.
2016-01-01
Protein folding has been extensively studied for the past four decades by employing solution-based experiments such as solubility, enzymatic activity, secondary structure analysis, and analytical methods like FRET, NMR and HD exchange. However, for rapid analysis of the folding process, solution-based approaches are often plagued with aggregation side reactions resulting in poor yields. In this work we demonstrate that a Bio-Layer Interferometry (BLI) chaperonin detection system can be potentially applied to identify superior refolding conditions for denatured proteins. The degree of immobilized protein folding as a function of time can be detected by monitoring the binding of the high-affinity nucleotide-free form of the chaperonin GroEL. GroEL preferentially interacts with proteins that have hydrophobic surfaces exposed in their unfolded or partially folded form, so a decrease in GroEL binding can be correlated with burial of hydrophobic surfaces as folding progresses. The magnitude of GroEL binding to the protein immobilized on the BLI biosensor inversely reflects the extent of protein folding and hydrophobic residue burial. We demonstrate conditions where accelerated folding can be observed for the aggregation-prone protein maltodextrin glucosidase (MalZ). Superior immobilized folding conditions identified on the BLI biosensor surface were reproduced on Ni-NTA sepharose bead surfaces and resulted in significant improvement in folding yields of released MalZ (measured by enzymatic activity) compared to bulk refolding conditions in solution. PMID:27367928
NASA Astrophysics Data System (ADS)
Thumb, Werner; Graf, Christine; Parslow, Tristram; Schneider, Rainer; Auer, Manfred
1999-11-01
The interaction of the human immunodeficiency virus type 1 (HIV-1) regulatory protein Rev with cellular cofactors is crucial for the viral life cycle. The HIV-1 Rev transactivation domain is functionally interchangeable with analogous regions of Rev proteins of other retroviruses, suggesting common folding patterns. In order to obtain experimental evidence for similar structural features mediating protein-protein contacts we investigated activation domain peptides from HIV-1, HIV-2, VISNA virus, feline immunodeficiency virus (FIV) and equine infectious anemia virus (EIAV) by CD spectroscopy, secondary structure prediction and sequence analysis. Although different in polarity and hydrophobicity, all peptides showed a similar behavior with respect to solution conformation, concentration dependence and variations in ionic strength and pH. Temperature studies revealed an unusual induction of β-structure with rising temperatures in all activation domain peptides. The high stability of β-structure in this region was demonstrated in three different peptides of the activation domain of HIV-1 Rev in solutions containing 40% hexafluoropropanol, a reagent usually known to induce α-helix into amino acid sequences. Sequence alignments revealed similarities between the polar effector domains from FIV and EIAV and the leucine rich (hydrophobic) effector domains found in HIV-1, HIV-2 and VISNA. Studies on activation domain peptides of two dominant negative HIV-1 Rev mutants, M10 and M32, pointed towards different reasons for the biological behavior. Whereas the peptide containing the M10 mutation (L78E79→D78L79) showed wild-type structure, the M32 mutant peptide (L78L81L83→A78A81A83) revealed a different protein fold to be the reason for the disturbed binding to cellular cofactors.
From our data, we conclude that the activation domains of Rev proteins from different viral origins adopt a similar fold and that a β-structural element is involved in binding to a cellular cofactor.
Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?
Skolnick, Jeffrey; Zhou, Hongyi
2017-04-20
Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionarily distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why certain protein structures are threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact-map-based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse-grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
Kapoor, Abhijeet; Travesset, Alex
2014-03-01
We develop an intermediate resolution model, where the backbone is modeled with atomic resolution but the side chain with a single bead, by extending our previous model (Proteins (2013) DOI: 10.1002/prot.24269) to properly include proline, preproline residues and backbone rigidity. Starting from random configurations, the model properly folds 19 proteins (including a mutant 2A3D sequence) into native states containing β sheet, α helix, and mixed α/β. As a further test, the stability of H-RAS (a 169 residue protein, critical in many signaling pathways) is investigated: the protein is stable, with excellent agreement with experimental B-factors. Although proteins containing only α helices fold to their native state at lower backbone rigidity, and despite other limitations that we discuss thoroughly, the model provides a reliable description of the dynamics as compared with all-atom simulations, yet does not constrain secondary structures, as is typically the case in more coarse-grained models. Further implications are described. Copyright © 2013 Wiley Periodicals, Inc.
Castillo, Virginia; Graña-Montes, Ricardo; Sabate, Raimon; Ventura, Salvador
2011-06-01
In the cell, protein folding into stable globular conformations is in competition with aggregation into non-functional and usually toxic structures, since the biophysical properties that promote folding also tend to favor intermolecular contacts, leading to the formation of β-sheet-enriched insoluble assemblies. The formation of protein deposits is linked to at least 20 different human disorders, ranging from dementia to diabetes. Furthermore, protein deposition inside cells represents a major obstacle for the biotechnological production of polypeptides. Importantly, the aggregation behavior of polypeptides appears to be strongly influenced by the intrinsic properties encoded in their sequences and specifically by the presence of selective short regions with high aggregation propensity. This allows computational methods to be used to analyze the aggregation properties of proteins without the previous requirement for structural information. Applications range from the identification of individual amyloidogenic regions in disease-linked polypeptides to the analysis of the aggregation properties of complete proteomes. Herein, we review these theoretical approaches and illustrate how they have become important and useful tools in understanding the molecular mechanisms underlying protein aggregation. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Gopinath, T.; Nelson, Sarah E. D.; Veglia, Gianluigi
2017-12-01
Magic angle spinning (MAS) solid-state NMR (ssNMR) spectroscopy is emerging as a unique method for the atomic resolution structure determination of native membrane proteins in lipid bilayers. Although 13C-detected ssNMR experiments continue to play a major role, recent technological developments have made it possible to carry out 1H-detected experiments, boosting both sensitivity and resolution. Here, we describe a new set of 1H-detected hybrid pulse sequences that combine through-bond and through-space correlation elements into single experiments, enabling the simultaneous detection of rigid and dynamic domains of membrane proteins. As proof-of-principle, we applied these new pulse sequences to the membrane protein phospholamban (PLN) reconstituted in lipid bilayers under moderate MAS conditions. The cross-polarization (CP) based elements enabled the detection of the relatively immobile residues of PLN in the transmembrane domain using through-space correlations; whereas the most dynamic region, which is in equilibrium between folded and unfolded states, was mapped by through-bond INEPT-based elements. These new 1H-detected experiments will enable one to detect not only the most populated (ground) states of biomacromolecules, but also sparsely populated high-energy (excited) states for a complete characterization of protein free energy landscapes.
LiveBench-1: continuous benchmarking of protein structure prediction servers.
Bujnicki, J M; Elofsson, A; Fischer, D; Rychlewski, L
2001-02-01
We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.
Sequence of the chloroplast 16S rRNA gene and its surrounding regions of Chlamydomonas reinhardii.
Dron, M; Rahire, M; Rochaix, J D
1982-01-01
The sequence of a 2 kb DNA fragment containing the chloroplast 16S ribosomal RNA gene from Chlamydomonas reinhardii and its flanking regions has been determined. The algal 16S rRNA sequence (1475 nucleotides) and secondary structure are highly related to those found in bacteria and in the chloroplasts of higher plants. In contrast, the flanking regions are very different. In C. reinhardii the 16S rRNA gene is surrounded by AT rich segments of about 180 bases, which are followed by a long stretch of complementary bases separated from each other by 1833 nucleotides. It is likely that these structures play an important role in the folding and processing of the precursor of 16S rRNA. The primary and secondary structures of the binding sites of two ribosomal proteins in the 16S rRNAs of E. coli and C. reinhardii are considerably related. PMID:6296784
León Vázquez, Erika De; Juillard, Franceline; Rosner, Bernard; Kaye, Kenneth M.
2013-01-01
Kaposi’s sarcoma-associated herpesvirus LANA (1162 residues) mediates episomal persistence of viral genomes during latency. LANA mediates viral DNA replication and segregates episomes to daughter nuclei. A 59 residue deletion immediately upstream of the internal repeat elements rendered LANA highly deficient for DNA replication and modestly deficient for the ability to segregate episomes, while smaller deletions did not. The 59 amino acid deletion reduced LANA episome persistence by ~14-fold, while sequentially smaller deletions resulted in ~3-fold, or no deficiency. Three distinct LANA regions reorganized heterochromatin, one of which contains the deleted sequence, but the deletion did not abolish LANA’s ability to alter chromatin. Therefore, this work identifies a short internal LANA sequence that is critical for DNA replication, has modest effects on episome segregation, and substantially impacts episome persistence; this region may exert its effects through an interacting host cell protein(s). PMID:24314665
DOE Office of Scientific and Technical Information (OSTI.GOV)
FLANAGAN,J.M.; BEWLEY,M.C.
It is generally accepted that the information necessary to specify the native, functional, three-dimensional structure of a protein is encoded entirely within its amino acid sequence; however, efficient reversible folding and unfolding is observed only with a subset of small single-domain proteins. Refolding experiments often lead to the formation of kinetically-trapped, misfolded species that aggregate, even in dilute solution. In the cellular environment, the barriers to efficient protein folding and maintenance of native structure are even larger due to the nature of this process. First, nascent polypeptides must fold in an extremely crowded environment where the concentration of macromolecules approaches 300-400 mg/mL and, on average, each ribosome is within its own diameter of another ribosome (1-3). These conditions of severe molecular crowding, coupled with high concentrations of nascent polypeptide chains, favor nonspecific aggregation over productive folding (3). Second, folding of newly-translated polypeptides occurs in the context of their vectorial synthesis process. Amino acids are added to a growing nascent chain at the rate of ~5 residues per sec, which means that for a 300 residue protein its N-terminus will be exposed to the cytosol ~1 min before its C-terminus and be free to begin the folding process. However, because protein folding is highly cooperative, the nascent polypeptide cannot reach its native state until a complete folding domain (50-250 residues) has emerged from the ribosome. Thus, for a single-domain protein, the final steps in folding are only completed post-translationally since ~40 residues of a nascent chain are sequestered within the exit channel of the ribosome and are not available for folding (4).
A direct consequence of this limitation in cellular folding is that during translation incomplete domains will exist in partially-folded states that tend to expose hydrophobic residues that are prone to aggregation and/or misfolding. Thus it is not surprising that, in cells, the protein folding process is error prone and organisms have evolved "editing" or quality control (QC) systems to assist in the folding, maintenance and, when necessary, selective removal of damaged proteins. In fact, there is growing evidence that failure of these QC-systems contributes to a number of disease states (5-8). This chapter describes our current understanding of the nature and mechanisms of the protein quality control systems in the cytosol of bacteria. Parallel systems are exploited in the cytosol and mitochondria of eukaryotes to prevent the accumulation of misfolded proteins.
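The timing argument in this abstract is simple arithmetic and can be written out as a sketch. It assumes an elongation rate of roughly 5 residues per second (the value implied by the stated 1 min lag for a 300 residue protein) and about 40 residues held in the ribosomal exit channel; both function names and default values are illustrative, not taken from the chapter.

```python
def synthesis_time_s(n_residues, rate_res_per_s=5.0):
    """Time for the ribosome to add n_residues at a constant elongation rate."""
    return n_residues / rate_res_per_s


def exposed_residues(n_synthesized, tunnel_residues=40):
    """Residues outside the exit channel, i.e. available to begin folding."""
    return max(0, n_synthesized - tunnel_residues)


# A 300 residue chain: the N-terminus emerges about a minute before synthesis ends.
lag = synthesis_time_s(300)

# A 150 residue domain is not fully outside the ribosome until roughly
# 150 + 40 residues have been polymerized.
needed = 150 + 40
```

Under these assumptions a domain near the C-terminus of a single-domain protein can only complete folding after release, which is the post-translational step the text describes.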
Nakano, Shogo; Motoyama, Tomoharu; Miyashita, Yurina; Ishizuka, Yuki; Matsuo, Naoya; Tokiwa, Hiroaki; Shinoda, Suguru; Asano, Yasuhisa; Ito, Sohei
2018-05-22
The expansion of protein sequence databases has enabled us to design artificial proteins by sequence-based design methods, such as full consensus design (FCD) and ancestral sequence reconstruction (ASR). Artificial proteins with enhanced activity levels compared with native ones can potentially be generated by such methods, but successful design is rare because preparing a sequence library by curating the database and selecting a method is difficult. Utilizing a curated library prepared by reducing conservation energies, we successfully designed two artificial L-threonine 3-dehydrogenases (SDR-TDHs) with higher activity levels than native SDR-TDH, FcTDH-N1 and AncTDH, using FCD and ASR, respectively. The artificial SDR-TDHs had excellent thermal stability and NAD+ recognition compared to native SDR-TDH from Cupriavidus necator (CnTDH): the melting temperatures of FcTDH-N1 and AncTDH were about 10 and 5°C higher than that of CnTDH, respectively, and the dissociation constants toward NAD+ of FcTDH-N1 and AncTDH were two- and seven-fold lower than that of CnTDH, respectively. The enzymatic efficiency of the artificial SDR-TDHs was comparable to that of CnTDH. Crystal structures of FcTDH-N1 and AncTDH were determined at 2.8 and 2.1 Å resolution, respectively. Structural and MD simulation analysis of the SDR-TDHs indicated that only the flexibility at specific regions was changed, suggesting that multiple mutations introduced in the artificial SDR-TDHs altered their flexibility and thereby affected their enzymatic properties. Benchmark analysis of the SDR-TDHs indicated that both FCD and ASR can generate highly functional proteins if a curated library is prepared appropriately.
Peelman, F.; Vinaimont, N.; Verhee, A.; Vanloo, B.; Verschelde, J. L.; Labeur, C.; Seguret-Mace, S.; Duverger, N.; Hutchinson, G.; Vandekerckhove, J.; Tavernier, J.; Rosseneu, M.
1998-01-01
The enzyme cholesterol lecithin acyl transferase (LCAT) shares the Ser/Asp-Glu/His triad with lipases, esterases and proteases, but the low level of sequence homology between LCAT and these enzymes did not allow for the LCAT fold to be identified yet. We, therefore, relied upon structural homology calculations using threading methods based on alignment of the sequence against a library of solved three-dimensional protein structures, for prediction of the LCAT fold. We propose that LCAT, like lipases, belongs to the alpha/beta hydrolase fold family, and that the central domain of LCAT consists of seven conserved parallel beta-strands connected by four alpha-helices and separated by loops. We used the conserved features of this protein fold for the prediction of functional domains in LCAT, and carried out site-directed mutagenesis for the localization of the active site residues. The wild-type enzyme and mutants were expressed in Cos-1 cells. LCAT mass was measured by ELISA, and enzymatic activity was measured on recombinant HDL, on LDL and on a monomeric substrate. We identified D345 and H377 as the catalytic residues of LCAT, together with F103 and L182 as the oxyanion hole residues. In analogy with lipases, we further propose that a potential "lid" domain at residues 50-74 of LCAT might be involved in the enzyme-substrate interaction. Molecular modeling of human LCAT was carried out using human pancreatic and Candida antarctica lipases as templates. The three-dimensional model proposed here is compatible with the position of natural mutants for either LCAT deficiency or Fish-eye disease. It enables moreover prediction of the LCAT domains involved in the interaction with the phospholipid and cholesterol substrates. PMID:9541390
A minimalist model protein with multiple folding funnels
Locker, C. Rebecca; Hernandez, Rigoberto
2001-01-01
Kinetic and structural studies of wild-type proteins such as prions and amyloidogenic proteins provide suggestive evidence that proteins may adopt multiple long-lived states in addition to the native state. All of these states differ structurally because they lie far apart in configuration space, but their stability is not necessarily caused by cooperative (nucleation) effects. In this study, a minimalist model protein is designed to exhibit multiple long-lived states to explore the dynamics of the corresponding wild-type proteins. The minimalist protein is modeled as a 27-monomer sequence confined to a cubic lattice with three different monomer types. An order parameter—the winding index—is introduced to characterize the extent of folding. The winding index has several advantages over other commonly used order parameters like the number of native contacts. It can distinguish between enantiomers, its calculation requires less computational time than the number of native contacts, and reduced-dimensional landscapes can be developed when the native state structure is not known a priori. The results for the designed model protein prove by existence that the rugged energy landscape picture of protein folding can be generalized to include protein “misfolding” into long-lived states. PMID:11470921
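For comparison with the winding index, the conventional order parameter mentioned in this abstract, the number (or fraction) of native contacts, can be sketched for a chain on a cubic lattice. The winding index itself is not defined in the abstract and is not implemented here. The serpentine 3x3x3 conformation below is only an illustrative compact 27-mer, not the designed sequence from the paper, and all function names are hypothetical.

```python
def contact_pairs(coords):
    """Pairs (i, j) that are lattice neighbours but not bonded along the chain."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            # Unit Manhattan distance means adjacent sites on a cubic lattice.
            if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
                pairs.add((i, j))
    return pairs


def native_fraction(coords, native_pairs):
    """Fraction of native contacts present in a conformation."""
    if not native_pairs:
        return 0.0
    return len(contact_pairs(coords) & native_pairs) / len(native_pairs)


def compact_27mer():
    """Serpentine walk filling a 3x3x3 cube; consecutive sites are adjacent."""
    coords = []
    for z in range(3):
        ys = range(3) if z % 2 == 0 else range(2, -1, -1)
        for y in ys:
            row = len(coords) // 3  # rows placed so far; x direction alternates
            xs = range(3) if row % 2 == 0 else range(2, -1, -1)
            for x in xs:
                coords.append((x, y, z))
    return coords


native = compact_27mer()
native_pairs = contact_pairs(native)
extended = [(i, 0, 0) for i in range(27)]
```

Computing this quantity requires the native contact set in advance, which is the limitation the abstract raises when it notes that the winding index can be used when the native structure is not known a priori.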
Towse, Clare-Louise; Akke, Mikael; Daggett, Valerie
2017-04-27
Molecular dynamics (MD) simulations contain considerable information with regard to the motions and fluctuations of a protein, the magnitude of which can be used to estimate conformational entropy. Here we survey conformational entropy across protein fold space using the Dynameomics database, which represents the largest existing data set of protein MD simulations for representatives of essentially all known protein folds. We provide an overview of MD-derived entropies accounting for all possible degrees of dihedral freedom on an unprecedented scale. Although different side chains might be expected to impose varying restrictions on the conformational space that the backbone can sample, we found that the backbone entropy and side chain size are not strictly coupled. An outcome of these analyses is the Dynameomics Entropy Dictionary, the contents of which have been compared with entropies derived by other theoretical approaches and experiment. As might be expected, the conformational entropies scale linearly with the number of residues, demonstrating that conformational entropy is an extensive property of proteins. The calculated conformational entropies of folding agree well with previous estimates. Detailed analysis of specific cases identifies deviations in conformational entropy from the average values that highlight how conformational entropy varies with sequence, secondary structure, and tertiary fold. Notably, α-helices have lower entropy on average than do β-sheets, and both are lower than coil regions.
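As a rough illustration of how a conformational entropy can be estimated from sampled dihedral angles, the sketch below bins the angles and applies S = -R Σ p_i ln p_i to the bin populations. This is a generic histogram estimator chosen for illustration; the abstract does not specify the estimator used for the Dynameomics Entropy Dictionary, and the 30 degree bin width and the function name are assumptions.

```python
import math
from collections import Counter

R = 8.314462618  # molar gas constant, J/(mol*K)


def dihedral_entropy(angles_deg, bin_width=30.0):
    """Histogram estimate of the entropy of one dihedral angle, in J/(mol*K)."""
    n_bins = int(round(360.0 / bin_width))
    # Wrap angles into [0, 360) and assign each one to a bin.
    counts = Counter(
        int((a % 360.0) // bin_width) % n_bins for a in angles_deg
    )
    total = sum(counts.values())
    return -R * sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )


# A dihedral locked in one bin carries no entropy; one spread evenly over
# all twelve 30 degree bins reaches the maximum of R * ln(12).
locked = [10.0] * 50
uniform = [15.0 + 30.0 * k for k in range(12)]
```

Summing such per-dihedral terms over a residue ignores correlations between angles, so it should be read as an upper-bound style estimate rather than a reproduction of the published values.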