ComplexContact: a web server for inter-protein contact prediction using deep learning.
Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo
2018-05-22
ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA). It then applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.
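As a rough illustration of what co-evolution analysis on paired MSAs means, the sketch below scores every inter-protein column pair by mutual information. This is a deliberately simple stand-in, not the co-evolution method or the deep learning model used by ComplexContact; the function names and the toy alignments are hypothetical, and the two alignments are assumed to have their rows paired by index (row i of each MSA comes from the same organism).

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )

def inter_protein_mi(msa_a, msa_b):
    """Score matrix: rows are columns of protein A, columns are columns of protein B.

    Assumes msa_a and msa_b are lists of equal-length aligned sequences
    whose rows are paired by index (a hypothetical pairing convention).
    """
    cols_a = list(zip(*msa_a))
    cols_b = list(zip(*msa_b))
    return [[mutual_information(ca, cb) for cb in cols_b] for ca in cols_a]

# Toy paired alignments (hypothetical data).
msa_a = ["AK", "AK", "GE", "GE"]
msa_b = ["DL", "DL", "RL", "RL"]
scores = inter_protein_mi(msa_a, msa_b)
```

In this toy case the first columns of the two proteins vary together and receive a high score, while any pairing with the fully conserved second column of protein B scores zero.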
Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. 
The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.
2015-01-01
Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. 
Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483
Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels
2014-01-01
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. It, however, is known that protein complexes with smaller sizes occupy a large part of whole complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes. PMID:24564744
Protein complex prediction in large ontology attributed protein-protein interaction networks.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo
2013-01-01
Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.
Construction of ontology augmented networks for protein complex prediction.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian
2013-01-01
Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
Predicting Physical Interactions between Protein Complexes*
Clancy, Trevor; Rødland, Einar Andreas; Nygard, Ståle; Hovig, Eivind
2013-01-01
Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships. PMID:23438732
Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.
Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe
2018-02-19
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, the Min kernel and two pairwise kernels, which are the Metric Learning Pairwise Kernel (MLPK) and the Tensor Product Pairwise Kernel (TPPK). We also consider the normalized forms of the Min kernel. Then, we combine the Min kernel or its normalized form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
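The kernels named in this abstract can be sketched in their commonly cited forms. The abstract does not give the formulas, so the definitions below are assumptions taken from the general kernel literature and may differ in detail from the paper; the "plugging" combination is not reproduced, and the feature vectors are hypothetical.

```python
import math

def min_kernel(x, y):
    """Min kernel: sum of element-wise minima of two non-negative feature vectors."""
    return sum(min(a, b) for a, b in zip(x, y))

def normalized(kernel, x, y):
    """Normalized kernel: K(x, y) / sqrt(K(x, x) * K(y, y))."""
    denom = math.sqrt(kernel(x, x) * kernel(y, y))
    return kernel(x, y) / denom if denom > 0 else 0.0

def tppk(k, a, b, c, d):
    """Tensor Product Pairwise Kernel between protein pairs (a, b) and (c, d)."""
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)

def mlpk(k, a, b, c, d):
    """Metric Learning Pairwise Kernel between protein pairs (a, b) and (c, d)."""
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2

# Hypothetical per-protein feature vectors (e.g. domain counts).
p1, p2, p3, p4 = [1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]
pair_similarity_tppk = tppk(min_kernel, p1, p2, p3, p4)
pair_similarity_mlpk = mlpk(min_kernel, p1, p2, p3, p4)
```

Both pairwise kernels are symmetric in the order of the proteins within a pair, which is the property that makes them suitable for unordered pairs such as candidate heterodimers.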
Update of the ATTRACT force field for the prediction of protein-protein binding affinity.
Chéron, Jean-Baptiste; Zacharias, Martin; Antonczak, Serge; Fiorucci, Sébastien
2017-06-05
Determining protein-protein interactions is still a major challenge for molecular biology. Docking protocols have come of age in predicting the structure of macromolecular complexes. However, they still lack the accuracy to estimate binding affinities, the thermodynamic quantity that drives the formation of a complex. Here, an updated version of the protein-protein ATTRACT force field aimed at predicting experimental binding affinities is reported. It was designed on a dataset of 218 protein-protein complexes. The correlation between the experimental and predicted affinities reaches 0.6, outperforming most of the available protocols. Focusing on subsets of rigid and flexible complexes, the performance rises to 0.76 and 0.69, respectively. © 2017 Wiley Periodicals, Inc.
Ruan, Peiying; Hayashida, Morihiro; Maruyama, Osamu; Akutsu, Tatsuya
2013-01-01
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many prediction methods for protein complexes from protein-protein interactions have been developed such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt with only complexes with size of more than three because the methods often are based on some density of subgraphs. However, heterodimeric protein complexes that consist of two distinct proteins occupy a large part according to several comprehensive databases of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, domain composition kernel and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. These results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes. PMID:23776458
In Silico Analysis for the Study of Botulinum Toxin Structure
NASA Astrophysics Data System (ADS)
Suzuki, Tomonori; Miyazaki, Satoru
2010-01-01
Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding that function. Because determining protein-protein complex structures experimentally remains difficult, computational prediction of protein structures by structure modeling and docking studies is a valuable method. In addition, MD simulation is one of the most popular methods for modeling and characterizing protein structure. Here, we attempt to predict the structure and properties of a protein-protein complex using several bioinformatic methods, focusing on the botulinum toxin complex as the target structure.
Blind predictions of protein interfaces by docking calculations in CAPRI.
Lensink, Marc F; Wodak, Shoshana J
2010-11-15
Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. 
Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent potential of addressing the interface prediction challenge. © 2010 Wiley-Liss, Inc.
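The recall and precision criteria used in this evaluation can be illustrated with a short sketch. It assumes an interface is represented as a set of residue identifiers, which is an illustrative choice; the function names and the example residue numbers are hypothetical. The 50% threshold is the one quoted in the abstract for a correctly predicted interface.

```python
def interface_recall_precision(predicted, observed):
    """Recall and precision of predicted interface residues against observed ones."""
    predicted, observed = set(predicted), set(observed)
    true_positives = len(predicted & observed)
    recall = true_positives / len(observed) if observed else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision

def is_correct_interface(predicted, observed, threshold=0.5):
    """An interface counts as correctly predicted if recall and precision both reach the threshold."""
    recall, precision = interface_recall_precision(predicted, observed)
    return recall >= threshold and precision >= threshold

# Hypothetical residue numbers for one interface.
observed = {10, 11, 12, 13}
predicted = {11, 12, 13, 14, 15}
recall, precision = interface_recall_precision(predicted, observed)
```

Here three of the four observed residues are recovered (recall 0.75) and three of the five predicted residues are correct (precision 0.6), so the interface passes the 50% criterion.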
The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis
NASA Astrophysics Data System (ADS)
Suzuki, Tomonori; Miyazaki, Satoru
2011-01-01
Many biological systems are mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding their function. Determining the structure of large, flexible protein-protein complexes experimentally remains difficult, costly and time-consuming; computational prediction of protein structures by homology modeling and docking studies is therefore a valuable method. In addition, MD simulation is one of the most powerful methods for observing the dynamics of proteins. Here, we predict the protein-protein complex structure of botulinum toxin to analyze its properties. These bioinformatics methods are useful for reporting the relation between the flexibility of the backbone structure and the activity.
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
Assessment of the reliability of protein-protein interactions and protein function prediction.
Deng, Minghua; Sun, Fengzhu; Chen, Ting
2003-01-01
As more and more high-throughput protein-protein interaction data are collected, the task of estimating the reliability of different data sets becomes increasingly important. In this paper, we present our study of two groups of protein-protein interaction data, the physical interaction data and the protein complex data, and estimate the reliability of these data sets using three different measurements: (1) the distribution of gene expression correlation coefficients, (2) the reliability based on gene expression correlation coefficients, and (3) the accuracy of protein function predictions. We develop a maximum likelihood method to estimate the reliability of protein interaction data sets according to the distribution of correlation coefficients of gene expression profiles of putative interacting protein pairs. The results of the three measurements are consistent with each other. The MIPS protein complex data have the highest mean gene expression correlation coefficients (0.256) and the highest accuracy in predicting protein functions (70% sensitivity and specificity), while Ito's Yeast two-hybrid data have the lowest mean (0.041) and the lowest accuracy (15% sensitivity and specificity). Uetz's data are more reliable than Ito's data in all three measurements, and the TAP protein complex data are more reliable than the HMS-PCI data in all three measurements as well. The complex data sets generally perform better in function predictions than do the physical interaction data sets. Proteins in complexes are shown to be more highly correlated in gene expression. The results confirm that the components of a protein complex can be assigned to functions that the complex carries out within a cell. There are three interaction data sets different from the above two groups: the genetic interaction data, the in-silico data and the syn-express data. 
Their capability of predicting protein functions generally falls between that of the Y2H data and that of the MIPS protein complex data. The supplementary information is available at the following Web site: http://www-hto.usc.edu/-msms/AssessInteraction/.
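The first measurement, the gene expression correlation of putative interacting pairs, can be sketched as follows. This assumes the Pearson correlation coefficient and a plain average over pairs; the abstract does not specify either, and the maximum likelihood reliability estimate itself is not reproduced. The protein names and expression profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def mean_pair_correlation(pairs, profiles):
    """Mean correlation over interacting pairs that have expression data for both proteins."""
    values = [
        pearson(profiles[a], profiles[b])
        for a, b in pairs
        if a in profiles and b in profiles
    ]
    return sum(values) / len(values) if values else 0.0

# Hypothetical expression profiles and interaction pairs.
profiles = {"P1": [1, 2, 3], "P2": [2, 4, 6], "P3": [3, 2, 1]}
pairs = [("P1", "P2"), ("P1", "P3")]
mean_cc = mean_pair_correlation(pairs, profiles)
```

A data set whose pairs are mostly true interactions would be expected to give a higher mean than one dominated by false positives, which is the intuition behind comparing the means reported for the different data sets.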
Mazloom, Amin R.; Dannenfelser, Ruth; Clark, Neil R.; Grigoryan, Arsen V.; Linder, Kathryn M.; Cardozo, Timothy J.; Bond, Julia C.; Boran, Aislyn D. W.; Iyengar, Ravi; Malovannaya, Anna; Lanz, Rainer B.; Ma'ayan, Avi
2011-01-01
Coregulator proteins (CoRegs) are part of multi-protein complexes that transiently assemble with transcription factors and chromatin modifiers to regulate gene expression. In this study we analyzed data from 3,290 immuno-precipitations (IP) followed by mass spectrometry (MS) applied to human cell lines aimed at identifying CoRegs complexes. Using the semi-quantitative spectral counts, we scored binary protein-protein and domain-domain associations with several equations. Unlike previous applications, our methods scored prey-prey protein-protein interactions regardless of the baits used. We also predicted domain-domain interactions underlying predicted protein-protein interactions. The quality of predicted protein-protein and domain-domain interactions was evaluated using known binary interactions from the literature, whereas one protein-protein interaction, between STRN and CTTNBP2NL, was validated experimentally; and one domain-domain interaction, between the HEAT domain of PPP2R1A and the Pkinase domain of STK25, was validated using molecular docking simulations. The scoring schemes presented here recovered known, and predicted many new, complexes, protein-protein, and domain-domain interactions. The networks that resulted from the predictions are provided as a web-based interactive application at http://maayanlab.net/HT-IP-MS-2-PPI-DDI/. PMID:22219718
Predicting protein interactions by Brownian dynamics simulations.
Meng, Xuan-Yu; Xu, Yu; Zhang, Hong-Xing; Mezei, Mihaly; Cui, Meng
2012-01-01
We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates the adequate conformational sampling and accurate scoring of this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structure of two unbound proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.
Liu, Lizhen; Sun, Xiaowu; Song, Wei; Du, Chao
2018-06-01
Predicting protein complexes from protein-protein interaction (PPI) networks is of great significance for understanding the structure and function of cells. A protein may interact with different proteins at different times or under different conditions. Existing approaches use only static PPI network data and may therefore lose much temporal biological information. First, this article proposes a novel method that combines gene expression data at different time points with the traditional static PPI network to construct different dynamic subnetworks. Second, to further filter out noise in the data, semantic similarity based on gene ontology is used as the network weight, together with principal component analysis, which is introduced to handle the weights computed by three traditional methods. Third, after building a dynamic PPI network, a protein complex prediction algorithm based on the "core-attachment" structural feature is applied to detect complexes in each dynamic subnetwork. Finally, the experimental results show that the proposed method performs well at detecting protein complexes in dynamic weighted PPI networks.
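The first step, building dynamic subnetworks from a static PPI network and time-course expression data, might look like the sketch below. The activity rule is an assumption made for illustration: a protein is treated as active at a time point when its expression reaches a fixed threshold, and an edge is kept only when both endpoints are active. The abstract does not state the rule actually used, and the gene ontology weighting and core-attachment detection steps are not shown.

```python
def dynamic_subnetworks(edges, expression, threshold):
    """Split a static PPI edge list into one subnetwork per time point.

    edges: list of (protein, protein) tuples from the static network.
    expression: dict mapping protein -> list of expression levels, one per time point.
    threshold: hypothetical activity cutoff applied to every protein.
    """
    n_times = min((len(levels) for levels in expression.values()), default=0)
    subnetworks = []
    for t in range(n_times):
        # Proteins considered active at time point t under the assumed rule.
        active = {p for p, levels in expression.items() if levels[t] >= threshold}
        subnetworks.append([(u, v) for u, v in edges if u in active and v in active])
    return subnetworks

# Hypothetical static network and two-time-point expression data.
edges = [("A", "B"), ("B", "C")]
expression = {"A": [5, 1], "B": [6, 7], "C": [0, 8]}
subnets = dynamic_subnetworks(edges, expression, threshold=3)
```

With these toy values the A-B interaction appears only at the first time point and the B-C interaction only at the second, which is the kind of temporal information a single static network cannot express.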
Modeling the assembly order of multimeric heteroprotein complexes
Esquivel-Rodriguez, Juan; Terashi, Genki; Christoffer, Charles; Shin, Woong-Hee
2018-01-01
Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively few studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified.
As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes. PMID:29329283
Modeling the assembly order of multimeric heteroprotein complexes.
Peterson, Lenna X; Togawa, Yoichiro; Esquivel-Rodriguez, Juan; Terashi, Genki; Christoffer, Charles; Roy, Amitava; Shin, Woong-Hee; Kihara, Daisuke
2018-01-01
Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively few studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified.
As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes.
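The "simple approach" mentioned in the abstract above, inferring an assembly path from buried surface area in the native complex, can be illustrated with a greedy merge. This is only a sketch under stated assumptions and is not Path-LZerD: the function name `predict_assembly_order`, the input format, and the area values are hypothetical, and the rule used (always join the two groups that bury the most surface between them) is one plausible reading of that baseline.

```python
from itertools import combinations


def interface_area(bsa, group_a, group_b):
    """Total buried surface area between two groups of subunits."""
    return sum(bsa.get(frozenset((a, b)), 0.0) for a in group_a for b in group_b)


def predict_assembly_order(bsa):
    """Greedy assembly path from pairwise buried surface areas.

    bsa maps frozenset({subunit_a, subunit_b}) to the area buried between
    the two subunits in the native complex (hypothetical input format).
    Returns the list of merge steps, earliest first.
    """
    groups = {frozenset([s]) for pair in bsa for s in pair}
    steps = []
    while len(groups) > 1:
        # Sort the groups so that ties are broken deterministically.
        ordered = sorted(groups, key=sorted)
        g1, g2 = max(
            combinations(ordered, 2),
            key=lambda pair: interface_area(bsa, pair[0], pair[1]),
        )
        groups -= {g1, g2}
        groups.add(g1 | g2)
        steps.append((sorted(g1), sorted(g2)))
    return steps


# Hypothetical buried surface areas (in square angstroms) for a tetramer.
example_bsa = {
    frozenset(("A", "B")): 1200.0,
    frozenset(("B", "C")): 600.0,
    frozenset(("A", "C")): 300.0,
    frozenset(("C", "D")): 800.0,
}
for step in predict_assembly_order(example_bsa):
    print(step)
```

Because the areas are read from the native complex, a baseline of this kind needs the solved structure, which is the limitation the abstract contrasts with the docking-based method.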
HomPPI: a class of sequence homology based protein-protein interface prediction methods
2011-01-01
Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. 
The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
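The correlation coefficient, sensitivity, and specificity quoted in the abstract above are per-residue classification measures. The sketch below shows how such values can be computed from binary interface labels. It assumes the common definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), CC = Matthews correlation coefficient); the abstract does not state which definitions were used, and some interface-prediction papers use "specificity" for TP/(TP+FP) instead. The function name and example labels are illustrative only.

```python
from math import sqrt


def interface_metrics(predicted, actual):
    """Return (sensitivity, specificity, correlation coefficient).

    predicted and actual are equal-length sequences of 0/1 labels,
    where 1 marks a residue as belonging to the interface.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # Assumed definition; swap in tp / (tp + fp) for the precision-style variant.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, cc


# Made-up labels for a six-residue protein.
predicted_labels = [1, 1, 0, 0, 1, 0]
actual_labels = [1, 0, 0, 0, 1, 1]
print(interface_metrics(predicted_labels, actual_labels))
```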
(PS)2: protein structure prediction server version 3.0.
Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh
2015-07-01
Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina
2016-07-01
Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. 
In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
Theofilatos, Konstantinos; Pavlopoulou, Niki; Papasavvas, Christoforos; Likothanassis, Spiros; Dimitrakopoulos, Christos; Georgopoulos, Efstratios; Moschopoulos, Charalampos; Mavroudi, Seferina
2015-03-01
Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages as they require parameter tuning, some of them cannot handle weighted PPI data and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. EE-MC is by design able to overcome intrinsic limitations of existing methodologies such as their inability to handle weighted PPI networks, their constraint to assign every protein in exactly one cluster and the difficulties they face concerning the parameter tuning. 
This fact was experimentally validated and moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques. Copyright © 2015 Elsevier B.V. All rights reserved.
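For readers unfamiliar with the clustering step named in the abstract above, the following is a minimal sketch of plain Markov clustering on a weighted graph. It is not EE-MC or the "enhanced" variant, whose details the abstract does not give; the function names, the inflation value, the fixed iteration count, and the toy graph are all assumptions chosen for illustration. The procedure alternates expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power and renormalizing) and then reads clusters from the rows that retain mass.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (columns of zeros stay zero)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [
        [matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def multiply(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(weights, inflation=2.0, iterations=50):
    """Basic Markov clustering of a symmetric weighted adjacency matrix."""
    n = len(weights)
    # Self-loops are a common convention that stabilizes the iteration.
    m = [
        [weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = multiply(m, m)  # expansion: flow spreads along longer walks
        m = normalize_columns(
            [[x ** inflation for x in row] for row in m]
        )  # inflation: strong flows are boosted, weak flows fade
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy weighted PPI graph: two triangles joined by one weak edge (2-3).
toy = [[0.0] * 6 for _ in range(6)]
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    toy[a][b] = toy[b][a] = w
print(markov_cluster(toy))
```

On this toy graph the two triangles should come out as separate clusters. Note that plain Markov clustering assigns each protein to essentially one cluster and depends on the inflation parameter, which are the kinds of limitations the abstract says EE-MC is designed to address.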
Bordner, Andrew J; Gorin, Andrey A
2008-05-12
Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionary conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website (see Availability and requirements section). Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.
Exploiting three kinds of interface propensities to identify protein binding sites.
Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan
2009-08-01
Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles to use the evolutionary information of the protein sequence frequency profiles and apply this building block to produce a class of propensities called order profile interface propensities. For comparisons, we revisit the usage of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is fed into an SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and protein remote homology detection.
Bordner, Andrew J.; Gorin, Andrey A.
2008-05-12
Protein-protein interactions are ubiquitous and essential for cellular processes. High-resolution X-ray crystallographic structures of protein complexes can elucidate the details of their function and provide a basis for many computational and experimental approaches. Here we demonstrate that existing annotations of protein complexes, including those provided by the Protein Data Bank (PDB) itself, contain a significant fraction of incorrect annotations. Results: We have developed a method for identifying protein complexes in the PDB X-ray structures by a four step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) finally combining these scores for each entry in order to predict the complex structure. Unlike previous annotation methods, consistent prediction of complexes with identical or almost identical protein content is ensured. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionarily conserved protein-protein interactions.
Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations
Seeliger, Daniel; de Groot, Bert L.
2010-01-01
Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034
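The accuracy figures in the abstract above are backbone RMSD values. As a reminder of what that number measures, here is a minimal sketch that computes the root-mean-square deviation between two equally sized sets of atom coordinates. It assumes the two structures have already been superimposed and that atoms are listed in matching order; optimal superposition (for example with the Kabsch algorithm) is deliberately left out, and the function name and coordinates are illustrative.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinates.

    Assumes both structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Two made-up backbone fragments; the second is shifted by 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model, target))
```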
Protein docking prediction using predicted protein-protein interface.
Li, Bin; Kihara, Daisuke
2012-01-10
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by a second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka
2018-05-08
Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times as many PPI predictions as previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. 
MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .
Wei, Qing; La, David; Kihara, Daisuke
2017-01-01
Prediction of protein-protein interaction sites in a protein structure provides important information for elucidating the mechanism of protein function and can also be useful in guiding modeling or design procedures for protein complex structures. Since prediction methods essentially assess the propensity of amino acids that are likely to be part of a protein docking interface, they can help in designing protein-protein interactions. Here, we introduce the BindML and BindML+ methods for predicting protein-protein interaction sites. BindML predicts protein-protein interaction sites by identifying mutation patterns found in known protein-protein complexes using phylogenetic substitution models. BindML+ is an extension of BindML for distinguishing permanent and transient types of protein-protein interaction sites. We developed an interactive web-server that provides a convenient interface to assist in structural visualization of protein-protein interaction site predictions. The input to the web-server is a tertiary structure of interest. BindML and BindML+ are available at http://kiharalab.org/bindml/ and http://kiharalab.org/bindml/plus/ .
Predicting protein complex geometries with a neural network.
Chae, Myong-Ho; Krull, Florian; Lorenzen, Stephan; Knapp, Ernst-Walter
2010-03-01
A major challenge of the protein docking problem is to define scoring functions that can distinguish near-native protein complex geometries from a large number of non-native geometries (decoys) generated with noncomplexed protein structures (unbound docking). In this study, we have constructed a neural network that employs the information from atom-pair distance distributions of a large number of decoys to predict protein complex geometries. We found that docking prediction can be significantly improved using two different types of polar hydrogen atoms. To train the neural network, 2000 near-native decoys of even distance distribution were used for each of the 185 considered protein complexes. The neural network normalizes the information from different protein complexes using an additional protein complex identity input neuron for each complex. The parameters of the neural network were determined such that they mimic a scoring funnel in the neighborhood of the native complex structure. The neural network approach avoids the reference state problem, which occurs in deriving knowledge-based energy functions for scoring. We show that a distance-dependent atom pair potential performs much better than a simple atom-pair contact potential. We have compared the performance of our scoring function with other empirical and knowledge-based scoring functions such as ZDOCK 3.0, ZRANK, ITScore-PP, EMPIRE, and RosettaDock. In spite of the simplicity of the method and its functional form, our neural network-based scoring function achieves a reasonable performance in rigid-body unbound docking of proteins. Proteins 2010. (c) 2009 Wiley-Liss, Inc.
Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning
NASA Astrophysics Data System (ADS)
Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao
2017-04-01
Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning.
Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao
2017-04-18
Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
3D RNA and functional interactions from evolutionary couplings
Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.
2016-01-01
Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444
Várnai, Csilla; Burkoff, Nikolas S; Wild, David L
2017-01-01
Evolutionary information stored in multiple sequence alignments (MSAs) has been used to identify the interaction interface of protein complexes, by measuring either co-conservation or co-mutation of amino acid residues across the interface. Recently, maximum entropy related correlated mutation measures (CMMs) such as direct information, decoupling direct from indirect interactions, have been developed to identify residue pairs interacting across the protein complex interface. These studies have focussed on carefully selected protein complexes with large, good-quality MSAs. In this work, we study protein complexes with a more typical MSA consisting of fewer than 400 sequences, using a set of 79 intramolecular protein complexes. Using a maximum entropy based CMM at the residue level, we develop an interface level CMM score to be used in re-ranking docking decoys. We demonstrate that our interface level CMM score compares favourably to the complementarity trace score, an evolutionary information-based score measuring co-conservation, when combined with the number of interface residues, a knowledge-based potential and the variability score of individual amino acid sites. We also demonstrate that, since co-mutation and co-complementarity in the MSA contain orthogonal information, the best prediction performance using evolutionary information can be achieved by combining the co-mutation information of the CMM with co-conservation information of a complementarity trace score, predicting a near-native structure as the top prediction for 41% of the dataset. The method presented is not restricted to small MSAs, and will likely improve interface prediction also for complexes with large and good-quality MSAs.
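To make the idea of a co-mutation signal in an MSA concrete, the sketch below computes the mutual information between two alignment columns. This is a deliberately simple stand-in and not the maximum-entropy measure (direct information) used in the work above: mutual information does not separate direct from indirect couplings, and the sketch applies no sequence weighting or pseudocounts. The function name and the toy alignment are illustrative assumptions.

```python
from collections import Counter
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    msa is a list of equal-length aligned sequences. For a paired MSA of two
    proteins, i and j would index positions in the two different chains.
    """
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 change together, column 2 is fully conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
print(column_mutual_information(toy_msa, 0, 1))  # co-varying pair
print(column_mutual_information(toy_msa, 0, 2))  # conserved column carries no signal
```

The second call illustrates why co-mutation and co-conservation are complementary: a perfectly conserved column yields zero mutual information even if it sits in the interface, which is the kind of site a conservation-based score can still pick up.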
Rehman, Zia Ur; Idris, Adnan; Khan, Asifullah
2018-06-01
Protein-Protein Interactions (PPI) play a vital role in cellular processes and are formed because of thousands of interactions among proteins. Advancements in proteomics technologies have resulted in huge PPI datasets that need to be systematically analyzed. Protein complexes are the locally dense regions in PPI networks, which play an important role in metabolic pathways and gene regulation. In this work, a novel two-phase protein complex detection and grouping mechanism is proposed. In the first phase, topological and biological features are extracted for each complex, and prediction performance is investigated using a Bagging based Ensemble classifier (PCD-BEns). Performance evaluation through cross validation shows improvement in comparison to the CDIP, MCode, CFinder and PLSMC methods. The second phase employs Multi-Dimensional Scaling (MDS) for the grouping of known complexes by exploring inter complex relations. It is experimentally observed that the combination of topological and biological features in the proposed approach has greatly enhanced prediction performance for protein complex detection, which may help to understand various biological processes, whereas application of MDS based exploration may assist in grouping potentially similar complexes. Copyright © 2018 Elsevier Ltd. All rights reserved.
Principles of assembly reveal a periodic table of protein complexes.
Ahnert, Sebastian E; Marsh, Joseph A; Hernández, Helena; Robinson, Carol V; Teichmann, Sarah A
2015-12-11
Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering. Copyright © 2015, American Association for the Advancement of Science.
Sequence-Based Prediction of RNA-Binding Residues in Proteins.
Walia, Rasna R; El-Manzalawy, Yasser; Honavar, Vasant G; Dobbs, Drena
2017-01-01
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
Predicting Protein–protein Association Rates using Coarse-grained Simulation and Machine Learning
Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao
2017-01-01
Protein–protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate. PMID:28418043
Madaoui, Hocine; Guerois, Raphaël
2008-01-01
Protein surfaces are under significant selection pressure to maintain interactions with their partners throughout evolution. Capturing how selection pressure acts at the interfaces of protein–protein complexes is a fundamental issue with high interest for the structural prediction of macromolecular assemblies. We tackled this issue under the assumption that, throughout evolution, mutations should minimally disrupt the physicochemical compatibility between specific clusters of interacting residues. This constraint drove the development of the so-called Surface COmplementarity Trace in Complex History score (SCOTCH), which was found to discriminate with high efficiency the structure of biological complexes. SCOTCH performances were assessed not only with respect to other evolution-based approaches, such as conservation and coevolution analyses, but also with respect to statistically based scoring methods. Validated on a set of 129 complexes of known structure exhibiting both permanent and transient intermolecular interactions, SCOTCH appears as a robust strategy to guide the prediction of protein–protein complex structures. Of particular interest, it also provides a basic framework to efficiently track how protein surfaces could evolve while keeping their partners in contact. PMID:18511568
3dRPC: a web server for 3D RNA-protein structure prediction.
Huang, Yangyu; Li, Haotian; Xiao, Yi
2018-04-01
RNA-protein interactions occur in many biological processes. To understand the mechanism of these interactions one needs to know three-dimensional (3D) structures of RNA-protein complexes. 3dRPC is an algorithm for prediction of 3D RNA-protein complex structures and consists of a docking algorithm RPDOCK and a scoring function 3dRPC-Score. RPDOCK is used to sample possible complex conformations of an RNA and a protein by calculating the geometric and electrostatic complementarities and stacking interactions at the RNA-protein interface according to the features of atom packing of the interface. 3dRPC-Score is a knowledge-based potential that uses the conformations of nucleotide-amino-acid pairs as statistical variables and that is used to choose the near-native complex-conformations obtained from the docking method above. Recently, we built a web server for 3dRPC. The users can easily use 3dRPC without installing it locally. RNA and protein structures in PDB (Protein Data Bank) format are the only needed input files. It can also incorporate the information of interface residues or residue-pairs obtained from experiments or theoretical predictions to improve the prediction. The 3dRPC web server is available at http://biophy.hust.edu.cn/3dRPC. Contact: yxiao@hust.edu.cn.
Electrostatic Rate Enhancement and Transient Complex of Protein-Protein Association
Alsallaq, Ramzi; Zhou, Huan-Xiang
2012-01-01
The association of two proteins is bounded by the rate at which they, via diffusion, find each other while in appropriate relative orientations. Orientational constraints restrict this rate to ~10⁵–10⁶ M⁻¹ s⁻¹. Proteins with higher association rates generally have complementary electrostatic surfaces; proteins with lower association rates generally are slowed down by conformational changes upon complex formation. Previous studies (Zhou, Biophys. J. 1997;73:2441–2445) have shown that electrostatic enhancement of the diffusion-limited association rate can be accurately modeled by kD = kD0 exp(−
Surflex-Dock: Docking benchmarks and real-world application
NASA Astrophysics Data System (ADS)
Spitzer, Russell; Jain, Ajay N.
2012-06-01
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
Zallocchi, Marisa; Sisson, Joseph H; Cosgrove, Dominic
2010-02-16
Usher syndrome is the major cause of deaf/blindness in the world. It is a genetic heterogeneous disorder, with nine genes already identified as causative for the disease. We noted expression of all known Usher proteins in bovine tracheal epithelial cells and exploited this system for large-scale biochemical analysis of Usher protein complexes. The dissected epithelia were homogenized in nondetergent buffer and sedimented on sucrose gradients. At least two complexes were evident after the first gradient: one formed by specific isoforms of CDH23, PCDH15, and VLGR-1 and a different one at the top of the gradient that included all of the Usher proteins and rab5, a transport vesicle marker. TEM analysis of these top fractions found them enriched in 100-200 nm vesicles, confirming a vesicular association of the Usher complex(es). Immunoisolation of these vesicles confirmed some of the associations already predicted and identified novel interactions. When the vesicles are lysed in the presence of phenylbutyrate, most of the Usher proteins cosediment into the gradient at a sedimentation coefficient of approximately 50 S, correlating with a predicted molecular mass of 2 x 10(6) Da. Although it is still unclear whether there is only one complex or several independent complexes that are trafficked within distinct vesicular pools, this work shows for the first time that native Usher protein complexes occur in vivo. This complex(es) is present primarily in transport vesicles at the apical pole of tracheal epithelial cells, predicting that Usher proteins may be directionally transported as complexes in hair cells and photoreceptors.
Zallocchi, Marisa; Sisson, Joseph H.; Cosgrove, Dominic
2010-01-01
Usher syndrome is the major cause of deaf/blindness in the world. It is a genetic heterogeneous disorder, with nine genes already identified as causative for the disease. We noted expression of all known Usher proteins in bovine tracheal epithelial cells, and exploited this system for large-scale biochemical analysis of Usher protein complexes. The dissected epithelia were homogenized in non-detergent buffer, and sedimented on sucrose gradients. At least two complexes were evident after the first gradient: one formed by specific isoforms of CDH23, PCDH15 and VLGR-1, and a different one at the top of the gradient that included all the Usher proteins and rab5, a transport vesicle marker. TEM analysis of these top fractions found them enriched in 100–200 nm vesicles, confirming a vesicular association of the Usher complex(es). Immunoisolation of these vesicles confirmed some of the associations already predicted and identified novel interactions. When the vesicles are lysed in the presence of phenylbutyrate, most of the Usher proteins co-sediment into the gradient at a sedimentation coefficient of approximately 50S, correlating with a predicted molecular mass of 2 × 10⁶ Daltons. Although it is still unclear whether there is only one complex or several independent complexes that are trafficked within distinct vesicular pools, this work shows for the first time that native Usher protein complexes occur in vivo. This complex(es) is present primarily in transport vesicles at the apical pole of tracheal epithelial cells, predicting that Usher proteins may be directionally transported as complexes in hair cells and photoreceptors. PMID:20058854
Fukunishi, Yoshifumi
2010-01-01
For fragment-based drug development (FBDD), both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.
Electrostatic rate enhancement and transient complex of protein-protein association.
Alsallaq, Ramzi; Zhou, Huan-Xiang
2008-04-01
The association of two proteins is bounded by the rate at which they, via diffusion, find each other while in appropriate relative orientations. Orientational constraints restrict this rate to approximately 10(5)-10(6) M(-1) s(-1). Proteins with higher association rates generally have complementary electrostatic surfaces; proteins with lower association rates generally are slowed down by conformational changes upon complex formation. Previous studies (Zhou, Biophys J 1997;73:2441-2445) have shown that electrostatic enhancement of the diffusion-limited association rate can be accurately modeled by $k_{D} = k_{D0} \exp(-\langle U_{el} \rangle^{\star}/k_{B}T)$, where $k_{D}$ and $k_{D0}$ are the rates in the presence and absence of electrostatic interactions, respectively, $\langle U_{el} \rangle^{\star}$ is the average electrostatic interaction energy in a "transient-complex" ensemble, and $k_{B}T$ is the thermal energy. The transient-complex ensemble separates the bound state from the unbound state. Predictions of the transient-complex theory on four protein complexes were found to agree well with the experiment when the electrostatic interaction energy was calculated with the linearized Poisson-Boltzmann (PB) equation (Alsallaq and Zhou, Structure 2007;15:215-224). Here we show that the agreement is further improved when the nonlinear PB equation is used. These predictions are obtained with the dielectric boundary defined as the protein van der Waals surface. When the dielectric boundary is instead specified as the molecular surface, electrostatic interactions in the transient complex become repulsive and are thus predicted to retard association. Together these results demonstrate that the transient-complex theory is predictive of electrostatic rate enhancement and can help parameterize PB calculations. (c) 2007 Wiley-Liss, Inc.
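The rate expression quoted in the abstract above can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name, the choice of kcal/mol as the energy unit, and the default temperature are assumptions made for illustration, and the electrostatic energy itself would have to come from a separate Poisson-Boltzmann calculation.

```python
import math

# Boltzmann constant per mole (gas constant) in kcal/(mol*K).
# Expressing energies in kcal/mol is an assumption of this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def association_rate(k_d0, u_el_star, temperature=298.15):
    """Return k_D = k_D0 * exp(-<U_el>* / (k_B T)).

    k_d0        -- basal rate without electrostatic interactions (M^-1 s^-1)
    u_el_star   -- average electrostatic interaction energy in the
                   transient-complex ensemble (kcal/mol); negative = attractive
    temperature -- absolute temperature in kelvin (hypothetical default)
    """
    thermal_energy = K_B_KCAL_PER_MOL_K * temperature
    return k_d0 * math.exp(-u_el_star / thermal_energy)
```

With this form, an attractive (negative) energy raises the rate above the basal value, and a repulsive (positive) energy lowers it, which matches the abstract's statement that repulsive interactions in the transient complex are predicted to retard association.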
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
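The bag-of-words featurization mentioned in the abstract above can be sketched as follows. This is only an illustration of the general technique under simple assumptions (lowercasing, splitting on non-letters, raw counts); the tokenization and feature weighting actually used in the study are not stated here, and the Support Vector Machine step is deliberately left out. The function names are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens in a text (assumed tokenization)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def vectorize(texts):
    """Return (vocabulary, vectors): one count vector per input text.

    The vocabulary is the sorted set of all tokens seen in any text, so every
    vector has the same length and column order.
    """
    bags = [bag_of_words(t) for t in texts]
    vocab = sorted(set().union(*bags)) if bags else []
    return vocab, [[bag[word] for word in vocab] for bag in bags]
```

Vectors of this kind are the usual input to a classifier that separates relevant from irrelevant abstracts.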
Using the Concept of Transient Complex for Affinity Predictions in CAPRI Rounds 20–27 and Beyond
Qin, Sanbo; Zhou, Huan-Xiang
2013-01-01
Predictions of protein-protein binders and binding affinities have traditionally focused on features pertaining to the native complexes. In developing a computational method for predicting protein-protein association rate constants, we introduced the concept of transient complex after mapping the interaction energy surface. The transient complex is located at the outer boundary of the bound-state energy well, having near-native separation and relative orientation between the subunits but with most of the short-range native interactions not yet formed. We found that the width of the binding funnel and the electrostatic interaction energy of the transient complex are among the features predictive of binders and binding affinities. These ideas were very promising for the five affinity-related targets (T43–45, 55, and 56) of CAPRI rounds 20–27. For T43, we ranked the single crystallographic complex as number 1 and were one of only two groups that clearly identified that complex as a true binder; for T44, we ranked the only design with measurable binding affinity as number 4. For the nine docking targets, continuing on our success in previous CAPRI rounds, we produced 10 medium-quality models for T47 and acceptable models for T48 and T49. We conclude that the interaction energy landscape and the transient complex in particular will complement existing features in leading to better prediction of binding affinities. PMID:23873496
Optimization of protein-protein docking for predicting Fc-protein interactions.
Agostino, Mark; Mancera, Ricardo L; Ramsland, Paul A; Fernández-Recio, Juan
2016-11-01
The antibody crystallizable fragment (Fc) is recognized by effector proteins as part of the immune system. Pathogens produce proteins that bind Fc in order to subvert or evade the immune response. The structural characterization of the determinants of Fc-protein association is essential to improve our understanding of the immune system at the molecular level and to develop new therapeutic agents. Furthermore, Fc-binding peptides and proteins are frequently used to purify therapeutic antibodies. Although several structures of Fc-protein complexes are available, numerous others have not yet been determined. Protein-protein docking could be used to investigate Fc-protein complexes; however, improved approaches are necessary to efficiently model such cases. In this study, a docking-based structural bioinformatics approach is developed for predicting the structures of Fc-protein complexes. Based on the available set of X-ray structures of Fc-protein complexes, three regions of the Fc, loosely corresponding to three turns within the structure, were defined as containing the essential features for protein recognition and used as restraints to filter the initial docking search. Rescoring the filtered poses with an optimal scoring strategy provided a success rate of approximately 80% of the test cases examined within the top ranked 20 poses, compared to approximately 20% by the initial unrestrained docking. The developed docking protocol provides a significant improvement over the initial unrestrained docking and will be valuable for predicting the structures of currently undetermined Fc-protein complexes, as well as in the design of peptides and proteins that target Fc. Copyright © 2016 John Wiley & Sons, Ltd.
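The restraint-filtering and top-20 evaluation described in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the pose representation (a dict with `contacts`, `score`, and `near_native` keys), the convention that lower scores are better, and the rule that a pose passes if it touches at least one restraint region. The actual Fc regions and scoring strategy are not given in the abstract.

```python
def filter_by_restraints(poses, restraint_regions, min_regions=1):
    """Keep poses whose contacted residues overlap enough restraint regions.

    poses             -- list of dicts with a 'contacts' set of residue numbers
    restraint_regions -- list of sets of residue numbers (hypothetical regions)
    min_regions       -- how many regions a pose must touch to be kept
    """
    kept = []
    for pose in poses:
        touched = sum(1 for region in restraint_regions
                      if pose["contacts"] & region)
        if touched >= min_regions:
            kept.append(pose)
    return kept


def top_n_success(poses, n=20):
    """True if any of the n best-scoring poses is flagged near-native.

    Assumes lower 'score' means a better pose.
    """
    ranked = sorted(poses, key=lambda p: p["score"])
    return any(p["near_native"] for p in ranked[:n])
```

A success rate over a benchmark would then be the fraction of test cases for which `top_n_success` returns true after filtering and rescoring.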
Kirkwood, Kathryn J.; Ahmad, Yasmeen; Larance, Mark; Lamond, Angus I.
2013-01-01
Proteins form a diverse array of complexes that mediate cellular function and regulation. A largely unexplored feature of such protein complexes is the selective participation of specific protein isoforms and/or post-translationally modified forms. In this study, we combined native size-exclusion chromatography (SEC) with high-throughput proteomic analysis to characterize soluble protein complexes isolated from human osteosarcoma (U2OS) cells. Using this approach, we have identified over 71,500 peptides and 1,600 phosphosites, corresponding to over 8,000 proteins, distributed across 40 SEC fractions. This represents >50% of the predicted U2OS cell proteome, identified with a mean peptide sequence coverage of 27% per protein. Three biological replicates were performed, allowing statistical evaluation of the data and demonstrating a high degree of reproducibility in the SEC fractionation procedure. Specific proteins were detected interacting with multiple independent complexes, as typified by the separation of distinct complexes for the MRFAP1-MORF4L1-MRGBP interaction network. The data also revealed protein isoforms and post-translational modifications that selectively associated with distinct subsets of protein complexes. Surprisingly, there was clear enrichment for specific Gene Ontology terms associated with differential size classes of protein complexes. This study demonstrates that combined SEC/MS analysis can be used for the system-wide annotation of protein complexes and to predict potential isoform-specific interactions. All of these SEC data on the native separation of protein complexes have been integrated within the Encyclopedia of Proteome Dynamics, an online, multidimensional data-sharing resource available to the community. PMID:24043423
Kirkwood, Kathryn J; Ahmad, Yasmeen; Larance, Mark; Lamond, Angus I
2013-12-01
Proteins form a diverse array of complexes that mediate cellular function and regulation. A largely unexplored feature of such protein complexes is the selective participation of specific protein isoforms and/or post-translationally modified forms. In this study, we combined native size-exclusion chromatography (SEC) with high-throughput proteomic analysis to characterize soluble protein complexes isolated from human osteosarcoma (U2OS) cells. Using this approach, we have identified over 71,500 peptides and 1,600 phosphosites, corresponding to over 8,000 proteins, distributed across 40 SEC fractions. This represents >50% of the predicted U2OS cell proteome, identified with a mean peptide sequence coverage of 27% per protein. Three biological replicates were performed, allowing statistical evaluation of the data and demonstrating a high degree of reproducibility in the SEC fractionation procedure. Specific proteins were detected interacting with multiple independent complexes, as typified by the separation of distinct complexes for the MRFAP1-MORF4L1-MRGBP interaction network. The data also revealed protein isoforms and post-translational modifications that selectively associated with distinct subsets of protein complexes. Surprisingly, there was clear enrichment for specific Gene Ontology terms associated with differential size classes of protein complexes. This study demonstrates that combined SEC/MS analysis can be used for the system-wide annotation of protein complexes and to predict potential isoform-specific interactions. All of these SEC data on the native separation of protein complexes have been integrated within the Encyclopedia of Proteome Dynamics, an online, multidimensional data-sharing resource available to the community.
Peterson, Lenna X; Shin, Woong-Hee; Kim, Hyungrae; Kihara, Daisuke
2018-03-01
We report our group's performance for protein-protein complex structure prediction and scoring in Round 37 of the Critical Assessment of PRediction of Interactions (CAPRI), an objective assessment of protein-protein complex modeling. We demonstrated noticeable improvement in both prediction and scoring compared to previous rounds of CAPRI, with our human predictor group near the top of the rankings and our server scorer group at the top. This is the first time in CAPRI that a server has been the top scorer group. To predict protein-protein complex structures, we used both multi-chain template-based modeling (TBM) and our protein-protein docking program, LZerD. LZerD represents protein surfaces using 3D Zernike descriptors (3DZD), which are based on a mathematical series expansion of a 3D function. Because 3DZD are a soft representation of the protein surface, LZerD is tolerant to small conformational changes, making it well suited to docking unbound and TBM structures. The key to our improved performance in CAPRI Round 37 was to combine multi-chain TBM and docking. As opposed to our previous strategy of performing docking for all target complexes, we used TBM when multi-chain templates were available and docking otherwise. We also describe the combination of multiple scoring functions used by our server scorer group, which achieved the top rank for the scorer phase. © 2017 Wiley Periodicals, Inc.
ProBiS-CHARMMing: Web Interface for Prediction and Optimization of Ligands in Protein Binding Sites.
Konc, Janez; Miller, Benjamin T; Štular, Tanja; Lešnik, Samo; Woodcock, H Lee; Brooks, Bernard R; Janežič, Dušanka
2015-11-23
Proteins often exist only as apo structures (unligated) in the Protein Data Bank, with their corresponding holo structures (with ligands) unavailable. However, apoproteins may not represent the amino-acid residue arrangement upon ligand binding well, which is especially problematic for molecular docking. We developed the ProBiS-CHARMMing web interface by connecting the ProBiS ( http://probis.cmm.ki.si ) and CHARMMing ( http://www.charmming.org ) web servers into one functional unit that enables prediction of protein-ligand complexes and allows for their geometry optimization and interaction energy calculation. The ProBiS web server predicts ligands (small compounds, proteins, nucleic acids, and single-atom ligands) that may bind to a query protein. This is achieved by comparing its surface structure against a nonredundant database of protein structures and finding those that have binding sites similar to that of the query protein. Existing ligands found in the similar binding sites are then transposed to the query according to predictions from ProBiS. The CHARMMing web server enables, among other things, minimization and potential energy calculation for a wide variety of biomolecular systems, and it is used here to optimize the geometry of the predicted protein-ligand complex structures using the CHARMM force field and to calculate their interaction energies with the corresponding query proteins. We show how ProBiS-CHARMMing can be used to predict ligands and their poses for a particular binding site, and minimize the predicted protein-ligand complexes to obtain representations of holoproteins. The ProBiS-CHARMMing web interface is freely available for academic users at http://probis.nih.gov.
NASA Astrophysics Data System (ADS)
Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin
2017-08-01
The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search for a receptor structure with a bound ligand sharing high similarity with the query ligand, for use in docking. The strategy was applied to the two datasets (HSP90 and MAP4K4) in the recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions compared with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.
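The receptor-selection idea in the abstract above (choose the known structure whose bound ligand most resembles the query ligand) can be sketched as below. The abstract does not say which similarity measure was used; the Tanimoto coefficient over sets of fingerprint features is a common choice and is an assumption here, as are the function names and the dictionary layout.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def select_receptor(query_features, known_complexes):
    """Pick the complex whose bound ligand is most similar to the query ligand.

    known_complexes -- dict mapping a structure label (hypothetical) to the
                       feature set of the ligand bound in that structure
    Returns (label, similarity).
    """
    if not known_complexes:
        raise ValueError("no known complexes to choose from")
    best = max(known_complexes,
               key=lambda label: tanimoto(query_features, known_complexes[label]))
    return best, tanimoto(query_features, known_complexes[best])
```

The selected structure would then serve as the receptor conformation for docking the query ligand.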
Yan, Yumeng; Wen, Zeyu; Wang, Xinxiang; Huang, Sheng-You
2017-03-01
Protein-protein docking is an important computational tool for predicting protein-protein interactions. With the rapid development of proteomics projects, more and more experimental binding information, ranging from mutagenesis data to three-dimensional structures of protein complexes, is becoming available. Therefore, how to appropriately incorporate the biological information into traditional ab initio docking has been an important issue and challenge in the field of protein-protein docking. To address these challenges, we have developed a Hybrid DOCKing protocol of template-based and template-free approaches, referred to as HDOCK. The basic procedure of HDOCK is to model the structures of individual components based on the template complex by a template-based method if a template is available; otherwise, the component structures will be modeled based on monomer proteins by regular homology modeling. Then, the complex structure of the component models is predicted by traditional protein-protein docking. With the HDOCK protocol, we have participated in the CAPRI experiment for rounds 28-35. Out of the 25 CASP-CAPRI targets for oligomer modeling, our HDOCK protocol predicted correct models for 16 targets, ranking as one of the top algorithms in this challenge. Our docking method also made correct predictions on other CAPRI challenges such as protein-peptide binding for 6 out of 8 targets and water predictions for 2 out of 2 targets. The advantage of our hybrid docking approach over pure template-based docking was further confirmed by a comparative evaluation on 20 CASP-CAPRI targets. Proteins 2017; 85:497-512. © 2016 Wiley Periodicals, Inc.
Modeling protein complexes with BiGGER.
Krippahl, Ludwig; Moura, José J; Palma, P Nuno
2003-07-01
This article describes the method and results of our participation in the Critical Assessment of PRediction of Interactions (CAPRI) experiment, using the protein docking program BiGGER (Bimolecular complex Generation with Global Evaluation and Ranking) (Palma et al., Proteins 2000;39:372-384). Of five target complexes (CAPRI targets 2, 4, 5, 6, and 7), only one was successfully predicted (target 6), but BiGGER generated reasonable models for targets 4, 5, and 7, which could have been identified if additional biochemical information had been available. Copyright 2003 Wiley-Liss, Inc.
Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prochnik, Simon E.; Umen, James; Nedelcu, Aurora
2010-07-01
Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its ~14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.
Cheng, Jianlin; Eickholt, Jesse; Wang, Zheng; Deng, Xin
2013-01-01
After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques, template-based modeling and template-free modeling, have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can significantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction. PMID:22809379
Sequence-Based Prediction of RNA-Binding Residues in Proteins
Walia, Rasna R.; EL-Manzalawy, Yasser; Honavar, Vasant G.; Dobbs, Drena
2017-01-01
Identifying individual residues in the interfaces of protein–RNA complexes is important for understanding the molecular determinants of protein–RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein–RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein–RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner. PMID:27787829
Template-Based Modeling of Protein-RNA Interactions.
Zheng, Jinfang; Kundrotas, Petras J; Vakser, Ilya A; Liu, Shiyong
2016-09-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes.
Protein-Protein Docking in Drug Design and Discovery.
Kaczor, Agnieszka A; Bartuzi, Damian; Stępniewski, Tomasz Maciej; Matosiuk, Dariusz; Selent, Jana
2018-01-01
Protein-protein interactions (PPIs) are responsible for a number of key physiological processes in living cells and underlie the pathomechanism of many diseases. Nowadays, along with the concept of so-called "hot spots" in protein-protein interactions, which are well-defined interface regions responsible for most of the binding energy, these interfaces can be targeted with modulators. In order to apply structure-based design techniques to design PPI modulators, a three-dimensional structure of the protein complex has to be available. In this context in silico approaches, in particular protein-protein docking, are a valuable complement to experimental methods for elucidating the 3D structure of protein complexes. Protein-protein docking is easy to use and does not require significant computer resources and time (in contrast to molecular dynamics) and it results in a 3D structure of a protein complex (in contrast to sequence-based methods of predicting binding interfaces). However, protein-protein docking cannot address all the aspects of protein dynamics, in particular the global conformational changes during protein complex formation. In spite of this fact, protein-protein docking is widely used to model complexes of water-soluble proteins and less commonly to predict structures of transmembrane protein assemblies, including dimers and oligomers of G protein-coupled receptors (GPCRs). In this chapter we review the principles of protein-protein docking, available algorithms and software and discuss the recent examples, benefits, and drawbacks of protein-protein docking application to water-soluble proteins, membrane anchoring and transmembrane proteins, including GPCRs.
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
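To make the HP lattice model in the abstract above concrete, the following is a minimal sketch, not the authors' evolutionary algorithm. It scores a two-dimensional conformation by counting H-H lattice contacts between residues that are not chain neighbors, and applies a plain best-single-change hill climb. The move alphabet (`U`, `D`, `L`, `R`) and the function names `fold`, `hp_energy`, and `hill_climb` are assumptions introduced for illustration only.

```python
from typing import List, Optional, Tuple

# Unit steps on the square lattice, keyed by an illustrative move alphabet (assumption).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves: str) -> Optional[List[Tuple[int, int]]]:
    """Return lattice coordinates for a move string, or None if the walk self-intersects."""
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def hp_energy(sequence: str, moves: str) -> Optional[int]:
    """Energy = -(number of H-H pairs adjacent on the lattice but not adjacent in the chain)."""
    coords = fold(moves)
    if coords is None or len(coords) != len(sequence):
        return None
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each lattice pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


def hill_climb(sequence: str, moves: str) -> Tuple[str, int]:
    """Apply the best single-move change repeatedly until none lowers the energy.

    Lowering the energy here corresponds to ascending a fitness defined as -energy.
    """
    best = hp_energy(sequence, moves)
    if best is None:
        raise ValueError("starting conformation must be a valid self-avoiding walk")
    while True:
        candidate, candidate_e = None, best
        for k in range(len(moves)):
            for m in MOVES:
                if m == moves[k]:
                    continue
                trial = moves[:k] + m + moves[k + 1:]
                e = hp_energy(sequence, trial)
                if e is not None and e < candidate_e:
                    candidate, candidate_e = trial, e
        if candidate is None:
            return moves, best
        moves, best = candidate, candidate_e
```

For the sequence `HPPH`, starting from `URR` the climb reaches the folded walk `URD` with one H-H contact, whereas starting from the straight walk `RRR` no single change helps and the search stays put. That second case is the kind of local optimum that an explicit diversification stage is meant to escape.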
Srihari, Sriganesh; Yong, Chern Han; Patil, Ashwini; Wong, Limsoon
2015-09-14
Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organisation of cells. In this review, we discuss the key contributions of computational methods developed till date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight their limitations and challenges, in particular at detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Copyright © 2015 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Mehranfar, Adele; Ghadiri, Nasser; Kouhsar, Morteza; Golshani, Ashkan
2017-09-01
Detecting protein complexes is an important task in analyzing protein interaction networks. Although many algorithms predict protein complexes in different ways, surveys on the interaction networks indicate that about 50% of detected interactions are false positives. Consequently, the accuracy of existing methods needs to be improved. In this paper we propose a novel algorithm to detect protein complexes in 'noisy' protein interaction data. First, we integrate several biological data sources to determine the reliability of each interaction and determine more accurate weights for the interactions. A data fusion component is used for this step, based on the interval type-2 fuzzy voter that provides an efficient combination of the information sources. This fusion component detects the errors and diminishes their effect on the detection of protein complexes. So in the first step, reliability scores are assigned to every interaction in the network. In the second step, we propose a general protein complex detection algorithm by exploiting and adopting the strong points of other algorithms and existing hypotheses regarding real complexes. Finally, the proposed method has been applied to the yeast interaction datasets for predicting the interactions. The results show that our framework has a better performance regarding precision and F-measure than the existing approaches. Copyright © 2017 Elsevier Ltd. All rights reserved.
Assessment of CAPRI predictions in rounds 3-5 shows progress in docking procedures.
Méndez, Raúl; Leplae, Raphaël; Lensink, Marc F; Wodak, Shoshana J
2005-08-01
The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional (3D) structure is reassessed by evaluating blind predictions, performed during 2003-2004 as part of Rounds 3-5 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Ten newly determined structures of protein-protein complexes were used as targets for these rounds. They comprised 2 enzyme-inhibitor complexes, 2 antigen-antibody complexes, 2 complexes involved in cellular signaling, 2 homo-oligomers, and a complex between 2 components of the bacterial cellulosome. For most targets, the predictors were given the experimental structures of 1 unbound and 1 bound component, with the latter in a random orientation. For some, the structure of the free component was derived from that of a related protein, requiring the use of homology modeling. In some of the targets, significant differences in conformation were displayed between the bound and unbound components, representing a major challenge for the docking procedures. For 1 target, predictions could not go to completion. In total, 1866 predictions submitted by 30 groups were evaluated. Over one-third of these groups applied completely novel docking algorithms and scoring functions, with several of them specifically addressing the challenge of dealing with side-chain and backbone flexibility. The quality of the predicted interactions was evaluated by comparison to the experimental structures of the targets, made available for the evaluation, using the well-agreed-upon criteria used previously. Twenty-four groups, which for the first time included an automatic Web server, produced predictions ranking from acceptable to highly accurate for all targets, including those where the structures of the bound and unbound forms differed substantially. 
These results and a brief survey of the methods used by participants of CAPRI Rounds 3-5 suggest that genuine progress in the performance of docking methods is being achieved, with CAPRI acting as the catalyst.
A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE).
Stacey, R Greg; Skinnider, Michael A; Scott, Nichollas E; Foster, Leonard J
2017-10-23
An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to assess protein interactions. However, given the vast number of protein complexes, more scalable methods are necessary to accelerate interaction discovery and to construct whole interactomes. We recently developed a complementary technique based on the use of protein correlation profiling (PCP) and stable isotope labeling in amino acids in cell culture (SILAC) to assess chromatographic co-elution as evidence of interacting proteins. Importantly, PCP-SILAC is also capable of measuring protein interactions simultaneously under multiple biological conditions, allowing the detection of treatment-specific changes to an interactome. Given the uniqueness and high dimensionality of co-elution data, new tools are needed to compare protein elution profiles, control false discovery rates, and construct an accurate interactome. Here we describe a freely available bioinformatics pipeline, PrInCE, for the analysis of co-elution data. PrInCE is a modular, open-source library that is computationally inexpensive, able to use label and label-free data, and capable of detecting tens of thousands of protein-protein interactions. Using a machine learning approach, PrInCE offers greatly reduced run time, more predicted interactions at the same stringency, prediction of protein complexes, and greater ease of use over previous bioinformatics tools for co-elution data. PrInCE is implemented in Matlab (version R2017a). Source code and standalone executable programs for Windows and Mac OSX are available at https://github.com/fosterlab/PrInCE , where usage instructions can be found. An example dataset and output are also provided for testing purposes. 
PrInCE is the first fast and easy-to-use data analysis pipeline that predicts interactomes and protein complexes from co-elution data. PrInCE allows researchers without bioinformatics expertise to analyze high-throughput co-elution datasets.
Improving protein complex classification accuracy using amino acid composition profile.
Huang, Chien-Hung; Chou, Szu-Yu; Ng, Ka-Lok
2013-09-01
Protein complex prediction approaches are based on the assumptions that complexes have dense protein-protein interactions and high functional similarity between their subunits. We investigated those assumptions by studying the subunits' interaction topology, sequence similarity and molecular function for human and yeast protein complexes. Inclusion of amino acids' physicochemical properties can provide better understanding of protein complex properties. Principal component analysis is carried out to determine the major features. Adopting amino acid composition profile information with the SVM classifier serves as an effective post-processing step for complexes classification. Improvement is based on primary sequence information only, which is easy to obtain. Copyright © 2013 Elsevier Ltd. All rights reserved.
Template-Based Modeling of Protein-RNA Interactions
Zheng, Jinfang; Kundrotas, Petras J.; Vakser, Ilya A.
2016-01-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes. PMID:27662342
Antibody-protein interactions: benchmark datasets and prediction tools evaluation
Ponomarenko, Julia V; Bourne, Philip E
2007-01-01
Background The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification X-ray crystallography is one of the most reliable methods. Using these experimental data computational methods exist for B-cell epitope prediction. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods. Results Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding sites prediction have been evaluated. In no method did performance exceed a 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED methods and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper. Conclusion It may be possible to improve epitope prediction methods through training on datasets which include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. 
It is an open question as to whether ultimately discriminatory features can be found. PMID:17910770
Discovering protein complexes in protein interaction networks via exploring the weak ties effect
2012-01-01
Background Studying protein complexes is very important in biological processes since it helps reveal the structure-functionality relationships in biological networks, and much attention has been paid to accurately predicting protein complexes from the increasing amount of protein-protein interaction (PPI) data. Most of the available algorithms are based on the assumption that dense subgraphs correspond to complexes, failing to take into account the inherent organization within protein complexes and the roles of edges. Thus, there is a critical need to investigate the possibility of discovering protein complexes using the topological information hidden in edges. Results To provide an investigation of the roles of edges in PPI networks, we show that the edges connecting less similar vertices in topology are more significant in maintaining the global connectivity, indicating the weak ties phenomenon in PPI networks. We further demonstrate that there is a negative relation between the weak tie strength and the topological similarity. By using the bridges, a reliable virtual network is constructed, in which each maximal clique corresponds to the core of a complex. By this notion, the detection of the protein complexes is transformed into a classic all-clique problem. A novel core-attachment based method is developed, which detects the cores and attachments, respectively. A comprehensive comparison among the existing algorithms and our algorithm has been made by comparing the predicted complexes against benchmark complexes. Conclusions We proved that the weak tie effect exists in the PPI network and demonstrated that the density is insufficient to characterize the topological structure of protein complexes. Furthermore, the experimental results on the yeast PPI network show that the proposed method outperforms the state-of-the-art algorithms. 
The analysis of modules detected by the present algorithm suggests that most of these modules are biologically significant in the context of complexes, suggesting that the roles of edges are critical in discovering protein complexes. PMID:23046740
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-12-27
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
The protein-protein interface evolution acts in a similar way to antibody affinity maturation.
Li, Bohua; Zhao, Lei; Wang, Chong; Guo, Huaizu; Wu, Lan; Zhang, Xunming; Qian, Weizhu; Wang, Hao; Guo, Yajun
2010-02-05
Understanding the evolutionary mechanism that acts at the interfaces of protein-protein complexes is a fundamental issue with high interest for delineating the macromolecular complexes and networks responsible for regulation and complexity in biological systems. To investigate whether the evolution of protein-protein interfaces acts in a similar way to antibody affinity maturation, we incorporated evolutionary information derived from antibody affinity maturation with common simulation techniques to evaluate prediction success rates of the computational method in affinity improvement in four different systems: antibody-receptor, antibody-peptide, receptor-membrane ligand, and receptor-soluble ligand. It was interesting to find that the same evolutionary information could improve the prediction success rates in all four protein-protein complexes with an exceptionally high accuracy (>57%). One of the most striking findings in our present study is that not only in the antibody-combining site but in other protein-protein interfaces almost all of the affinity-enhancing mutations are located at the germline hotspot sequences (RGYW or WA), indicating that DNA hot spot mechanisms may be widely used in the evolution of protein-protein interfaces. Our data suggest that the evolution of distinct protein-protein interfaces may use the same basic strategy under selection pressure to maintain interactions. Additionally, our data indicate that classical simulation techniques incorporating the evolutionary information derived from in vivo antibody affinity maturation can be utilized as a powerful tool to improve the binding affinity of protein-protein complexes with a high accuracy.
Prediction of physical protein-protein interactions
NASA Astrophysics Data System (ADS)
Szilágyi, András; Grimm, Vera; Arakaki, Adrián K.; Skolnick, Jeffrey
2005-06-01
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
Darabi Sahneh, Faryad; Scoglio, Caterina; Riviere, Jim
2013-01-01
Nanoparticle-protein corona complex formation involves absorption of protein molecules onto nanoparticle surfaces in a physiological environment. Understanding the corona formation process is crucial in predicting nanoparticle behavior in biological systems, including applications of nanotoxicology and development of nano drug delivery platforms. This paper extends the modeling work in to derive a mathematical model describing the dynamics of nanoparticle corona complex formation from population balance equations. We apply nonlinear dynamics techniques to derive analytical results for the composition of nanoparticle-protein corona complex, and validate our results through numerical simulations. The model presented in this paper exhibits two phases of corona complex dynamics. In the first phase, proteins rapidly bind to the free surface of nanoparticles, leading to a metastable composition. During the second phase, continuous association and dissociation of protein molecules with nanoparticles slowly changes the composition of the corona complex. Given sufficient time, composition of the corona complex reaches an equilibrium state of stable composition. We find analytical approximate formulae for metastable and stable compositions of corona complex. Our formulae are very well-structured to clearly identify important parameters determining corona composition. The dynamics of biocorona formation constitute vital aspect of interactions between nanoparticles and living organisms. Our results further understanding of these dynamics through quantitation of experimental conditions, modeling results for in vitro systems to better predict behavior for in vivo systems. One potential application would involve a single cell culture medium related to a complex protein medium, such as blood or tissue fluid.
NASA Astrophysics Data System (ADS)
Rao, V. S. R.; Biswas, Margaret; Mukhopadhyay, Chaitali; Balaji, P. V.
1989-03-01
The CCEM method (Contact Criteria and Energy Minimisation) has been developed and applied to study protein-carbohydrate interactions. The method uses available X-ray data even on the native protein at low resolution (above 2.4 Å) to generate realistic models of a variety of proteins with various ligands. The two examples discussed in this paper are arabinose-binding protein (ABP) and pea lectin. The X-ray crystal structure data reported on ABP-β-L-arabinose complex at 2.8, 2.4 and 1.7 Å resolution differ drastically in predicting the nature of the interactions between the protein and ligand. It is shown that, using the data at 2.4 Å resolution, the CCEM method generates complexes which are as good as the higher (1.7 Å) resolution data. The CCEM method predicts some of the important hydrogen bonds between the ligand and the protein which are missing in the interpretation of the X-ray data at 2.4 Å resolution. The theoretically predicted hydrogen bonds are in good agreement with those reported at 1.7 Å resolution. Pea lectin has been solved only in the native form at 3 Å resolution. Application of the CCEM method also enables us to generate complexes of pea lectin with methyl-α-D-glucopyranoside and methyl-2,3-dimethyl-α-D-glucopyranoside which explain well the available experimental data in solution.
Sardiu, Mihaela E; Gilmore, Joshua M; Carrozza, Michael J; Li, Bing; Workman, Jerry L; Florens, Laurence; Washburn, Michael P
2009-10-06
Protein complexes are key molecular machines executing a variety of essential cellular processes. Despite the availability of genome-wide protein-protein interaction studies, determining the connectivity between proteins within a complex remains a major challenge. Here we demonstrate a method that is able to predict the relationship of proteins within a stable protein complex. We employed a combination of computational approaches and a systematic collection of quantitative proteomics data from wild-type and deletion strain purifications to build a quantitative deletion-interaction network map and subsequently convert the resulting data into an interdependency-interaction model of a complex. We applied this approach to a data set generated from components of the Saccharomyces cerevisiae Rpd3 histone deacetylase complexes, which consists of two distinct small and large complexes that are held together by a module consisting of Rpd3, Sin3 and Ume1. The resulting representation reveals new protein-protein interactions and new submodule relationships, providing novel information for mapping the functional organization of a complex.
PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages
Niu, Yulong; Liu, Chengcheng; Moghimyfiroozabad, Shayan; Yang, Yi
2017-01-01
Direct and indirect functional links between proteins as well as their interactions as part of larger protein complexes or common signaling pathways may be predicted by analyzing the correlation of their evolutionary patterns. Based on phylogenetic profiling, here we present a highly scalable and time-efficient computational framework for predicting linkages within the whole human proteome. We have validated this method through analysis of 3,697 human pathways and molecular complexes and a comparison of our results with the prediction outcomes of previously published co-occurrency model-based and normalization methods. Here we also introduce PrePhyloPro, a web-based software that uses our method for accurately predicting proteome-wide linkages. We present data on interactions of human mitochondrial proteins, verifying the performance of this software. PrePhyloPro is freely available at http://prephylopro.org/phyloprofile/. PMID:28875072
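The idea of linking proteins through correlated evolutionary patterns can be illustrated with a small sketch. This is not the PrePhyloPro method: it assumes binary presence/absence profiles across a fixed list of genomes and uses Jaccard similarity as a stand-in score. The function names `jaccard` and `rank_linkages` and the toy profiles in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_linkages(profiles: Dict[str, List[int]]) -> List[Tuple[str, str, float]]:
    """Score every protein pair and return the pairs sorted from most to least similar."""
    scored = [
        (p, q, jaccard(profiles[p], profiles[q]))
        for p, q in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)
```

With toy profiles such as `{"A": [1, 1, 0, 1], "B": [1, 1, 0, 1], "C": [0, 0, 1, 0]}`, the pair A-B ranks first because the two proteins occur in exactly the same genomes, while pairs involving C score zero. A proteome-scale analysis would need normalization for genome relatedness, which this sketch omits.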
Predicting disease-related proteins based on clique backbone in protein-protein interaction network.
Yang, Lei; Zhao, Xudong; Tang, Xianglong
2014-01-01
Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique may be associated with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and false negative interactions in protein networks, two steps are added to the clique-based method: extending cliques and scoring predicted disease proteins with gene ontology terms. The precision of the predicted disease proteins is verified against disease phenotypes and stays consistently above 95%. The predicted disease proteins associated with cliques can partly complement the mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
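A small Python sketch may help make the clique-based idea concrete. It enumerates maximal cliques with the basic Bron-Kerbosch recursion and then flags unannotated members of cliques that are rich in known disease proteins. The selection rule (`min_fraction`, minimum clique size of 3) and the toy network are assumptions for illustration; the clique extension and gene ontology scoring mentioned in the abstract are not reproduced here.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) interactions."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

def candidate_disease_proteins(cliques, known, min_fraction=0.5):
    """Toy rule: unannotated members of cliques (size >= 3) in which at
    least min_fraction of the members are known disease proteins."""
    candidates = set()
    for clique in cliques:
        hits = [p for p in clique if p in known]
        if len(clique) >= 3 and len(hits) / len(clique) >= min_fraction:
            candidates.update(p for p in clique if p not in known)
    return sorted(candidates)

# Hypothetical interaction network and disease annotations.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
cliques = maximal_cliques(build_adjacency(edges))
candidates = candidate_disease_proteins(cliques, known={"A", "B"})
```

Here the triangle A-B-C is the only clique of size three; because A and B are annotated, C becomes the single candidate.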
ClusPro: an automated docking and discrimination method for the prediction of protein complexes.
Comeau, Stephen R; Gatchell, David W; Vajda, Sandor; Camacho, Carlos J
2004-01-01
Predicting protein interactions is one of the most challenging problems in functional genomics. Given two proteins known to interact, current docking methods evaluate billions of docked conformations by simple scoring functions, and in addition to near-native structures yield many false positives, i.e. structures with good surface complementarity but far from the native. We have developed a fast algorithm for filtering docked conformations with good surface complementarity, and ranking them based on their clustering properties. The free energy filters select complexes with lowest desolvation and electrostatic energies. Clustering is then used to smooth the local minima and to select the ones with the broadest energy wells, a property associated with the free energy at the binding site. The robustness of the method was tested on sets of 2000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. The docking and discrimination method also provides good results for a number of complexes that were used as targets in the Critical Assessment of PRedictions of Interactions experiment. The fully automated docking and discrimination server ClusPro can be found at http://structure.bu.edu
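Ranking docked conformations by their clustering properties can be sketched as a greedy procedure: repeatedly pick the conformation with the most neighbours within a distance radius, report it as a cluster centre, and remove its members. This is a simplified illustration, not the ClusPro implementation; the radius, the tie-breaking rule, and the use of 1D positions as a stand-in for pairwise RMSD values are assumptions.

```python
def greedy_cluster(dist, radius):
    """Greedy clustering on a symmetric distance matrix.

    Returns (centre_index, member_indices) tuples, largest
    neighbourhood first; ties go to the lowest index.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        def neighbours(i):
            return {j for j in remaining if dist[i][j] <= radius}

        centre = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = neighbours(centre)
        clusters.append((centre, sorted(members)))
        remaining -= members
    return clusters

# Toy stand-in for pairwise RMSD: absolute differences of 1D positions.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = greedy_cluster(dist, radius=1.5)
```

The first cluster returned corresponds to the most populated neighbourhood, which in the abstract's terms plays the role of the broadest energy well.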
Highly Reproducible Label Free Quantitative Proteomic Analysis of RNA Polymerase Complexes*
Mosley, Amber L.; Sardiu, Mihaela E.; Pattenden, Samantha G.; Workman, Jerry L.; Florens, Laurence; Washburn, Michael P.
2011-01-01
The use of quantitative proteomics methods to study protein complexes has the potential to provide in-depth information on the abundance of different protein components as well as their modification state in various cellular conditions. To interrogate protein complex quantitation using shotgun proteomic methods, we have focused on the analysis of protein complexes using label-free multidimensional protein identification technology and studied the reproducibility of biological replicates. For these studies, we focused on three highly related and essential multi-protein enzymes, RNA polymerase I, II, and III from Saccharomyces cerevisiae. We found that label-free quantitation using spectral counting is highly reproducible at the protein and peptide level when analyzing RNA polymerase I, II, and III. In addition, we show that peptide sampling does not follow a random sampling model, and we show the need for advanced computational models to predict peptide detection probabilities. In order to address these issues, we used the APEX protocol to model the expected peptide detectability based on whole cell lysate acquired using the same multidimensional protein identification technology analysis used for the protein complexes. Neither method was able to predict the peptide sampling levels that we observed using replicate multidimensional protein identification technology analyses. In addition to the analysis of the RNA polymerase complexes, our analysis provides quantitative information about several RNAP associated proteins including the RNAPII elongation factor complexes DSIF and TFIIF. Our data shows that DSIF and TFIIF are the most highly enriched RNAP accessory factors in Rpb3-TAP purifications and demonstrate our ability to measure low level associated protein abundance across biological replicates. In addition, our quantitative data supports a model in which DSIF and TFIIF interact with RNAPII in a dynamic fashion in agreement with previously published reports. 
PMID:21048197
A new test set for validating predictions of protein-ligand interaction.
Nissink, J Willem M; Murray, Chris; Hartshorn, Mike; Verdonk, Marcel L; Cole, Jason C; Taylor, Robin
2002-12-01
We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk). Copyright 2002 Wiley-Liss, Inc.
Kilambi, Krishna Praneeth; Pacella, Michael S; Xu, Jianqing; Labonte, Jason W; Porter, Justin R; Muthu, Pravin; Drew, Kevin; Kuroda, Daisuke; Schueler-Furman, Ora; Bonneau, Richard; Gray, Jeffrey J
2013-12-01
Rounds 20-27 of the Critical Assessment of PRotein Interactions (CAPRI) provided a testing platform for computational methods designed to address a wide range of challenges. The diverse targets drove the creation of new computational tools and new combinations of existing ones. In this study, RosettaDock and other novel Rosetta protocols were used to successfully predict four of the 10 blind targets. For example, for the DNase domain of Colicin E2-Im2 immunity protein, RosettaDock and RosettaLigand were used to predict the positions of water molecules at the interface, recovering 46% of the native water-mediated contacts. For α-repeat Rep4-Rep2 and g-type lysozyme-PliG inhibitor complexes, homology models were built and standard and pH-sensitive docking algorithms were used to generate structures with interface RMSD values of 3.3 Å and 2.0 Å, respectively. A novel flexible sugar-protein docking protocol was also developed and used for structure prediction of the BT4661-heparin-like saccharide complex, recovering 71% of the native contacts. Challenges remain in the generation of accurate homology models for protein mutants and in sampling during global docking. For proteins designed to bind influenza hemagglutinin, only about half of the mutations that affect binding were identified (T55: 54%; T56: 48%). The prediction of the structure of the xylanase complex, which involved homology modeling and multidomain docking, pushed the limits of global conformational sampling and did not result in any successful prediction. The diversity of problems at hand requires computational algorithms to be versatile; the recent additions to the Rosetta suite expand the capabilities to encompass more biologically realistic docking problems. Copyright © 2013 Wiley Periodicals, Inc.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that the structures and functions of membrane proteins show clear organelle-specific properties. Hence, given the extreme difficulty of wet-lab studies of membrane proteins, it is highly desirable to predict a membrane protein's subcellular location from its primary sequence. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply concatenated to form a super feature vector. However, such a simple combination of features also increases the information redundancy, which can in turn reduce the final prediction accuracy. This is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is to investigate a proper method for fusing multiple multi-view protein sequence features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results obtained with different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems.
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin
2017-03-01
Protein-protein interactions occur either through direct contacts between two binding partners or are mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions was introduced into our protein-protein docking method, MDockPP. Briefly, an FFT-based docking algorithm is employed to generate putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged and referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc.
Kuzu, Guray; Keskin, Ozlem; Nussinov, Ruth; Gursoy, Attila
2016-10-01
The structures of protein assemblies are important for elucidating cellular processes at the molecular level. Three-dimensional electron microscopy (3DEM) is a powerful method to identify the structures of assemblies, especially those that are challenging to study by crystallography. Here, a new approach, PRISM-EM, is reported to computationally generate plausible structural models using a procedure that combines crystallographic structures and density maps obtained from 3DEM. The predictions are validated against seven available structurally different crystallographic complexes. The models display mean deviations in the backbone of <5 Å. PRISM-EM was further tested on different benchmark sets; the accuracy was evaluated with respect to the structure of the complex, and the correlation with EM density maps and interface predictions were evaluated and compared with those obtained using other methods. PRISM-EM was then used to predict the structure of the ternary complex of the HIV-1 envelope glycoprotein trimer, the ligand CD4 and the neutralizing protein m36.
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.
Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz
2015-01-01
Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is made using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that a multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Darabi Sahneh, Faryad; Scoglio, Caterina; Riviere, Jim
2013-01-01
Background: Nanoparticle-protein corona complex formation involves adsorption of protein molecules onto nanoparticle surfaces in a physiological environment. Understanding the corona formation process is crucial in predicting nanoparticle behavior in biological systems, including applications of nanotoxicology and development of nano drug delivery platforms. Method: This paper extends previous modeling work to derive a mathematical model describing the dynamics of nanoparticle corona complex formation from population balance equations. We apply nonlinear dynamics techniques to derive analytical results for the composition of the nanoparticle-protein corona complex, and validate our results through numerical simulations. Results: The model presented in this paper exhibits two phases of corona complex dynamics. In the first phase, proteins rapidly bind to the free surface of nanoparticles, leading to a metastable composition. During the second phase, continuous association and dissociation of protein molecules with nanoparticles slowly changes the composition of the corona complex. Given sufficient time, the composition of the corona complex reaches a stable equilibrium state. We find approximate analytical formulae for the metastable and stable compositions of the corona complex. The formulae are structured so that the important parameters determining corona composition can be clearly identified. Conclusion: The dynamics of biocorona formation constitute a vital aspect of interactions between nanoparticles and living organisms. Our results further the understanding of these dynamics by quantifying experimental conditions and modeling results for in vitro systems, in order to better predict behavior in in vivo systems. One potential application would involve a single cell culture medium related to a complex protein medium, such as blood or tissue fluid. PMID:23741371
Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem
2011-08-11
Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.
Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.
Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka
Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-01-01
Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
Andreopoulos, Bill; Winter, Christof; Labudde, Dirk; Schroeder, Michael
2009-06-27
A lot of high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles. We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from Munich Information center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than using second-level neighbors in PPINs without SDDIs. The biological interpretation for triangles is that a SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes. Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. 
We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN.
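The PPI-SDDI-PPI triangle construction described in this abstract can be sketched in a few lines of Python. The reading used here is an assumption based on the abstract's wording: two proteins that share a PPI partner (second-level neighbours) form a triangle when a structural domain-domain interaction (SDDI) also links them. Protein names and edge lists are hypothetical.

```python
from itertools import combinations

def ppi_sddi_triangles(ppi_edges, sddi_edges):
    """Return (a, hub, b) triples where a-hub and hub-b are PPIs
    and a-b is supported by an SDDI."""
    neighbours = {}
    for a, b in ppi_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    sddi = {frozenset(edge) for edge in sddi_edges}

    triangles = set()
    for hub, partners in neighbours.items():
        # Every pair of partners of the same hub is a second-level pair.
        for a, b in combinations(sorted(partners), 2):
            if frozenset((a, b)) in sddi:
                triangles.add((a, hub, b))
    return sorted(triangles)

# Hypothetical toy data.
ppi = [("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P5", "P4")]
sddi = [("P1", "P2"), ("P5", "P3")]
triangles = ppi_sddi_triangles(ppi, sddi)
```

Only P1 and P2 are both second-level neighbours (through P3) and linked by an SDDI, so a single triangle is found; the P5-P3 SDDI is ignored because those proteins share no PPI partner.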
Andreopoulos, Bill; Winter, Christof; Labudde, Dirk; Schroeder, Michael
2009-01-01
Background: A lot of high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles. Results: We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from Munich Information center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than using second-level neighbors in PPINs without SDDIs. The biological interpretation for triangles is that a SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes. Conclusion: Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs.
We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN. PMID:19558694
A benchmark testing ground for integrating homology modeling and protein docking.
Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima
2017-01-01
Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. 
Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc.
Agrawal, Neeraj J; Helk, Bernhard; Trout, Bernhardt L
2014-01-21
Identifying hot-spot residues - residues that are critical to protein-protein binding - can help to elucidate a protein's function and assist in designing therapeutic molecules to target those residues. We present a novel computational tool, termed spatial-interaction-map (SIM), to predict the hot-spot residues of an evolutionarily conserved protein-protein interaction from the structure of an unbound protein alone. SIM can predict the protein hot-spot residues with an accuracy of 36-57%. Thus, the SIM tool can be used to predict the as yet unknown hot-spot residues of many proteins for which the structures of the protein-protein complexes are not available, thereby providing a clue to their functions and an opportunity to design therapeutic molecules to target these proteins. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Zahiri, Javad; Mohammad-Noori, Morteza; Ebrahimpour, Reza; Saadat, Samaneh; Bozorgmehr, Joseph H; Goldberg, Tatyana; Masoudi-Nejad, Ali
2014-12-01
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of previously employed methods. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. The results revealed that if we divide proteome space according to the cellular localization of proteins, then the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, the importance of different features for PPI prediction varies between differently localized proteins; in general, however, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OSX and MS Windows operating systems. Copyright © 2014 Elsevier Inc. All rights reserved.
A discriminatory function for prediction of protein-DNA interactions based on alpha shape modeling.
Zhou, Weiqiang; Yan, Hong
2010-10-15
Protein-DNA interaction has significant importance in many biological processes. However, the underlying principle of the molecular recognition process is still largely unknown. As more high-resolution 3D structures of protein-DNA complexes are becoming available, the surface characteristics of the complex become an important research topic. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and develop an interface-atom curvature-dependent conditional probability discriminatory function for the prediction of protein-DNA interaction. The interface-atom curvature-dependent formalism captures atomic interaction details better than the atomic distance-based method. The proposed method provides good performance in discriminating the native structures from the docking decoy sets, and outperforms the distance-dependent formalism in terms of the z-score. Computer experiment results show that the curvature-dependent formalism with the optimal parameters can achieve a native z-score of -8.17 in discriminating the native structure from the highest surface-complementarity scored decoy set and a native z-score of -7.38 in discriminating the native structure from the lowest RMSD decoy set. The interface-atom curvature-dependent formalism can also be used to predict the apo version of DNA-binding proteins. These results suggest that the interface-atom curvature-dependent formalism has a good prediction capability for protein-DNA interactions. The code and data sets are available for download at http://www.hy8.com/bioinformatics.htm. Contact: kenandzhou@hotmail.com.
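The native z-score used as the evaluation measure can be sketched as follows. The sketch assumes the common definition (native score minus the mean decoy score, divided by the decoy standard deviation) and that lower scores are better. The function name and the toy scores are hypothetical, and the choice of population rather than sample standard deviation is an assumption, not something the abstract states.

```python
from statistics import mean, pstdev


def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure relative to a set of decoy scores.

    With lower-is-better scores, a well-discriminated native structure
    gives a strongly negative value.
    """
    mu = mean(decoy_scores)
    sigma = pstdev(decoy_scores)  # population standard deviation of decoys
    if sigma == 0:
        raise ValueError("decoy scores have zero variance")
    return (native_score - mu) / sigma


# Toy scores for illustration only (not data from the paper).
decoys = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = native_z_score(-20.0, decoys)
```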
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease-related protein, LRRK2, was used as a representative example to demonstrate the structure prediction. PMID:23776530
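For readers unfamiliar with the Q3 measure quoted above, a minimal sketch follows. It assumes the standard definition: the fraction of residues whose three-state label (H for helix, E for strand, C for coil) is predicted correctly. The function name and the two toy label strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues with a correctly predicted three-state label."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Toy 20-residue example; one strand residue is mispredicted as coil.
observed = "CCEEEEECCCEEEEECCHHH"
predicted = "CCEEEEECCCEEEECCCHHH"
q3 = q3_accuracy(predicted, observed)
```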
2010-01-01
Background The reconstruction of protein complexes from the physical interactome of organisms serves as a building block towards understanding the higher level organization of the cell. Over the past few years, several independent high-throughput experiments have helped to catalogue an enormous amount of physical protein interaction data from organisms such as yeast. However, these individual datasets show a lack of correlation with each other and also contain a substantial number of false positives (noise). Over these years, several affinity scoring schemes have also been devised to improve the qualities of these datasets. Therefore, the challenge now is to detect meaningful as well as novel complexes from protein interaction (PPI) networks derived by combining datasets from multiple sources and by making use of these affinity scoring schemes. In the attempt towards tackling this challenge, the Markov Clustering algorithm (MCL) has proved to be a popular and reasonably successful method, mainly due to its scalability, robustness, and ability to work on scored (weighted) networks. However, MCL produces many noisy clusters, which either do not match known complexes or have additional proteins that reduce the accuracies of correctly predicted complexes. Results Inspired by recent experimental observations by Gavin and colleagues on the modularity structure in yeast complexes and the distinctive properties of "core" and "attachment" proteins, we develop a core-attachment based refinement method coupled to MCL for reconstruction of yeast complexes from scored (weighted) PPI networks. We combine physical interactions from two recent "pull-down" experiments to generate an unscored PPI network. We then score this network using available affinity scoring schemes to generate multiple scored PPI networks. 
The evaluation of our method (called MCL-CAw) on these networks shows that: (i) MCL-CAw derives a larger number of yeast complexes and with better accuracies than MCL, particularly in the presence of natural noise; (ii) Affinity scoring can effectively reduce the impact of noise on MCL-CAw and thereby improve the quality (precision and recall) of its predicted complexes; (iii) MCL-CAw responds well to most available scoring schemes. We discuss several instances where MCL-CAw was successful in deriving meaningful complexes, and where it missed a few proteins or whole complexes due to affinity scoring of the networks. We compare MCL-CAw with several recent complex detection algorithms on unscored and scored networks, and assess the relative performance of the algorithms on these networks. Further, we study the impact of augmenting physical datasets with computationally inferred interactions for complex detection. Finally, we analyse the essentiality of proteins within predicted complexes to understand a possible correlation between protein essentiality and their ability to form complexes. Conclusions We demonstrate that core-attachment based refinement in MCL-CAw improves the predictions of MCL on yeast PPI networks. We show that affinity scoring improves the performance of MCL-CAw. PMID:20939868
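The Markov Clustering step that MCL-CAw builds on can be sketched in a few lines. This is a minimal, pure-Python illustration of plain MCL only (alternating expansion and inflation on a column-stochastic matrix); it does not implement the core-attachment refinement. The function names, the inflation value of 2.0, the fixed iteration count, and the toy weighted network are assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(weights, inflation=2.0, iterations=50):
    """Cluster a weighted, undirected network given as a square matrix."""
    n = len(weights)
    # Add self-loops, as is customary for MCL, then normalize.
    m = normalize_columns(
        [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = []
    for row in m:  # non-zero rows belong to attractors
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy scored network: two triangles joined by one weak edge (2-3).
w = [[0.0] * 6 for _ in range(6)]
for a, b, s in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    w[a][b] = w[b][a] = s
clusters = mcl(w)
```

On this toy input the weak bridge carries little flow, so the two triangles are expected to come out as separate clusters. Lower affinity scores on noisy edges have the same effect, which is one way to read the abstract's point about affinity scoring reducing the impact of noise.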
Mapping monomeric threading to protein-protein structure prediction.
Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang
2013-03-25
The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING is controlled with ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Protein-protein interaction predictions using text mining methods.
Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Theodosiou, Theodosios; Iliopoulos, Ioannis
2015-03-01
It is beyond any doubt that proteins and their interactions play an essential role in most complex biological processes. The understanding of their function individually, but also in the form of protein complexes is of a great importance. Nowadays, despite the plethora of various high-throughput experimental approaches for detecting protein-protein interactions, many computational methods aiming to predict new interactions have appeared and gained interest. In this review, we focus on text-mining based computational methodologies, aiming to extract information for proteins and their interactions from public repositories such as literature and various biological databases. We discuss their strengths, their weaknesses and how they complement existing experimental techniques by simultaneously commenting on the biological databases which hold such information and the benchmark datasets that can be used for evaluating new tools. Copyright © 2014 Elsevier Inc. All rights reserved.
Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki
2007-07-01
We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek).
Molecular determinants of the interactions between proteins and ssDNA.
Mishra, Garima; Levy, Yaakov
2015-04-21
ssDNA binding proteins (SSBs) protect ssDNA from chemical and enzymatic assault that can derail DNA processing machinery. Complexes between SSBs and ssDNA are often highly stable, but predicting their structures is challenging, mostly because of the inherent flexibility of ssDNA and the geometric and energetic complexity of the interfaces that it forms. Here, we report a newly developed coarse-grained model to predict the structure of SSB-ssDNA complexes. The model is successfully applied to predict the binding modes of six SSBs with ssDNA strands of lengths of 6-65 nt. In addition to charge-charge interactions (which are often central to governing protein interactions with nucleic acids by means of electrostatic complementarity), an essential energetic term to predict SSB-ssDNA complexes is the interactions between aromatic residues and DNA bases. For some systems, flexibility is required from not only the ssDNA but also, the SSB to allow it to undergo conformational changes and the penetration of the ssDNA into its binding pocket. The association mechanisms can be quite varied, and in several cases, they involve the ssDNA sliding along the protein surface. The binding mechanism suggests that coarse-grained models are appropriate to study the motion of SSBs along ssDNA, which is expected to be central to the function carried out by the SSBs.
Li, Jinyu; Rossetti, Giulia; Dreyer, Jens; Raugei, Simone; Ippoliti, Emiliano; Lüscher, Bernhard; Carloni, Paolo
2014-01-01
Protein electrospray ionization (ESI) mass spectrometry (MS)-based techniques are widely used to provide insight into structural proteomics under the assumption that non-covalent protein complexes being transferred into the gas phase preserve basically the same intermolecular interactions as in solution. Here we investigate the applicability of this assumption by extending our previous structural prediction protocol for single proteins in ESI-MS to protein complexes. We apply our protocol to the human insulin dimer (hIns2) as a test case. Our calculations reproduce the main charge and the collision cross section (CCS) measured in ESI-MS experiments. Molecular dynamics simulations for 0.075 ms show that the complex maximizes intermolecular non-bonded interactions relative to the structure in water, without affecting the cross section. The overall gas-phase structure of hIns2 does exhibit differences with the one in aqueous solution, not inferable from a comparison with calculated CCS. Hence, care should be exerted when interpreting ESI-MS proteomics data based solely on NMR and/or X-ray structural information. PMID:25210764
Munteanu, Cristian R; Gonzalez-Diaz, Humberto; Garcia, Rafael; Loza, Mabel; Pazos, Alejandro
2015-01-01
The molecular information encoding into molecular descriptors is the first step into in silico Chemoinformatics methods in Drug Design. The Machine Learning methods are a complex solution to find prediction models for specific biological properties of molecules. These models connect the molecular structure information such as atom connectivity (molecular graphs) or physical-chemical properties of an atom/group of atoms to the molecular activity (Quantitative Structure - Activity Relationship, QSAR). Due to the complexity of the proteins, the prediction of their activity is a complicated task and the interpretation of the models is more difficult. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug - protein target interactions and the other three calculate protein - protein interactions. The input information is based on the protein 3D structure for nine models, 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based Machine Learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments.
Su, Chinh; Nguyen, Thuy-Diem; Zheng, Jie; Kwoh, Chee-Keong
2014-01-01
Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. With the inclusion of interfacial water, the IFACEwat improves mostly results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully presented by the initial rigid docking and other re-ranking techniques. 
In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near-native structures found. Because our implementation has so far targeted improving the results of ZDOCK3.0.2, particularly for the Antigen/Antibody complexes, we expect that further implementations will make the approach applicable to other initial rigid docking algorithms.
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
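The three measures reported for the SVM models (accuracy, F-measure and MCC) can all be derived from a binary confusion matrix. The sketch below uses the standard definitions. The function name and the toy counts are hypothetical and are not the paper's actual confusion matrix.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) for binary confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


# Toy balanced example: 67 of 100 binding and 67 of 100 non-binding
# nucleotides classified correctly (made-up counts).
accuracy, f_measure, mcc = binary_metrics(tp=67, fp=33, tn=67, fn=33)
```

With balanced classes and symmetric errors, an accuracy of 0.67 corresponds to an MCC of 0.34. This shows why MCC values in the 0.3 to 0.4 range sit alongside accuracies in the high 60s.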
Consistent prediction of GO protein localization.
Spetale, Flavio E; Arce, Debora; Krsticevic, Flavia; Bulacio, Pilar; Tapia, Elizabeth
2018-05-17
The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. Aiming to boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results in the annotation of a challenging subset of tandem duplicated genes in the tomato non-model organism were accomplished. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand of GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
Laine, Elodie; Carbone, Alessandra
2015-01-01
Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach permits to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPIs modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684
Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen‐You; Schneidman‐Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez‐Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan‐Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie‐Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben‐Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung‐Rae; Roy, Amit; Han, Xusi; Esquivel‐Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero‐Durana, Miguel; Jiménez‐García, Brian; Moal, Iain H.; Férnandez‐Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey
2016-01-01
We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27122118
Naegle, Kristen M.; White, Forest M.; Lauffenburger, Douglas A.; Yaffe, Michael B.
2012-01-01
Cell signaling networks propagate information from extracellular cues via dynamic modulation of protein–protein interactions in a context-dependent manner. Networks based on receptor tyrosine kinases (RTKs), for example, phosphorylate intracellular proteins in response to extracellular ligands, resulting in dynamic protein–protein interactions that drive phenotypic changes. Most commonly used methods for discovering these protein–protein interactions, however, are optimized for detecting stable, longer-lived complexes, rather than the type of transient interactions that are essential components of dynamic signaling networks such as those mediated by RTKs. Substrate phosphorylation downstream of RTK activation modifies substrate activity and induces phospho-specific binding interactions, resulting in the formation of large transient macromolecular signaling complexes. Since protein complex formation should follow the trajectory of events that drive it, we reasoned that mining phosphoproteomic datasets for highly similar dynamic behavior of measured phosphorylation sites on different proteins could be used to predict novel, transient protein–protein interactions that had not been previously identified. We applied this method to explore signaling events downstream of EGFR stimulation. Our computational analysis of robustly co-regulated phosphorylation sites, based on multiple clustering analysis of quantitative time-resolved mass-spectrometry phosphoproteomic data, not only identified known sitewise-specific recruitment of proteins to EGFR, but also predicted novel, a priori interactions. A particularly intriguing prediction of EGFR interaction with the cytoskeleton-associated protein PDLIM1 was verified within cells using co-immunoprecipitation and in situ proximity ligation assays. Our approach thus offers a new way to discover protein–protein interactions in a dynamic context- and phosphorylation site-specific manner. PMID:22851037
Liu, Shiwei; Liu, Yihui; Zhao, Jiawei; Cai, Shitao; Qian, Hongmei; Zuo, Kaijing; Zhao, Lingxia; Zhang, Lida
2017-04-01
Rice (Oryza sativa) is one of the most important staple foods for more than half of the global population. Many rice traits are quantitative, complex and controlled by multiple interacting genes. Thus, a full understanding of genetic relationships will be critical to systematically identify genes controlling agronomic traits. We developed a genome-wide rice protein-protein interaction network (RicePPINet, http://netbio.sjtu.edu.cn/riceppinet) using machine learning with structural relationship and functional information. RicePPINet contained 708 819 predicted interactions for 16 895 non-transposable element related proteins. The power of the network for discovering novel protein interactions was demonstrated through comparison with other publicly available protein-protein interaction (PPI) prediction methods, and by experimentally determined PPI data sets. Furthermore, global analysis of domain-mediated interactions revealed RicePPINet accurately reflects PPIs at the domain level. Our studies showed the efficiency of the RicePPINet-based method in prioritizing candidate genes involved in complex agronomic traits, such as disease resistance and drought tolerance, was approximately 2-11 times better than random prediction. RicePPINet provides an expanded landscape of computational interactome for the genetic dissection of agronomically important traits in rice. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Oligomerization of G protein-coupled receptors: computational methods.
Selent, J; Kaczor, A A
2011-01-01
Recent research has unveiled the complexity of mechanisms involved in G protein-coupled receptor (GPCR) functioning in which receptor dimerization/oligomerization may play an important role. Although the first high-resolution X-ray structure for a likely functional chemokine receptor dimer has been deposited in the Protein Data Bank, the interactions and mechanisms of dimer formation are not yet fully understood. In this respect, computational methods play a key role for predicting accurate GPCR complexes. This review outlines computational approaches focusing on sequence- and structure-based methodologies as well as discusses their advantages and limitations. Sequence-based approaches that search for possible protein-protein interfaces in GPCR complexes have been applied with success in several studies, but did not yield always consistent results. Structure-based methodologies are a potent complement to sequence-based approaches. For instance, protein-protein docking is a valuable method especially when guided by experimental constraints. Some disadvantages like limited receptor flexibility and non-consideration of the membrane environment have to be taken into account. Molecular dynamics simulation can overcome these drawbacks giving a detailed description of conformational changes in a native-like membrane. Successful prediction of GPCR complexes using computational approaches combined with experimental efforts may help to understand the role of dimeric/oligomeric GPCR complexes for fine-tuning receptor signaling. Moreover, since such GPCR complexes have attracted interest as potential drug target for diverse diseases, unveiling molecular determinants of dimerization/oligomerization can provide important implications for drug discovery.
Computational prediction of protein hot spot residues.
Morrow, John Kenneth; Zhang, Shuxing
2012-01-01
Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions, hot spots, and give an overview of multiple in silico prediction techniques of hot spot residues.
Efficient prediction of human protein-protein interactions at a global scale.
Schoenrock, Andrew; Samanfar, Bahram; Pitre, Sylvain; Hooshyar, Mohsen; Jin, Ke; Phillips, Charles A; Wang, Hui; Phanse, Sadhna; Omidi, Katayoun; Gui, Yuan; Alamgir, Md; Wong, Alex; Barrenäs, Fredrik; Babu, Mohan; Benson, Mikael; Langston, Michael A; Green, James R; Dehne, Frank; Golshani, Ashkan
2014-12-10
Our knowledge of global protein-protein interaction (PPI) networks in complex organisms such as humans is hindered by technical limitations of current methods. On the basis of short co-occurring polypeptide regions, we developed a tool called MP-PIPE capable of predicting a global human PPI network within 3 months. With a recall of 23% at a precision of 82.1%, we predicted 172,132 putative PPIs. We demonstrate the usefulness of these predictions through a range of experiments. The speed and accuracy associated with MP-PIPE can make this a potential tool to study individual human PPI networks (from genomic sequences alone) for personalized medicine.
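As a rough worked example of what the reported figures imply, the sketch below converts the stated precision and recall into an expected number of correct predictions and an implied number of true interactions. It assumes that the quoted precision and recall both describe the full set of 172,132 predictions, which the abstract does not state explicitly; the function name is illustrative and not part of MP-PIPE.

```python
def implied_counts(n_predicted: int, precision: float, recall: float):
    """Return (expected true positives, implied total number of true PPIs)."""
    # precision = TP / predicted  ->  TP = precision * predicted
    true_positives = precision * n_predicted
    # recall = TP / all true interactions  ->  all true = TP / recall
    implied_total = true_positives / recall
    return true_positives, implied_total


# Figures quoted in the abstract (assumed to describe the same prediction set).
tp, total = implied_counts(172_132, precision=0.821, recall=0.23)
print(f"expected correct predictions: {tp:,.0f}")
print(f"implied true interactions:    {total:,.0f}")
```

Under these assumptions, about 141,000 of the predictions would be correct, and the implied number of true human PPIs would be on the order of 614,000.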
Krepl, Miroslav; Cléry, Antoine; Blatter, Markus; Allain, Frederic H.T.; Sponer, Jiri
2016-01-01
RNA recognition motif (RRM) proteins represent an abundant class of proteins playing key roles in RNA biology. We present a joint atomistic molecular dynamics (MD) and experimental study of two RRM-containing proteins bound with their single-stranded target RNAs, namely the Fox-1 and SRSF1 complexes. The simulations are used in conjunction with NMR spectroscopy to interpret and expand the available structural data. We accumulate more than 50 μs of simulations and show that the MD method is robust enough to reliably describe the structural dynamics of the RRM–RNA complexes. The simulations predict unanticipated specific participation of Arg142 at the protein–RNA interface of the SRSF1 complex, which is subsequently confirmed by NMR and ITC measurements. Several segments of the protein–RNA interface may involve competition between dynamical local substates rather than firmly formed interactions, which is indirectly consistent with the primary NMR data. We demonstrate that the simulations can be used to interpret the NMR atomistic models and can provide qualified predictions. Finally, we propose a protocol for ‘MD-adapted structure ensemble’ as a way to integrate the simulation predictions and expand upon the deposited NMR structures. Unbiased μs-scale atomistic MD could become a technique routinely complementing the NMR measurements of protein–RNA complexes. PMID:27193998
Yugandhar, K; Gromiha, M Michael
2014-09-01
Protein-protein interactions are intrinsic to virtually every cellular process. Predicting the binding affinity of protein-protein complexes is one of the challenging problems in computational and molecular biology. In this work, we related sequence features of protein-protein complexes with their binding affinities using machine learning approaches. We set up a database of 185 protein-protein complexes for which the interacting pairs are heterodimers and their experimental binding affinities are available. On the other hand, we have developed a set of 610 features from the sequences of protein complexes and utilized Ranker search method, which is the combination of Attribute evaluator and Ranker method for selecting specific features. We have analyzed several machine learning algorithms to discriminate protein-protein complexes into high and low affinity groups based on their Kd values. Our results showed a 10-fold cross-validation accuracy of 76.1% with the combination of nine features using support vector machines. Further, we observed accuracy of 83.3% on an independent test set of 30 complexes. We suggest that our method would serve as an effective tool for identifying the interacting partners in protein-protein interaction networks and human-pathogen interactions based on the strength of interactions. © 2014 Wiley Periodicals, Inc.
The Est3 protein associates with yeast telomerase through an OB-fold domain
Lee, Jaesung S.; Mandell, Edward K.; Tucey, Timothy M.; Morris, Danna K.; Lundblad, Victoria
2009-01-01
The Est3 protein is a small regulatory subunit of yeast telomerase which is dispensable for enzyme catalysis but essential for telomere replication in vivo. Using structure prediction combined with in vivo characterization, we show here that Est3 consists of a predicted OB (oligo-saccharide/oligo-nucleotide binding) fold. Mutagenesis of predicted surface residues was used to generate a functional map of one surface of Est3, which identified a site that mediates association with the telomerase complex. Surprisingly, the predicted OB-fold of Est3 is structurally similar to the OB-fold of the mammalian TPP1 protein, despite the fact that Est3 and TPP1, as components of telomerase and a telomere capping complex, respectively, perform functionally distinct tasks at chromosome ends. The analysis performed on Est3 may be instructive in generating comparable missense mutations on the surface of the OB-fold domain of TPP1. PMID:19172754
Blatter, Markus; Cléry, Antoine; Damberger, Fred F.
2017-01-01
The Fox-1 RNA recognition motif (RRM) domain is an important member of the RRM protein family. We report a 1.8 Å X-ray structure of the free Fox-1 containing six distinct monomers. We use this and the nuclear magnetic resonance (NMR) structure of the Fox-1 protein/RNA complex for molecular dynamics (MD) analyses of the structured hydration. The individual monomers of the X-ray structure show diverse hydration patterns; however, MD excellently reproduces the most occupied hydration sites. Simulations of the protein/RNA complex show hydration consistent with the isolated protein, complemented by hydration sites specific to the protein/RNA interface. MD predicts intricate hydration sites with water-binding times extending up to hundreds of nanoseconds. We characterize two of them using NMR spectroscopy, RNA binding with switchSENSE and free-energy calculations of mutant proteins. Both hydration sites are experimentally confirmed and their abolishment reduces the binding free-energy. A quantitative agreement between theory and experiment is achieved for the S155A substitution but not for the S122A mutant. The S155 hydration site is evolutionarily conserved within the RRM domains. In conclusion, MD is an effective tool for predicting and interpreting the hydration patterns of protein/RNA complexes. Hydration is not easily detectable in NMR experiments but can affect stability of protein/RNA complexes. PMID:28505313
Protein-Protein Interface and Disease: Perspective from Biomolecular Networks.
Hu, Guang; Xiao, Fei; Li, Yuqian; Li, Yuan; Vongsangnak, Wanwipa
Protein-protein interactions are involved in many important biological processes and molecular mechanisms of disease association. Structural studies of interfacial residues in protein complexes provide information on protein-protein interactions. Characterizing protein-protein interfaces, including binding sites and allosteric changes, thus poses an imminent challenge. With special focus on protein complexes, approaches based on network theory are proposed to meet this challenge. In this review we pay attention to protein-protein interfaces from the perspective of biomolecular networks and their roles in disease. We first describe the different roles of protein complexes in disease through several structural aspects of interfaces. We then discuss some recent advances in predicting hot spots and communication pathway analysis in terms of amino acid networks. Finally, we highlight possible future aspects of this area with respect to both methodology development and applications for disease treatment.
Dong, Yadong; Sun, Yongqi; Qin, Chao
2018-01-01
The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
Predictive hypotheses are ineffectual in resolving complex biochemical systems.
Fry, Michael
2018-03-20
Scientific hypotheses may either predict particular unknown facts or accommodate previously-known data. Although affirmed predictions are intuitively more rewarding than accommodations of established facts, opinions divide whether predictive hypotheses are also epistemically superior to accommodation hypotheses. This paper examines the contribution of predictive hypotheses to discoveries of several bio-molecular systems. Having all the necessary elements of the system known beforehand, an abstract predictive hypothesis of semiconservative mode of DNA replication was successfully affirmed. However, in defining the genetic code whose biochemical basis was unclear, hypotheses were only partially effective and supplementary experimentation was required for its conclusive definition. Markedly, hypotheses were entirely inept in predicting workings of complex systems that included unknown elements. Thus, hypotheses did not predict the existence and function of mRNA, the multiple unidentified components of the protein biosynthesis machinery, or the manifold unknown constituents of the ubiquitin-proteasome system of protein breakdown. Consequently, because of their inability to envision unknown entities, predictive hypotheses did not contribute to the elucidation of these complex systems. Because accommodation theories remained the sole instrument to explain complex bio-molecular systems, the philosophical question of the alleged advantage of predictive over accommodative hypotheses became inconsequential.
DockTrina: docking triangular protein trimers.
Popov, Petr; Ritchie, David W; Grudinin, Sergei
2014-01-01
In spite of the abundance of oligomeric proteins within a cell, the structural characterization of protein-protein interactions is still a challenging task. In particular, many of these interactions involve heteromeric complexes, which are relatively difficult to determine experimentally. Hence there is growing interest in using computational techniques to model such complexes. However, assembling large heteromeric complexes computationally is a highly combinatorial problem. Nonetheless the problem can be simplified greatly by considering interactions between protein trimers. After dimers and monomers, triangular trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins) are the most frequently observed quaternary structural motifs according to the three-dimensional (3D) complex database. This article presents DockTrina, a novel protein docking method for modeling the 3D structures of nonsymmetrical triangular trimers. The method takes as input pair-wise contact predictions from a rigid body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation test. Finally, it ranks the predictions using a scoring function which combines triples of pair-wise contact terms and a geometric clash penalty term. The overall approach takes less than 2 min per complex on a modern desktop computer. The method is tested and validated using a benchmark set of 220 bound and seven unbound protein trimer structures. DockTrina will be made available at http://nano-d.inrialpes.fr/software/docktrina. Copyright © 2013 Wiley Periodicals, Inc.
Liang, Shide; Li, Liwei; Hsu, Wei-Lun; Pilcher, Meaghan N.; Uversky, Vladimir; Zhou, Yaoqi; Dunker, A. Keith; Meroueh, Samy O.
2009-01-01
The significant work that has been invested toward understanding protein–protein interaction has not translated into significant advances in structure-based predictions. In particular redesigning protein surfaces to bind to unrelated receptors remains a challenge, partly due to receptor flexibility, which is often neglected in these efforts. In this work, we computationally graft the binding epitope of various small proteins obtained from the RCSB database to bind to barnase, lysozyme, and trypsin using a previously derived and validated algorithm. In an effort to probe the protein complexes in a realistic environment, all native and designer complexes were subjected to a total of nearly 400 ns of explicit-solvent molecular dynamics (MD) simulation. The MD data led to an unexpected observation: some of the designer complexes were highly unstable and decomposed during the trajectories. In contrast, the native and a number of designer complexes remained consistently stable. The unstable conformers provided us with a unique opportunity to define the structural and energetic factors that lead to unproductive protein–protein complexes. To that end we used free energy calculations following the MM-PBSA approach to determine the role of nonpolar effects, electrostatics and entropy in binding. Remarkably, we found that a majority of unstable complexes exhibited more favorable electrostatics than native or stable designer complexes, suggesting that favorable electrostatic interactions are not prerequisite for complex formation between proteins. However, nonpolar effects remained consistently more favorable in native and stable designer complexes reinforcing the importance of hydrophobic effects in protein–protein binding. While entropy systematically opposed binding in all cases, there was no observed trend in the entropy difference between native and designer complexes. 
A series of alanine scanning mutations of hot-spot residues at the interface of native and designer complexes showed less than optimal contacts of hot-spot residues with their surroundings in the unstable conformers, resulting in more favorable entropy for these complexes. Finally, disorder predictions revealed that secondary structures at the interface of unstable complexes exhibited greater disorder than the stable complexes. PMID:19113835
Computational Prediction of Hot Spot Residues
Morrow, John Kenneth; Zhang, Shuxing
2013-01-01
Most biological processes involve multiple proteins interacting with each other. It has been recently discovered that certain residues in these protein-protein interactions, which are called hot spots, contribute more significantly to binding affinity than others. Hot spot residues have unique and diverse energetic properties that make them challenging yet important targets in the modulation of protein-protein complexes. Design of therapeutic agents that interact with hot spot residues has proven to be a valid methodology in disrupting unwanted protein-protein interactions. Using biological methods to determine which residues are hot spots can be costly and time consuming. Recent advances in computational approaches to predict hot spots have incorporated a myriad of features, and have shown increasing predictive successes. Here we review the state of knowledge around protein-protein interactions and hot spots, and give an overview of multiple in silico prediction techniques for hot spot residues. PMID:22316154
Shape Complementarity of Protein-Protein Complexes at Multiple Resolutions
Zhang, Qing; Sanner, Michel; Olson, Arthur J.
2010-01-01
Biological complexes typically exhibit intermolecular interfaces of high shape complementarity. Many computational docking approaches use this surface complementarity as a guide in the search for predicting the structures of protein-protein complexes. Proteins often undergo conformational changes in order to create a highly complementary interface when associating. These conformational changes are a major cause of failure for automated docking procedures when predicting binding modes between proteins using their unbound conformations. Low resolution surfaces in which high frequency geometric details are omitted have been used to address this problem. These smoothed, or blurred, surfaces are expected to minimize the differences between free and bound structures, especially those that are due to side chain conformations or small backbone deviations. In spite of the fact that this approach has been used in many docking protocols, there has yet to be a systematic study of the effects of such surface smoothing on the shape complementarity of the resulting interfaces. Here we investigate this question by computing shape complementarity of a set of 66 protein-protein complexes represented by multi-resolution blurred surfaces. Complexed and unbound structures are available for these protein-protein complexes. They are a subset of complexes from a non-redundant docking benchmark selected for rigidity (i.e. the proteins undergo limited conformational changes between their bound and unbound states). In this work we construct the surfaces by isocontouring a density map obtained by accumulating the densities of Gaussian functions placed at all atom centers of the molecule. The smoothness or resolution is specified by a Gaussian fall-off coefficient, termed “blobbyness”. 
Shape complementarity is quantified using a histogram of the shortest distances between two proteins' surface mesh vertices for both the crystallographic complexes and the complexes built using the protein structures in their unbound conformation. The histograms calculated for the bound complex structures demonstrate that medium resolution smoothing (blobbyness=−0.9) can reproduce about 88% of the shape complementarity of atomic resolution surfaces. Complexes formed from the free component structures show a partial loss of shape complementarity (more overlaps and gaps) with the atomic resolution surfaces. For surfaces smoothed to low resolution (blobbyness=−0.3), we find more consistency of shape complementarity between the complexed and free cases. To further reduce bad contacts without significantly impacting the good contacts we introduce another blurred surface, in which the Gaussian densities of flexible atoms are reduced. From these results we discuss the use of shape complementarity in protein-protein docking. PMID:18837463
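The blurred-surface construction described above can be illustrated with a minimal sketch. The abstract states only that Gaussian functions are placed at all atom centers and that the fall-off is controlled by a "blobbyness" coefficient; the specific functional form used here, exp(B(r²/R² − 1)) with negative B, and the idea of contouring at a density of 1 are assumptions borrowed from the common "blobby model" formulation, not details confirmed by the abstract. All names are illustrative.

```python
import math


def gaussian_density(point, atoms, blobbyness):
    """Sum Gaussian-like atomic densities at `point`.

    atoms: iterable of ((x, y, z), radius) pairs.
    blobbyness: negative fall-off coefficient B (assumed functional form).
    """
    total = 0.0
    for center, radius in atoms:
        # Squared distance from the query point to this atom center.
        r2 = sum((p - c) ** 2 for p, c in zip(point, center))
        # Equals 1 exactly at r == radius and decays further out.
        total += math.exp(blobbyness * (r2 / radius ** 2 - 1.0))
    return total


# One atom of radius 2 at the origin; probe on and beyond its radius.
atom = [((0.0, 0.0, 0.0), 2.0)]
on_surface = gaussian_density((2.0, 0.0, 0.0), atom, -0.9)
medium = gaussian_density((3.0, 0.0, 0.0), atom, -0.9)
low = gaussian_density((3.0, 0.0, 0.0), atom, -0.3)
print(on_surface, medium, low)
```

Under this assumed form, a blobbyness closer to zero makes the density decay more slowly away from each atom, so neighbouring atoms merge into a smoother, lower-resolution surface. This is consistent with the abstract's description of blobbyness=−0.3 as low resolution and −0.9 as medium resolution.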
Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K.; Fryszczyn, Bartlomiej G.; Fox, George E.; Tirumalai, Madhan R.; Liu, Yamei; Kim, Sun
2015-01-01
Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. PMID:25953173
Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra
2016-05-01
A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
Melero, Cristina; Ollikainen, Noah; Harwood, Ian; ...
2014-10-13
Re-engineering protein–protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of “second-site suppressors,” where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein–protein interfaces. To extend this approach, it would be advantageous to be able to “transplant” existing engineered and experimentally validated specificity changes to other homologous protein–protein complexes. Here, we test this strategy by designing a pair of mutations that modulates peptide recognition specificity in the Syntrophin PDZ domain, confirming the designed interaction biochemically and structurally, and then transplanting the mutations into the context of five related PDZ domain–peptide complexes. We find a wide range of energetic effects of identical mutations in structurally similar positions, revealing a dramatic context dependence (epistasis) of designed mutations in homologous protein–protein interactions. To better understand the structural basis of this context dependence, we apply a structure-based computational model that recapitulates these energetic effects and we use this model to make and validate forward predictions. Although the context dependence of these mutations is captured by computational predictions, our results both highlight the considerable difficulties in designing protein–protein interactions and provide challenging benchmark cases for the development of improved protein modeling and design methods that accurately account for the context.
InterProSurf: a web server for predicting interacting sites on protein surfaces
Negi, Surendra S.; Schein, Catherine H.; Oezguen, Numan; Power, Trevor D.; Braun, Werner
2009-01-01
A new web server, InterProSurf, predicts interacting amino acid residues in proteins that are most likely to interact with other proteins, given the 3D structures of subunits of a protein complex. The prediction method is based on solvent accessible surface area of residues in the isolated subunits, a propensity scale for interface residues and a clustering algorithm to identify surface regions with residues of high interface propensities. Here we illustrate the application of InterProSurf to determine which areas of Bacillus anthracis toxins and measles virus hemagglutinin protein interact with their respective cell surface receptors. The computationally predicted regions overlap with those regions previously identified as interface regions by sequence analysis and mutagenesis experiments. PMID:17933856
Predicting the Impact of Alternative Splicing on Plant MADS Domain Protein Function
Severing, Edouard I.; van Dijk, Aalt D. J.; Morabito, Giuseppa; Busscher-Lange, Jacqueline; Immink, Richard G. H.; van Ham, Roeland C. H. J.
2012-01-01
Several genome-wide studies demonstrated that alternative splicing (AS) significantly increases the transcriptome complexity in plants. However, the impact of AS on the functional diversity of proteins is difficult to assess using genome-wide approaches. The availability of detailed sequence annotations for specific genes and gene families allows for a more detailed assessment of the potential effect of AS on their function. One example is the plant MADS-domain transcription factor family, members of which interact to form protein complexes that function in transcription regulation. Here, we perform an in silico analysis of the potential impact of AS on the protein-protein interaction capabilities of MIKC-type MADS-domain proteins. We first confirmed the expression of transcript isoforms resulting from predicted AS events. Expressed transcript isoforms were considered functional if they were likely to be translated and if their corresponding AS events either had an effect on predicted dimerisation motifs or occurred in regions known to be involved in multimeric complex formation, or otherwise, if their effect was conserved in different species. Nine out of twelve MIKC MADS-box genes predicted to produce multiple protein isoforms harbored putative functional AS events according to those criteria. AS events with conserved effects were only found at the borders of or within the K-box domain. We illustrate how AS can contribute to the evolution of interaction networks through an example of selective inclusion of a recently evolved interaction motif in the MADS AFFECTING FLOWERING1-3 (MAF1–3) subclade. Furthermore, we demonstrate the potential effect of an AS event in SHORT VEGETATIVE PHASE (SVP), resulting in the deletion of a short sequence stretch including a predicted interaction motif, by overexpression of the fully spliced and the alternatively spliced SVP transcripts. 
For most of the AS events we were able to formulate hypotheses about the potential impact on the interaction capabilities of the encoded MIKC proteins. PMID:22295091
Usman Mirza, Muhammad; Rafique, Shazia; Ali, Amjad; Munir, Mobeen; Ikram, Nazia; Manan, Abdul; Salo-Ahen, Outi M H; Idrees, Muhammad
2016-12-09
The recent outbreak of Zika virus (ZIKV) infection in Brazil has developed into a global health concern due to its likely association with birth defects (primary microcephaly) and neurological complications. Consequently, there is an urgent need to develop a vaccine to prevent the infection or a medicine to treat it. In this study, an immunoinformatics approach was employed to predict antigenic epitopes of Zika viral proteins to aid in the development of a peptide vaccine against ZIKV. Both linear and conformational B-cell epitopes as well as cytotoxic T-lymphocyte (CTL) epitopes were predicted for ZIKV Envelope (E), NS3 and NS5 proteins. We further investigated the binding interactions of altogether 15 antigenic CTL epitopes with three class I major histocompatibility complex (MHC I) proteins after docking the peptides to the binding groove of the MHC I proteins. The stability of the resulting peptide-MHC I complexes was further studied by molecular dynamics simulations. The simulation results highlight the limits of rigid-body docking methods. Some of the antigenic epitopes predicted and analyzed in this work might present a preliminary set of peptides for future vaccine development against ZIKV.
Discrete structural features among interface residue-level classes.
Sowmya, Gopichandran; Ranganathan, Shoba
2015-01-01
Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding. Our analysis showed ~60% of protein complexes with surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity, (designated as class B), with a significantly different p-value of 1.66E-45 from class A. Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish among class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes. Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs.
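The class A / class B assignment described above reduces to comparing two percentages per complex. The sketch below shows that comparison under stated assumptions: the set of residues counted as polar is a hypothetical choice (the abstract does not list it), surface and interface residues are supplied as one-letter strings, and the handling of exact ties is not specified in the source.

```python
# Assumed polar residue set (one-letter codes); not taken from the abstract.
POLAR = set("RNDQEHKSTY")


def polar_percent(residues: str) -> float:
    """Percentage of residues in the string that belong to the assumed polar set."""
    if not residues:
        raise ValueError("empty residue string")
    return 100.0 * sum(r in POLAR for r in residues.upper()) / len(residues)


def interface_class(surface: str, interface: str) -> str:
    """Class 'A' if surface polarity exceeds interface polarity, 'B' if the reverse."""
    diff = polar_percent(surface) - polar_percent(interface)
    if diff > 0:
        return "A"
    if diff < 0:
        return "B"
    return "undetermined"  # tie handling is not described in the abstract


# Toy residue strings, invented purely for illustration.
print(interface_class("KDESTLA", "LLVAIF"))  # polar surface, non-polar interface
print(interface_class("LLVA", "KDEL"))       # non-polar surface, polar interface
```

In practice the surface and interface residue sets would come from a structure-based accessibility calculation, which is outside the scope of this sketch.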
Discrete structural features among interface residue-level classes
2015-01-01
Background Protein-protein interaction (PPI) is essential for molecular functions in biological cells. Investigation on protein interfaces of known complexes is an important step towards deciphering the driving forces of PPIs. Each PPI complex is specific, sensitive and selective to binding. Therefore, we have estimated the relative difference in percentage of polar residues between surface and the interface for each complex in a non-redundant heterodimer dataset of 278 complexes to understand the predominant forces driving binding. Results Our analysis showed ~60% of protein complexes with surface polarity greater than interface polarity (designated as class A). However, a considerable number of complexes (~40%) have interface polarity greater than surface polarity, (designated as class B), with a significantly different p-value of 1.66E-45 from class A. Comprehensive analyses of protein complexes show that interface features such as interface area, interface polarity abundance, solvation free energy gain upon interface formation, binding energy and the percentage of interface charged residue abundance distinguish among class A and class B complexes, while electrostatic visualization maps also help differentiate interface classes among complexes. Conclusions Class A complexes are classical with abundant non-polar interactions at the interface; however class B complexes have abundant polar interactions at the interface, similar to protein surface characteristics. Five physicochemical interface features analyzed from the protein heterodimer dataset are discriminatory among the interface residue-level classes. These novel observations find application in developing residue-level models for protein-protein binding prediction, protein-protein docking studies and interface inhibitor design as drugs. PMID:26679043
Yerrapragada, Shaila; Shukla, Animesh; Hallsworth-Pepin, Kymberlie; Choi, Kwangmin; Wollam, Aye; Clifton, Sandra; Qin, Xiang; Muzny, Donna; Raghuraman, Sriram; Ashki, Haleh; Uzman, Akif; Highlander, Sarah K; Fryszczyn, Bartlomiej G; Fox, George E; Tirumalai, Madhan R; Liu, Yamei; Kim, Sun; Kehoe, David M; Weinstock, George M
2015-05-07
Tolypothrix sp. PCC 7601 is a freshwater filamentous cyanobacterium with complex responses to environmental conditions. Here, we present its 9.96-Mbp draft genome sequence, containing 10,065 putative protein-coding sequences, including 305 predicted two-component system proteins and 27 putative phytochrome-class photoreceptors, the most such proteins in any sequenced genome. Copyright © 2015 Yerrapragada et al.
Le Bihan, Thierry; Robinson, Mark D; Stewart, Ian I; Figeys, Daniel
2004-01-01
Although HPLC-ESI-MS/MS is rapidly becoming an indispensable tool for the analysis of peptides in complex mixtures, the sequence coverage it affords is often quite poor. Low protein expression resulting in peptide signal intensities that fall below the limit of detection of the MS system in combination with differences in peptide ionization efficiency plays a significant role in this. A second important factor stems from differences in physicochemical properties of each peptide and how these properties relate to chromatographic retention and ultimate detection. To identify and understand those properties, we compared data from experimentally identified peptides with data from peptides predicted by in silico digest of all corresponding proteins in the experimental set. Three different complex protein mixtures extracted were used to define a training set to evaluate the amino acid retention coefficients based on linear regression analysis. The retention coefficients were also compared with other previous hydrophobic and retention scale. From this, we have constructed an empirical model that can be readily used to predict peptides that are likely to be observed on our HPLC-ESI-MS/MS system based on their physicochemical properties. Finally, we demonstrated that in silico prediction of peptides and their retention coefficients can be used to generate an inclusion list for a targeted mass spectrometric identification of low abundance proteins in complex protein samples. This approach is based on experimentally derived data to calibrate the method and therefore may theoretically be applied to any HPLC-MS/MS system on which data are being generated.
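A minimal Python sketch of estimating amino acid retention coefficients by linear regression follows. It assumes an additive model in which a peptide's retention time is the sum of per-residue coefficients plus an intercept; the abstract does not give the exact model, and the function names here are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_matrix(peptides):
    """One row per peptide: counts of each of the 20 residues, plus an intercept column."""
    x = np.zeros((len(peptides), len(AMINO_ACIDS) + 1))
    for row, peptide in enumerate(peptides):
        for residue in peptide:
            x[row, AMINO_ACIDS.index(residue)] += 1.0
        x[row, -1] = 1.0  # intercept term
    return x

def fit_retention_coefficients(peptides, retention_times):
    """Least-squares fit of the additive model (assumed here, not taken from the paper)."""
    x = composition_matrix(peptides)
    y = np.asarray(retention_times, dtype=float)
    coefficients, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coefficients

def predict_retention(peptides, coefficients):
    """Predicted retention time for each peptide under the fitted additive model."""
    return composition_matrix(peptides) @ coefficients
```

Predicted retention times for an in silico digest could then be filtered against the observable elution window to build an inclusion list, which is the use the abstract describes.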
Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2.
Ritchie, David W
2003-07-01
This article describes and reviews our efforts using Hex 3.1 to predict the docking modes of the seven target protein-protein complexes presented in the CAPRI (Critical Assessment of Predicted Interactions) blind docking trial. For each target, the structure of at least one of the docking partners was given in its unbound form, and several of the targets involved large multimeric structures (e.g., Lactobacillus HPr kinase, hemagglutinin, bovine rotavirus VP6). Here we describe several enhancements to our original spherical polar Fourier docking correlation algorithm. For example, a novel surface sphere smothering algorithm is introduced to generate multiple local coordinate systems around the surface of a large receptor molecule, which may be used to define a small number of initial ligand-docking orientations distributed over the receptor surface. High-resolution spherical polar docking correlations are performed over the resulting receptor surface patches, and candidate docking solutions are refined by using a novel soft molecular mechanics energy minimization procedure. Overall, this approach identified two good solutions at rank 5 or less for two of the seven CAPRI complexes. Subsequent analysis of our results shows that Hex 3.1 is able to place good solutions within a list of
Conservation of hot regions in protein-protein interaction in evolution.
Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong
2016-11-01
The hot regions of protein-protein interactions are the active areas formed by the residues most important to the protein binding process. As research on protein interactions has developed, many predicted hot regions can be discovered efficiently by intelligent computing methods, but verifying every prediction by biological experiment is rarely feasible because of the time cost and the complexity of the experiments. Building on research into hot spot residue conservation, this study proposes a method to verify the authenticity of hot regions predicted by a machine learning algorithm combined with the protein's biological features and sequence conservation. The method uses multiple sequence alignment, a substitution matrix module and sequence similarity to build a conservation scoring algorithm, and then uses a threshold module to verify the conservation tendency of hot regions in evolution. This work gives an effective method to verify predicted hot regions in protein-protein interactions, and also provides a useful way to investigate the functional activities of protein hot regions in depth. Copyright © 2016. Published by Elsevier Inc.
Shen, Xianjun; Yi, Li; Jiang, Xingpeng; He, Tingting; Yang, Jincai; Xie, Wei; Hu, Po; Hu, Xiaohua
2017-01-01
Identifying protein complexes is an important and challenging task in proteomics. It would contribute greatly to our knowledge of the molecular mechanisms of cellular activities. However, the inherent organization and dynamic characteristics of the cell system have rarely been incorporated into existing algorithms for detecting protein complexes because of the limitations of protein-protein interaction (PPI) data produced by high-throughput techniques. The availability of time-course gene expression profiles enables us to uncover the dynamics of molecular networks and improve the detection of protein complexes. To achieve this goal, this paper proposes a novel algorithm, DCA (Dynamic Core-Attachment). It detects protein-complex cores composed of continually expressed and highly connected proteins in a dynamic PPI network, and then forms each protein complex by adding attachments with high adhesion to the core. The integration of the core-attachment feature into the dynamic PPI network is responsible for the superiority of our algorithm. DCA has been applied to two different yeast dynamic PPI networks, and the experimental results show that it performs significantly better than state-of-the-art techniques in terms of prediction accuracy, hF-measure and statistical significance in biology. In addition, the identified complexes with strong biological significance provide potential candidate complexes for biologists to validate.
2014-01-01
Background Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. In this paper, we propose a new re-ranking technique using a new energy-based scoring function, namely IFACEwat - a combined Interface Atomic Contact Energy (IFACE) and water effect. The IFACEwat aims to further improve the discrimination of the near-native structures of the initial rigid docking algorithm ZDOCK3.0.2. Unlike other re-ranking techniques, the IFACEwat explicitly implements interfacial water into the protein interfaces to account for the water-mediated contacts during the protein interactions. Results Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. 
Conclusions With the inclusion of interfacial water, the IFACEwat improves most results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully represented by the initial rigid docking and other re-ranking techniques. In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near-native structures found. As our implementation has so far targeted the results of ZDOCK3.0.2, particularly for Antigen/Antibody complexes, we expect that future implementations will extend it to other initial rigid docking algorithms. PMID:25521441
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V
2007-01-01
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321
Structure Prediction of Protein Complexes
NASA Astrophysics Data System (ADS)
Pierce, Brian; Weng, Zhiping
Protein-protein interactions are critical for biological function. They directly and indirectly influence the biological systems of which they are a part. Antibodies bind with antigens to detect and stop viruses and other infectious agents. Cell signaling is performed in many cases through the interactions between proteins. Many diseases involve protein-protein interactions on some level, including cancer and prion diseases.
Liu, Zhiming; Luo, Jiawei
2017-08-01
Associating protein complexes to human inherited diseases is critical for better understanding of biological processes and functional mechanisms of the disease. Many protein complexes have been identified and functionally annotated by computational and purification methods so far, however, the particular roles they were playing in causing disease have not yet been well determined. In this study, we present a novel method to identify associations between protein complexes and diseases. First, we construct a disease-protein heterogeneous network based on data integration and laplacian normalization. Second, we apply a random walk with restart on heterogeneous network (RWRH) algorithm on this network to quantify the strength of the association between proteins and the query disease. Third, we sum over the scores of member proteins to obtain a summary score for each candidate protein complex, and then rank all candidate protein complexes according to their scores. With a series of leave-one-out cross-validation experiments, we found that our method not only possesses high performance but also demonstrates robustness regarding the parameters and the network structure. We test our approach with breast cancer and select top 20 highly ranked protein complexes, 17 of the selected protein complexes are evidenced to be connected with breast cancer. Our proposed method is effective in identifying disease-related protein complexes based on data integration and laplacian normalization. Copyright © 2017. Published by Elsevier Ltd.
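The scoring and ranking steps in this abstract can be sketched in Python as below. This is a simplified illustration, not the authors' RWRH implementation: it runs a random walk with restart on a single homogeneous network instead of the disease-protein heterogeneous network, and the restart probability, function names and data layout are assumptions.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state scores of a walk on `adj` that restarts at the `seeds` nodes."""
    adj = np.asarray(adj, dtype=float)
    # Laplacian-style symmetric normalization: D^{-1/2} A D^{-1/2}.
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    w = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def rank_complexes(scores, complexes):
    """Sum member-protein scores per complex and sort complexes by descending total."""
    totals = {name: float(sum(scores[i] for i in members))
              for name, members in complexes.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Here `seeds` stands in for the nodes tied to the query disease, and `complexes` maps a complex name to the indices of its member proteins; the top of the returned list corresponds to the highly ranked candidate complexes the abstract mentions.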
2017-01-01
Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the geometric and biological complexity. To address this problem we introduce the element-specific persistent homology (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains important biological information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomolecules. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topological convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes. Availability: weilab.math.msu.edu/TDL/ PMID:28749969
Karp, Jerome M; Eryilmaz, Ertan; Erylimaz, Ertan; Cowburn, David
2015-01-01
There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.
HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy.
Yan, Yumeng; Zhang, Di; Zhou, Pei; Li, Botong; Huang, Sheng-You
2017-07-03
Protein-protein and protein-DNA/RNA interactions play a fundamental role in a variety of biological processes. Determining the complex structures of these interactions is valuable, in which molecular docking has played an important role. To automatically make use of the binding information from the PDB in docking, here we have presented HDOCK, a novel web server of our hybrid docking algorithm of template-based modeling and free docking, in which cases with misleading templates can be rescued by the free docking protocol. The server supports protein-protein and protein-DNA/RNA docking and accepts both sequence and structure inputs for proteins. The docking process is fast and consumes about 10-20 min for a docking run. Tested on the cases with weakly homologous complexes of <30% sequence identity from five docking benchmarks, the HDOCK pipeline tied with template-based modeling on the protein-protein and protein-DNA benchmarks and performed better than template-based modeling on the three protein-RNA benchmarks when the top 10 predictions were considered. The performance of HDOCK became better when more predictions were considered. Combining the results of HDOCK and template-based modeling by ranking first of the template-based model further improved the predictive power of the server. The HDOCK web server is available at http://hdock.phys.hust.edu.cn/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Docking and scoring protein interactions: CAPRI 2009.
Lensink, Marc F; Wodak, Shoshana J
2010-11-15
Protein docking algorithms are assessed by evaluating blind predictions performed during 2007-2009 in Rounds 13-19 of the community-wide experiment on critical assessment of predicted interactions (CAPRI). We evaluated the ability of these algorithms to sample docking poses and to single out specific association modes in 14 targets, representing 11 distinct protein complexes. These complexes play important biological roles in RNA maturation, G-protein signal processing, and enzyme inhibition and function. One target involved protein-RNA interactions not previously considered in CAPRI, several others were hetero-oligomers, or featured multiple interfaces between the same protein pair. For most targets, predictions started from the experimentally determined structures of the free (unbound) components, or from models built from known structures of related or similar proteins. To succeed they therefore needed to account for conformational changes and model inaccuracies. In total, 64 groups and 12 web-servers submitted docking predictions of which 4420 were evaluated. Overall our assessment reveals that 67% of the groups, more than ever before, produced acceptable models or better for at least one target, with many groups submitting multiple high- and medium-accuracy models for two to six targets. Forty-one groups including four web-servers participated in the scoring experiment with 1296 evaluated models. Scoring predictions also show signs of progress evidenced from the large proportion of correct models submitted. But singling out the best models remains a challenge, which also adversely affects the ability to correctly rank docking models. With the increased interest in translating abstract protein interaction networks into realistic models of protein assemblies, the growing CAPRI community is actively developing more efficient and reliable docking and scoring methods for everyone to use. © 2010 Wiley-Liss, Inc.
Ribonucleoprotein complexes in neurologic diseases.
Ule, Jernej
2008-10-01
Ribonucleoprotein (RNP) complexes regulate the tissue-specific RNA processing and transport that increases the coding capacity of our genome and the ability to respond quickly and precisely to the diverse set of signals. This review focuses on three proteins that are part of RNP complexes in most cells of our body: TAR DNA-binding protein (TDP-43), the survival motor neuron protein (SMN), and fragile-X mental retardation protein (FMRP). In particular, the review asks the question why these ubiquitous proteins are primarily associated with defects in specific regions of the central nervous system? To understand this question, it is important to understand the role of genetic and cellular environment in causing the defect in the protein, as well as how the defective protein leads to misregulation of specific target RNAs. Two approaches for comprehensive analysis of defective RNA-protein interactions are presented. The first approach defines the RNA code or the collection of proteins that bind to a certain cis-acting RNA site in order to lead to a predictable outcome. The second approach defines the RNA map or the summary of positions on target RNAs where binding of a particular RNA-binding protein leads to a predictable outcome. As we learn more about the RNA codes and maps that guide the action of the dynamic RNP world in our brain, possibilities for new treatments of neurologic diseases are bound to emerge.
Yan, Yumeng; Tao, Huanyu; Huang, Sheng-You
2018-05-26
A major subclass of protein-protein interactions is formed by homo-oligomers with certain symmetry. Therefore, computational modeling of the symmetric protein complexes is important for understanding the molecular mechanism of related biological processes. Although several symmetric docking algorithms have been developed for Cn symmetry, few docking servers have been proposed for Dn symmetry. Here, we present HSYMDOCK, a web server of our hierarchical symmetric docking algorithm that supports both Cn and Dn symmetry. The HSYMDOCK server was extensively evaluated on three benchmarks of symmetric protein complexes, including the 20 CASP11-CAPRI30 homo-oligomer targets, the symmetric docking benchmark of 213 Cn targets and 35 Dn targets, and a nonredundant test set of 55 transmembrane proteins. It was shown that HSYMDOCK obtained a significantly better performance than other similar docking algorithms. The server supports both sequence and structure inputs for the monomer/subunit. Users have an option to provide the symmetry type of the complex, or the server can predict the symmetry type automatically. The docking process is fast and on average consumes 10∼20 min for a docking job. The HSYMDOCK web server is available at http://huanglab.phys.hust.edu.cn/hsymdock/.
BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations.
Dehouck, Yves; Kwasigroch, Jean Marc; Rooman, Marianne; Gilis, Dimitri
2013-07-01
The ability of proteins to establish highly selective interactions with a variety of (macro)molecular partners is a crucial prerequisite to the realization of their biological functions. The availability of computational tools to evaluate the impact of mutations on protein-protein binding can therefore be valuable in a wide range of industrial and biomedical applications, and help rationalize the consequences of non-synonymous single-nucleotide polymorphisms. BeAtMuSiC (http://babylone.ulb.ac.be/beatmusic) is a coarse-grained predictor of the changes in binding free energy induced by point mutations. It relies on a set of statistical potentials derived from known protein structures, and combines the effect of the mutation on the strength of the interactions at the interface, and on the overall stability of the complex. The BeAtMuSiC server requires as input the structure of the protein-protein complex, and gives the possibility to assess rapidly all possible mutations in a protein chain or at the interface, with predictive performances that are in line with the best current methodologies.
Prediction of Protein-Protein Interaction Sites Using Electrostatic Desolvation Profiles
Fiorucci, Sébastien; Zacharias, Martin
2010-01-01
Protein-protein complex formation involves removal of water from the interface region. Surface regions with a small free energy penalty for water removal or desolvation may correspond to preferred interaction sites. A method to calculate the electrostatic free energy of placing a neutral low-dielectric probe at various protein surface positions has been designed and applied to characterize putative interaction sites. Based on solutions of the finite-difference Poisson equation, this method also includes long-range electrostatic contributions and the protein solvent boundary shape in contrast to accessible-surface-area-based solvation energies. Calculations on a large set of proteins indicate that in many cases (>90%), the known binding site overlaps with one of the six regions of lowest electrostatic desolvation penalty (overlap with the lowest desolvation region for 48% of proteins). Since the onset of electrostatic desolvation occurs even before direct protein-protein contact formation, it may help guide proteins toward the binding region in the final stage of complex formation. It is interesting that the probe desolvation properties associated with residue types were found to depend to some degree on whether the residue was outside of or part of a binding site. The probe desolvation penalty was on average smaller if the residue was part of a binding site compared to other surface locations. Applications to several antigen-antibody complexes demonstrated that the approach might be useful not only to predict protein interaction sites in general but to map potential antigenic epitopes on protein surfaces. PMID:20441756
A hidden markov model derived structural alphabet for proteins.
Camproux, A C; Gautier, R; Tufféry, P
2004-06-04
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1A root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.
Yan, Jing; Zhou, Mowei; Gilbert, Joshua D; Wolff, Jeremy J; Somogyi, Árpád; Pedder, Randall E; Quintyn, Royston S; Morrison, Lindsay J; Easterling, Michael L; Paša-Tolić, Ljiljana; Wysocki, Vicki H
2017-01-03
Mass spectrometry continues to develop as a valuable tool in the analysis of proteins and protein complexes. In protein complex mass spectrometry studies, surface-induced dissociation (SID) has been successfully applied in quadrupole time-of-flight (Q-TOF) instruments. SID provides structural information on noncovalent protein complexes that is complementary to other techniques. However, the mass resolution of Q-TOF instruments can limit the information that can be obtained for protein complexes by SID. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) provides ultrahigh resolution and ultrahigh mass accuracy measurements. In this study, an SID device was designed and successfully installed in a hybrid FT-ICR instrument in place of the standard gas collision cell. The SID-FT-ICR platform has been tested with several protein complex systems (homooligomers, a heterooligomer, and a protein-ligand complex, ranging from 53 to 85 kDa), and the results are consistent with data previously acquired on Q-TOF platforms, matching predictions from known protein interface information. SID fragments with the same m/z but different charge states are well-resolved based on distinct spacing between adjacent isotope peaks, and the addition of metal cations and ligands can also be isotopically resolved with the ultrahigh mass resolution available in FT-ICR.
Prediction of kinase-inhibitor binding affinity using energetic parameters
Usha, Singaravelu; Selvaraj, Samuel
2016-01-01
The combination of physicochemical properties and energetic parameters derived from protein-ligand complexes plays a vital role in determining the biological activity of a molecule. In the present work, protein-ligand interaction energy along with logP values was used to predict the experimental log (IC50) values of 25 different kinase-inhibitors using multiple regression, which gave a correlation coefficient of 0.93. The regression equation obtained was tested on 93 kinase-inhibitor complexes and showed an average deviation of 0.92 from the experimental log IC50 values. The same set of descriptors was used to predict binding affinities for a test set of five individual kinase families, with correlation values > 0.9. We show that the protein-ligand interaction energies and partition coefficient values form the major deterministic factors for binding affinity of the ligand for its receptor. PMID:28149052
Identifying protein complexes based on brainstorming strategy.
Shen, Xianjun; Zhou, Jin; Yi, Li; Hu, Xiaohua; He, Tingting; Yang, Jincai
2016-11-01
Protein complexes composed of interacting proteins in a protein-protein interaction network (PPI network) play a central role in driving biological processes within cells. Recently, more and more swarm intelligence based algorithms for detecting protein complexes have emerged and have become a research hotspot in the field of proteomics. In this paper, we propose a novel algorithm for identifying protein complexes based on brainstorming strategy (IPC-BSS), which integrates the main idea of swarm intelligence optimization with an improved K-means algorithm. Distance between the nodes in the PPI network is defined by combining the network topology and gene ontology (GO) information. Inspired by the human brainstorming process, the IPC-BSS algorithm first selects the clustering center nodes, each of which is then consolidated with nearby nodes to form initial clusters. Finally, we put forward two ways of updating the initial clusters to search for optimal results. Experimental results show that our IPC-BSS algorithm outperforms the other classic algorithms on yeast and human PPI networks, and it obtains many predicted protein complexes with biological significance. Copyright © 2016 Elsevier Inc. All rights reserved.
Heo, Lim; Lee, Hasup; Seok, Chaok
2016-08-18
Protein-protein docking methods have been widely used to gain an atomic-level understanding of protein interactions. However, docking methods that employ low-resolution energy functions are popular because of computational efficiency. Low-resolution docking tends to generate protein complex structures that are not fully optimized. GalaxyRefineComplex takes such low-resolution docking structures and refines them to improve model accuracy in terms of both interface contact and inter-protein orientation. This refinement method allows flexibility at the protein interface and in the overall docking structure to capture conformational changes that occur upon binding. Symmetric refinement is also provided for symmetric homo-complexes. This method was validated by refining models produced by available docking programs, including ZDOCK and M-ZDOCK, and was successfully applied to CAPRI targets in a blind fashion. An example of using the refinement method with an existing docking method for ligand binding mode prediction of a drug target is also presented. A web server that implements the method is freely available at http://galaxy.seoklab.org/refinecomplex.
Lee, Kenneth K; Sardiu, Mihaela E; Swanson, Selene K; Gilmore, Joshua M; Torok, Michael; Grant, Patrick A; Florens, Laurence; Workman, Jerry L; Washburn, Michael P
2011-07-05
Despite the availability of several large-scale proteomics studies aiming to identify protein interactions on a global scale, little is known about how proteins interact and are organized within macromolecular complexes. Here, we describe a technique that consists of a combination of biochemistry approaches, quantitative proteomics and computational methods using wild-type and deletion strains to investigate the organization of proteins within macromolecular protein complexes. We applied this technique to determine the organization of two well-studied complexes, Spt-Ada-Gcn5 histone acetyltransferase (SAGA) and ADA, for which no comprehensive high-resolution structures exist. This approach revealed that SAGA/ADA is composed of five distinct functional modules, which can persist separately. Furthermore, we identified a novel subunit of the ADA complex, termed Ahc2, and characterized Sgf29 as an ADA family protein present in all Gcn5 histone acetyltransferase complexes. Finally, we propose a model for the architecture of the SAGA and ADA complexes, which predicts novel functional associations within the SAGA complex and provides mechanistic insights into phenotypical observations in SAGA mutants.
Le, Duc-Hau
2015-01-01
Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in causing disease are not yet well understood. Only a few studies have addressed the identification of disease-associated protein complexes, and these mostly rely on complicated heterogeneous networks constructed from an out-of-date phenotype similarity network database collected from the literature. In addition, they apply only to diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks in which two protein complexes are functionally connected by shared protein elements, by shared annotating GO terms, or by protein interactions between elements of each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms, including one we used in our previous study, we found that it performed statistically significantly better than both algorithms on all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values, irrespective of how the functional similarity protein complex networks were constructed and of the algorithms used. The performance of our method was also higher than that reported for some existing methods based on complicated heterogeneous networks. 
Finally, we also tested our method on prostate cancer and selected the 100 top-ranked candidate protein complexes. Interestingly, 69 of them were supported by evidence, since at least one of their protein elements is known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations.
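A neighborhood-based ranking of the kind described above can be sketched as follows. The abstract does not give the scoring formula, so the score used here (the weighted fraction of a candidate's neighbors that are already known to be disease-associated) is an assumption, as are the function names and the toy network.

```python
def neighborhood_score(candidate, network, known):
    """Weighted share of the candidate's neighbors that are known positives.

    `network` maps each complex to a dict {neighbor: edge weight}.
    This local measure is an assumed stand-in, not the published formula.
    """
    nbrs = network.get(candidate, {})
    total = sum(nbrs.values())
    if total == 0:
        return 0.0
    return sum(w for n, w in nbrs.items() if n in known) / total


def rank_candidates(network, known):
    """Rank all complexes not yet known to be associated, best first."""
    candidates = [c for c in network if c not in known]
    return sorted(
        candidates,
        key=lambda c: neighborhood_score(c, network, known),
        reverse=True,
    )


# Toy functional similarity network of four hypothetical complexes.
network = {
    "C1": {"C2": 1.0, "C3": 1.0},
    "C2": {"C1": 1.0, "C4": 1.0},
    "C3": {"C1": 1.0},
    "C4": {"C2": 1.0},
}
known = {"C1"}  # complexes already associated with the disease (toy data)
ranking = rank_candidates(network, known)
print(ranking)
```

Because each score depends only on a node's direct neighbors, one pass over the edges is enough, which fits the speed advantage over network propagation reported in the abstract.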
Quantitative genetic-interaction mapping in mammalian cells
Roguev, Assen; Talbot, Dale; Negri, Gian Luca; Shales, Michael; Cagney, Gerard; Bandyopadhyay, Sourav; Panning, Barbara; Krogan, Nevan J
2013-01-01
Mapping genetic interactions (GIs) by simultaneously perturbing pairs of genes is a powerful tool for understanding complex biological phenomena. Here we describe an experimental platform for generating quantitative GI maps in mammalian cells using a combinatorial RNA interference strategy. We performed ~11,000 pairwise knockdowns in mouse fibroblasts, focusing on 130 factors involved in chromatin regulation to create a GI map. Comparison of the GI and protein-protein interaction (PPI) data revealed that pairs of genes exhibiting positive GIs and/or similar genetic profiles were predictive of the corresponding proteins being physically associated. The mammalian GI map identified pathways and complexes but also resolved functionally distinct submodules within larger protein complexes. By integrating GI and PPI data, we created a functional map of chromatin complexes in mouse fibroblasts, revealing that the PAF complex is a central player in the mammalian chromatin landscape. PMID:23407553
Parikh, Hardik I; Kellogg, Glen E
2014-06-01
Characterizing the nature of interaction between proteins that have not been experimentally cocrystallized requires a computational docking approach that can successfully predict the spatial conformation adopted in the complex. In this work, the Hydropathic INTeractions (HINT) force field model was used for scoring docked models in a data set of 30 high-resolution crystallographically characterized "dry" protein-protein complexes and was shown to reliably identify native-like models. However, most current protein-protein docking algorithms fail to explicitly account for water molecules involved in bridging interactions that mediate and stabilize the association of the protein partners, so we used HINT to illuminate the physical and chemical properties of bridging waters and account for their energetic stabilizing contributions. The HINT water Relevance metric identified the "truly" bridging waters at the 30 protein-protein interfaces and we utilized them in "solvated" docking by manually inserting them into the input files for the rigid body ZDOCK program. By accounting for these interfacial waters, a statistically significant improvement of ∼24% in the average hit-count within the top-10 predictions for the protein-protein dataset was seen, compared to standard "dry" docking. The results also show scoring improvement, with medium and high accuracy models ranking much better than incorrect ones. These improvements can be attributed to the physical presence of water molecules that alter surface properties and better represent native shape and hydropathic complementarity between interacting partners, with concomitantly more accurate native-like structure predictions.
Rudling, Axel; Orro, Adolfo; Carlsson, Jens
2018-02-26
Water plays a major role in ligand binding and is attracting increasing attention in structure-based drug design. Water molecules can make large contributions to binding affinity by bridging protein-ligand interactions or by being displaced upon complex formation, but these phenomena are challenging to model at the molecular level. Herein, networks of ordered water molecules in protein binding sites were analyzed by clustering of molecular dynamics (MD) simulation trajectories. Locations of ordered waters (hydration sites) were first identified from simulations of high resolution crystal structures of 13 protein-ligand complexes. The MD-derived hydration sites reproduced 73% of the binding site water molecules observed in the crystal structures. If the simulations were repeated without the cocrystallized ligands, a majority (58%) of the crystal waters in the binding sites were still predicted. In addition, comparison of the hydration sites obtained from simulations carried out in the absence of ligands to those identified for the complexes revealed that the networks of ordered water molecules were preserved to a large extent, suggesting that the locations of waters in a protein-ligand interface are mainly dictated by the protein. Analysis of >1000 crystal structures showed that hydration sites bridged protein-ligand interactions in complexes with different ligands, and those with high MD-derived occupancies were more likely to correspond to experimentally observed ordered water molecules. The results demonstrate that ordered water molecules relevant for modeling of protein-ligand complexes can be identified from MD simulations. Our findings could contribute to development of improved methods for structure-based virtual screening and lead optimization.
Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.
Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro
2016-09-01
The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc.
When fast is better: protein folding fundamentals and mechanisms from ultrafast approaches
Muñoz, Victor; Cerminara, Michele
2016-01-01
Protein folding research stalled for decades because conventional experiments indicated that proteins fold slowly and in single strokes, whereas theory predicted a complex interplay between dynamics and energetics resulting in myriad microscopic pathways. Ultrafast kinetic methods turned the field upside down by providing the means to probe fundamental aspects of folding, test theoretical predictions and benchmark simulations. Accordingly, experimentalists could measure the timescales for all relevant folding motions, determine the folding speed limit and confirm that folding barriers are entropic bottlenecks. Moreover, a catalogue of proteins that fold extremely fast (microseconds) could be identified. Such fast-folding proteins cross shallow free energy barriers or fold downhill, and thus unfold with minimal co-operativity (gradually). A new generation of thermodynamic methods has exploited this property to map folding landscapes, interaction networks and mechanisms at nearly atomic resolution. In parallel, modern molecular dynamics simulations have finally reached the timescales required to watch fast-folding proteins fold and unfold in silico. All of these findings have buttressed the fundamentals of protein folding predicted by theory, and are now offering the first glimpses at the underlying mechanisms. Fast folding appears to also have functional implications as recent results connect downhill folding with intrinsically disordered proteins, their complex binding modes and ability to moonlight. These connections suggest that the coupling between downhill (un)folding and binding enables such protein domains to operate analogically as conformational rheostats. PMID:27574021
Raschka, Sebastian; Wolf, Alex J; Bemister-Buffington, Joseph; Kuhn, Leslie A
2018-04-01
Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein-ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein-ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.
Prediction of change in protein unfolding rates upon point mutations in two state proteins.
Chaudhary, Priyashree; Naganathan, Athi N; Gromiha, M Michael
2016-09-01
Studies on protein unfolding rates are limited and challenging due to the complexity of the unfolding mechanism and the large dynamic range of the experimental data. Though attempts have been made to predict unfolding rates using protein sequence-structure information, there is no available method for predicting the unfolding rates of proteins upon specific point mutations. In this work, we have systematically analyzed a set of 790 single mutants and developed a robust method for predicting the change in protein unfolding rates upon mutation (Δlnku) in two-state proteins by combining amino acid properties and knowledge-based classification of mutants with a multiple linear regression technique. We obtain a mean absolute error (MAE) of 0.79/s and a Pearson correlation coefficient (PCC) of 0.71 between predicted unfolding rates and experimental observations using a jack-knife test. We have developed a web server for predicting protein unfolding rates upon mutation and it is freely available at https://www.iitm.ac.in/bioinfo/proteinunfolding/unfoldingrace.html. Prominent features that determine unfolding kinetics as well as plausible reasons for the observed outliers are also discussed.
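The evaluation protocol mentioned above (a jack-knife, i.e. leave-one-out, test scored by MAE and PCC) can be sketched in a few lines. This is only an illustration under simplifying assumptions: it fits a one-feature least-squares line rather than the multiple linear regression used in the study, and the data points are invented toy values, not the 790 mutants.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def jackknife_predictions(xs, ys):
    """Predict each point from a model trained on all the other points."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


# Toy data lying exactly on y = 2x + 1 (hypothetical, for checking the code).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
preds = jackknife_predictions(xs, ys)
mae = mean_absolute_error(ys, preds)
pcc = pearson(ys, preds)
print(mae, pcc)
```

On perfectly linear toy data every held-out point is recovered, so the MAE is essentially zero and the PCC essentially one; real data would give intermediate values such as those reported in the abstract.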
Prediction of binding hot spot residues by using structural and evolutionary parameters.
Higa, Roberto Hiroshi; Tozzi, Clésio Luis
2009-07-01
In this work, we present a method for predicting hot spot residues by using a set of structural and evolutionary parameters. Unlike previous studies, we use a set of parameters which do not depend on the structure of the protein in complex, so that the predictor can also be used when the interface region is unknown. Despite the fact that no information concerning proteins in complex is used for prediction, the application of the method to a compiled dataset described in the literature achieved a performance of 60.4%, as measured by F-Measure, corresponding to a recall of 78.1% and a precision of 49.5%. This result is higher than those reported by previous studies using the same data set.
Toufighi, Kiana; Yang, Jae-Seong; Luis, Nuno Miguel; Aznar Benitah, Salvador; Lehner, Ben; Serrano, Luis; Kiel, Christina
2015-01-01
The molecular details underlying the time-dependent assembly of protein complexes in cellular networks, such as those that occur during differentiation, are largely unexplored. Focusing on the calcium-induced differentiation of primary human keratinocytes as a model system for a major cellular reorganization process, we look at the expression of genes whose products are involved in manually-annotated protein complexes. Clustering analyses revealed only moderate co-expression of functionally related proteins during differentiation. However, when we looked at protein complexes, we found that the majority (55%) are composed of non-dynamic and dynamic gene products (‘di-chromatic’), 19% are non-dynamic, and 26% only dynamic. Considering three-dimensional protein structures to predict steric interactions, we found that proteins encoded by dynamic genes frequently interact with a common non-dynamic protein in a mutually exclusive fashion. This suggests that during differentiation, complex assemblies may also change through variation in the abundance of proteins that compete for binding to common proteins as found in some cases for paralogous proteins. Considering the example of the TNF-α/NFκB signaling complex, we suggest that the same core complex can guide signals into diverse context-specific outputs by addition of time specific expressed subunits, while keeping other cellular functions constant. Thus, our analysis provides evidence that complex assembly with stable core components and competition could contribute to cell differentiation. PMID:25946651
Rand, Tim A.; Ginalski, Krzysztof; Grishin, Nick V.; Wang, Xiaodong
2004-01-01
RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2. PMID:15452342
Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok
2017-03-01
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking, to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 targets and 1 target, respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc.
Protein-protein docking using region-based 3D Zernike descriptors.
Venkatraman, Vishwesh; Yang, Yifeng D; Sael, Lee; Kihara, Daisuke
2009-12-09
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. 
Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
Multi-omics approach identifies molecular mechanisms of plant-fungus mycorrhizal interaction
Larsen, Peter E.; Sreedasyam, Avinash; Trivedi, Geetika; ...
2016-01-19
In mycorrhizal symbiosis, plant roots form close, mutually beneficial interactions with soil fungi. Before this mycorrhizal interaction can be established, however, plant roots must be capable of detecting potential beneficial fungal partners and initiating the gene expression patterns necessary to begin symbiosis. To predict plant root-mycorrhizal fungi sensor systems, we analyzed in vitro experiments of Populus tremuloides (aspen tree) and Laccaria bicolor (mycorrhizal fungi) interaction and leveraged over 200 previously published transcriptomic experimental data sets, 159 experimentally validated plant transcription factor binding motifs, and more than 120-thousand experimentally validated protein-protein interactions to generate models of pre-mycorrhizal sensor systems in aspen root. These sensor mechanisms link extracellular signaling molecules with gene regulation through a network composed of membrane receptors, signal cascade proteins, transcription factors, and transcription factor binding DNA motifs. Modeling predicted four pre-mycorrhizal sensor complexes in aspen that interact with fifteen transcription factors to regulate the expression of 1184 genes in response to extracellular signals synthesized by Laccaria. Predicted extracellular signaling molecules include common signaling molecules such as phenylpropanoids, salicylate, and jasmonic acid. Lastly, this multi-omic computational modeling approach for predicting the complex sensory networks yielded specific, testable biological hypotheses for mycorrhizal interaction signaling compounds, sensor complexes, and mechanisms of gene regulation.
2011-01-01
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing down the search space for drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as more effective and accurate methods, is still urgently needed. Results In this study, we first comprehensively collect the features used to discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070
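As a quick consistency check on the metrics quoted above, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is our own and is not taken from the authors' software.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the independent test set.
reported_f1 = f1_score(0.69, 0.68)
print(round(reported_f1, 2))  # 0.68
```

The result agrees with the F1 score of 0.68 stated in the abstract.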
Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions
Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret
2009-01-01
Background Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) to protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new interactions and validate existing ones. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction but not necessarily restricted to domains. PMID:19936254
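The AAC feature described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that "pairs" means adjacent residues, that non-standard residues are skipped, and that the two proteins' vectors are simply concatenated. The names `aac_features` and `pair_features` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(sequence: str, k: int = 1) -> List[float]:
    """Normalized counts of k-mers: k=1 for single residues, k=2 for adjacent pairs."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Assumption: windows containing non-standard residues are ignored.
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def pair_features(seq_a: str, seq_b: str, k: int = 1) -> List[float]:
    """Feature vector for a candidate interacting pair (assumed: concatenation)."""
    return aac_features(seq_a, k) + aac_features(seq_b, k)


vector = pair_features("MKTAYIAKQR", "GSHMLEDPVA")
print(len(vector))  # 40 features: 20 per protein
```

With `k=2` each protein contributes 400 features instead of 20, and the resulting vector could be passed to any standard classifier.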
Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information
Kumar, Ravindra; Jain, Sohni; Kumari, Bandana; Kumar, Manish
2014-01-01
The nucleus is the largest and a highly organized organelle of eukaryotic cells. Within the nucleus exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding the biological functions of the nucleus. Here we describe a method we developed, SubNucPred, for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of unique Pfam domains with an amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation was 85.05% for centromeric proteins, 76.85% for chromosomal proteins, 81.27% for nuclear speckle proteins, 81.79% for nucleolar proteins, 79.37% for nuclear envelope proteins, 77.78% for nuclear matrix proteins, 76.98% for nucleoplasm proteins, 88.89% for nuclear pore complex proteins, 75.40% for PML body proteins and 83.33% for telomeric proteins. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization named SubNucPred has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web-server. PMID:24897370
Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome
NASA Astrophysics Data System (ADS)
Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.
2018-03-01
The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.
Prediction of protein-protein interaction sites using electrostatic desolvation profiles.
Fiorucci, Sébastien; Zacharias, Martin
2010-05-19
Protein-protein complex formation involves removal of water from the interface region. Surface regions with a small free energy penalty for water removal or desolvation may correspond to preferred interaction sites. A method to calculate the electrostatic free energy of placing a neutral low-dielectric probe at various protein surface positions has been designed and applied to characterize putative interaction sites. Based on solutions of the finite-difference Poisson equation, this method also includes long-range electrostatic contributions and the protein solvent boundary shape in contrast to accessible-surface-area-based solvation energies. Calculations on a large set of proteins indicate that in many cases (>90%), the known binding site overlaps with one of the six regions of lowest electrostatic desolvation penalty (overlap with the lowest desolvation region for 48% of proteins). Since the onset of electrostatic desolvation occurs even before direct protein-protein contact formation, it may help guide proteins toward the binding region in the final stage of complex formation. It is interesting that the probe desolvation properties associated with residue types were found to depend to some degree on whether the residue was outside of or part of a binding site. The probe desolvation penalty was on average smaller if the residue was part of a binding site compared to other surface locations. Applications to several antigen-antibody complexes demonstrated that the approach might be useful not only to predict protein interaction sites in general but to map potential antigenic epitopes on protein surfaces. Copyright (c) 2010 Biophysical Society. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Jing; Zhou, Mowei; Gilbert, Joshua D.
Mass spectrometry continues to develop as a valuable tool in the analysis of proteins and protein complexes. In protein complex mass spectrometry studies, surface-induced dissociation (SID) has been successfully applied in quadrupole time-of-flight (Q-TOF) instruments. SID provides structural information on noncovalent protein complexes that is complementary to other techniques. However, the mass resolution of Q-TOF instruments can limit the information that can be obtained for protein complexes by SID. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) provides ultrahigh resolution and ultrahigh mass accuracy measurements. In this study, an SID device was designed and successfully installed in a hybrid FT-ICR instrument in place of the standard gas collision cell. The SID-FT-ICR platform has been tested with several protein complex systems (homooligomers, a heterooligomer, and a protein-ligand complex, ranging from 53 to 85 kDa), and the results are consistent with data previously acquired on Q-TOF platforms, matching predictions from known protein interface information. Lastly, SID fragments with the same m/z but different charge states are well-resolved based on distinct spacing between adjacent isotope peaks, and the addition of metal cations and ligands can also be isotopically resolved with the ultrahigh mass resolution available in FT-ICR.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Jing; Zhou, Mowei; Gilbert, Joshua D.
Mass spectrometry continues to develop as a valuable tool in the analysis of proteins and protein complexes. In protein complex mass spectrometry studies, surface-induced dissociation (SID) has been successfully applied in quadrupole time-of-flight (Q-TOF) instruments. SID provides structural information on non-covalent protein complexes that is complementary to other techniques. However, the mass resolution of Q-TOF instruments can limit the information that can be obtained for protein complexes by SID. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) provides ultrahigh resolution and ultrahigh mass accuracy measurements. In this study, an SID device was designed and successfully installed in a hybrid FT-ICR instrument in place of the standard gas collision cell. The SID-FT-ICR platform has been tested with several protein complex systems (homooligomers, a heterooligomer, and a protein-ligand complex, ranging from 53 kDa to 85 kDa), and the results are consistent with data previously acquired on Q-TOF platforms, matching predictions from known protein interface information. SID fragments with the same m/z but different charge states are well-resolved based on distinct spacing between adjacent isotope peaks, and the addition of metal cations and ligands can also be isotopically resolved with the ultrahigh mass resolution available in FT-ICR.
Yan, Jing; Zhou, Mowei; Gilbert, Joshua D.; ...
2016-12-02
Mass spectrometry continues to develop as a valuable tool in the analysis of proteins and protein complexes. In protein complex mass spectrometry studies, surface-induced dissociation (SID) has been successfully applied in quadrupole time-of-flight (Q-TOF) instruments. SID provides structural information on noncovalent protein complexes that is complementary to other techniques. However, the mass resolution of Q-TOF instruments can limit the information that can be obtained for protein complexes by SID. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) provides ultrahigh resolution and ultrahigh mass accuracy measurements. In this study, an SID device was designed and successfully installed in a hybrid FT-ICR instrument in place of the standard gas collision cell. The SID-FT-ICR platform has been tested with several protein complex systems (homooligomers, a heterooligomer, and a protein-ligand complex, ranging from 53 to 85 kDa), and the results are consistent with data previously acquired on Q-TOF platforms, matching predictions from known protein interface information. Lastly, SID fragments with the same m/z but different charge states are well-resolved based on distinct spacing between adjacent isotope peaks, and the addition of metal cations and ligands can also be isotopically resolved with the ultrahigh mass resolution available in FT-ICR.
Energy Fluctuations Shape Free Energy of Nonspecific Biomolecular Interactions
NASA Astrophysics Data System (ADS)
Elkin, Michael; Andre, Ingemar; Lukatsky, David B.
2012-01-01
Understanding design principles of biomolecular recognition is a key question of molecular biology. Yet the enormous complexity and diversity of biological molecules hamper the efforts to gain a predictive ability for the free energy of protein-protein, protein-DNA, and protein-RNA binding. Here, using a variant of the Derrida model, we predict that for a large class of biomolecular interactions, it is possible to accurately estimate the relative free energy of binding based on the fluctuation properties of their energy spectra, even if a finite number of the energy levels is known. We show that the free energy of the system possessing a wider binding energy spectrum is almost surely lower compared with the system possessing a narrower energy spectrum. Our predictions imply that low-affinity binding scores, usually wasted in protein-protein and protein-DNA docking algorithms, can be efficiently utilized to compute the free energy. Using the results of Rosetta docking simulations of protein-protein interactions from Andre et al. (Proc. Natl. Acad. Sci. USA 105:16148, 2008), we demonstrate the power of our predictions.
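The qualitative claim that a wider binding energy spectrum gives a lower free energy can be illustrated with the textbook canonical free energy F = -kT ln Σ exp(-E_i/kT). The sketch below is not the authors' Derrida-model variant and uses no docking data: the evenly spaced toy spectra and the names `free_energy` and `symmetric_spectrum` are assumptions made purely for illustration.

```python
import math


def free_energy(energies, kT=1.0):
    """F = -kT * ln(sum_i exp(-E_i / kT)), evaluated with a log-sum-exp shift."""
    scaled = [-e / kT for e in energies]
    shift = max(scaled)
    return -kT * (shift + math.log(sum(math.exp(s - shift) for s in scaled)))


def symmetric_spectrum(mean, half_width, n=101):
    """n evenly spaced energy levels centred on `mean` (a toy spectrum)."""
    step = 2 * half_width / (n - 1)
    return [mean - half_width + i * step for i in range(n)]


# Same mean energy and same number of levels; only the width differs.
narrow = symmetric_spectrum(mean=0.0, half_width=1.0)
wide = symmetric_spectrum(mean=0.0, half_width=3.0)

print(free_energy(wide) < free_energy(narrow))  # True
```

Because the Boltzmann sum is dominated by the low-energy tail, widening the spectrum at fixed mean lowers F in this toy setting, which is the direction of the effect described in the abstract.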
Petukh, Marharyta; Li, Minghui; Alexov, Emil
2015-07-01
A new methodology termed Single Amino Acid Mutation based change in Binding free Energy (SAAMBE) was developed to predict the changes in binding free energy caused by mutations. The method utilizes 3D structures of the corresponding protein-protein complexes and takes advantage of both sequence- and structure-based approaches. The method has two components: an MM/PBSA-based component, and an additional set of statistical terms derived from a statistical investigation of the physico-chemical properties of protein complexes. While the approach is a rigid-body approach and does not explicitly consider plausible conformational changes caused by the binding, the effect of conformational changes on electrostatics, including changes away from the binding interface, is mimicked with amino acid specific dielectric constants. This provides a significant improvement in SAAMBE predictions, as indicated by a better match against experimentally determined binding free energy changes for over 1300 mutations in 43 proteins. The final benchmarking resulted in very good agreement with experimental data (correlation coefficient 0.624), while the algorithm is fast enough to allow for large-scale calculations (the average time is less than a minute per mutation).
Munteanu, Cristian R; Pedreira, Nieves; Dorado, Julián; Pazos, Alejandro; Pérez-Montoto, Lázaro G; Ubeira, Florencio M; González-Díaz, Humberto
2014-04-01
Lectins (Ls) play an important role in many diseases such as different types of cancer, parasitic infections and other diseases. Interestingly, the Protein Data Bank (PDB) contains more than 3000 protein 3D structures with unknown function. Thus, we can, in principle, discover new Ls by mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model that is able to discriminate the 3D structures of Ls from those of other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity=96.7 % (for Ls), Specificity=87.6 % (non-active proteins), and Accuracy=92.5 % (for all proteins), considering altogether both the training and external prediction series. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, performing a data mining of PDB. We predicted Ls scores for more than 2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
2014-01-01
Background Molecular Dynamics (MD) simulations of protein complexes suffer from the lack of specific tools in the analysis step. Analyses of MD trajectories of protein complexes indeed generally rely on classical measures, such as the RMSD, RMSF and gyration radius, conceived and developed for single macromolecules. As a matter of fact, instead, researchers engaged in simulating the dynamics of a protein complex are mainly interested in characterizing the conservation/variation of its biological interface. Results On these bases, herein we propose a novel approach to the analysis of MD trajectories or other conformational ensembles of protein complexes, MDcons, which uses the conservation of inter-residue contacts at the interface as a measure of the similarity between different snapshots. A "consensus contact map" is also provided, where the conservation of the different contacts is drawn in a grey scale. Finally, the interface area of the complex is monitored during the simulations. To show its utility, we used this novel approach to study two protein-protein complexes with interfaces of comparable size and both dominated by hydrophilic interactions, but having binding affinities at the extremes of the experimental range. MDcons is demonstrated to be extremely useful to analyse the MD trajectories of the investigated complexes, adding important insight into the dynamic behavior of their biological interface. Conclusions MDcons specifically allows the user to highlight and characterize the dynamics of the interface in protein complexes and can thus be used as a complementary tool for the analysis of MD simulations of both experimental and predicted structures of protein complexes. PMID:25077693
Abdel-Azeim, Safwat; Chermak, Edrisse; Vangone, Anna; Oliva, Romina; Cavallo, Luigi
2014-01-01
Molecular Dynamics (MD) simulations of protein complexes suffer from the lack of specific tools in the analysis step. Analyses of MD trajectories of protein complexes indeed generally rely on classical measures, such as the RMSD, RMSF and gyration radius, conceived and developed for single macromolecules. As a matter of fact, instead, researchers engaged in simulating the dynamics of a protein complex are mainly interested in characterizing the conservation/variation of its biological interface. On these bases, herein we propose a novel approach to the analysis of MD trajectories or other conformational ensembles of protein complexes, MDcons, which uses the conservation of inter-residue contacts at the interface as a measure of the similarity between different snapshots. A "consensus contact map" is also provided, where the conservation of the different contacts is drawn in a grey scale. Finally, the interface area of the complex is monitored during the simulations. To show its utility, we used this novel approach to study two protein-protein complexes with interfaces of comparable size and both dominated by hydrophilic interactions, but having binding affinities at the extremes of the experimental range. MDcons is demonstrated to be extremely useful to analyse the MD trajectories of the investigated complexes, adding important insight into the dynamic behavior of their biological interface. MDcons specifically allows the user to highlight and characterize the dynamics of the interface in protein complexes and can thus be used as a complementary tool for the analysis of MD simulations of both experimental and predicted structures of protein complexes.
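The idea of using contact conservation to compare snapshots can be sketched as follows. This is not the MDcons code: contacts are assumed to have been extracted already as residue pairs, the Jaccard index stands in for whatever similarity measure the tool actually uses, and the names `contact_conservation` and `snapshot_similarity` are hypothetical.

```python
from collections import Counter


def contact_conservation(snapshots):
    """Fraction of snapshots in which each inter-residue contact is present.

    `snapshots` is a list of sets of (residue_in_chain_A, residue_in_chain_B) pairs.
    The returned mapping is the raw data behind a consensus contact map.
    """
    n = len(snapshots)
    if n == 0:
        return {}
    counts = Counter()
    for contacts in snapshots:
        counts.update(set(contacts))
    return {pair: c / n for pair, c in counts.items()}


def snapshot_similarity(contacts_a, contacts_b):
    """Jaccard index of two contact sets (an assumed stand-in similarity measure)."""
    a, b = set(contacts_a), set(contacts_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Three toy snapshots of an interface; residue labels are made up.
frames = [
    {("A10", "B5"), ("A12", "B7")},
    {("A10", "B5")},
    {("A10", "B5"), ("A14", "B9")},
]
consensus = contact_conservation(frames)
print(consensus[("A10", "B5")])                   # 1.0
print(snapshot_similarity(frames[0], frames[1]))  # 0.5
```

Plotting the conservation values on a residue-by-residue grid in grey scale would give a picture analogous to the consensus contact map mentioned in the abstract.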
HotRegion: a database of predicted hot spot clusters.
Cukuroglu, Engin; Gursoy, Attila; Keskin, Ozlem
2012-01-01
Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided. HotRegion is accessible at http://prism.ccbb.ku.edu.tr/hotregion.
Zuo, Zhili; Gandhi, Neha S; Mancera, Ricardo L
2010-12-27
The leucine zipper region of activator protein-1 (AP-1) comprises the c-Jun and c-Fos proteins and constitutes a well-known coiled coil protein-protein interaction motif. We have used molecular dynamics (MD) simulations in conjunction with the molecular mechanics/Poisson-Boltzmann generalized-Born surface area [MM/PB(GB)SA] methods to predict the free energy of interaction of these proteins. In particular, the influence of the choice of solvation model, protein force field, and water potential on the stability and dynamic properties of the c-Fos-c-Jun complex were investigated. Use of the AMBER polarizable force field ff02 in combination with the polarizable POL3 water potential was found to result in increased stability of the c-Fos-c-Jun complex. MM/PB(GB)SA calculations revealed that MD simulations using the POL3 water potential give the lowest predicted free energies of interaction compared to other nonpolarizable water potentials. In addition, the calculated absolute free energy of binding was predicted to be closest to the experimental value using the MM/GBSA method with independent MD simulation trajectories using the POL3 water potential and the polarizable ff02 force field, while all other binding affinities were overestimated.
Computational Analysis of Uncharacterized Proteins of Environmental Bacterial Genome
NASA Astrophysics Data System (ADS)
Coxe, K. J.; Kumar, M.
2017-12-01
Betaproteobacteria strain CB is a gram-negative bacterium in the phylum Proteobacteria and is found naturally in soil and water. In this complex environment, bacteria play a key role in efficiently eliminating organic material and other pollutants from wastewater. To investigate the process of pollutant removal from wastewater using bacteria, it is important to characterize the proteins encoded by the bacterial genome. Our study combines a number of bioinformatics tools to predict the function of unassigned proteins in the bacterial genome. The genome of Betaproteobacteria strain CB encodes 2,112 proteins, of which 508 have unknown function and are termed uncharacterized proteins (UPs). The localization of the UPs within the cell was determined, and the structure of 38 UPs was accurately predicted. These UPs were predicted to belong to various classes of proteins such as enzymes, transporters, binding proteins, signal peptides, transmembrane proteins and other proteins. The outcome of this work will help to better understand the mechanism of wastewater treatment.
Zheng, Yong-Sheng; Lu, Yu-Qing; Meng, Ying-Ying; Zhang, Rong-Zhi; Zhang, Han; Sun, Jia-Mei; Wang, Mu-Mu; Li, Li-Hui; Li, Ru-Yu
2017-05-01
WD-40 repeat-containing protein MSI4 (FVE)/MSI4 plays important roles in determining flowering time in Arabidopsis. However, its function is unexplored in wheat. In the present study, coimmunoprecipitation and nanoscale liquid chromatography coupled to MS/MS were used to identify proteins that interact or associate with FVE in wheat (TaFVE). Altogether 89 differentially expressed proteins showed the same downregulated expression trends as TaFVE in wheat line 5660M. Among them, 62 proteins were further predicted to be involved in the interaction network of TaFVE, and 11 proteins were identified by STRING as potential TaFVE interactors, based on curated databases and on interactions experimentally determined in other species. Both yeast two-hybrid and bimolecular fluorescence complementation assays showed that histone deacetylase 6 and histone deacetylase 15 directly interacted with TaFVE. Multiple chromatin-remodelling proteins and polycomb group proteins were also identified and predicted to interact with TaFVE. These results showed that TaFVE directly interacted with multiple proteins to form multiple complexes that regulate the spike developmental process, e.g. histone deacetylase, chromatin-remodelling and polycomb repressive complex 2 complexes. In addition, multiple flower development regulation factors (e.g. flowering locus K homology domain, flowering time control protein FPA, FY, flowering time control protein FCA, APETALA 1) involved in floral transition were also identified in the present study. Taken together, these results further elucidate the regulatory functions of TaFVE and help reveal the genetic mechanisms underlying wheat spike differentiation. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang
2007-01-01
The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.
Tools used to study how protein complexes are assembled in signaling cascades
Dwane, Susan
2011-01-01
Most proteins do not function on their own but as part of large signaling complexes that are arranged in every living cell in response to specific environmental cues. Proteins interact with each other either constitutively or transiently and do so with different affinity. When identifying the role played by a protein inside a cell, it is essential to define its particular cohort of binding partners so that the researcher can predict what signaling pathways the protein is engaged in. Once identified and confirmed, the information might allow the interaction to be manipulated by pharmacological inhibitors to help fight disease. PMID:22002082
Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.
Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin
2016-06-07
RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
Arabidopsis G-protein interactome reveals connections to cell wall carbohydrates and morphogenesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klopffleisch, Karsten; Phan, Nguyen; Chen, Jay
2011-01-01
The heterotrimeric G-protein complex is minimally composed of Gα, Gβ, and Gγ subunits. In the classic scenario, the G-protein complex is the nexus in signaling from the plasma membrane, where the heterotrimeric G-protein associates with heptahelical G-protein-coupled receptors (GPCRs), to cytoplasmic target proteins called effectors. Although a number of effectors are known in metazoans and fungi, none of these are predicted to exist in their canonical forms in plants. To identify ab initio plant G-protein effectors and scaffold proteins, we screened a set of proteins from the G-protein complex using two-hybrid complementation in yeast. After deep and exhaustive interrogation, we detected 544 interactions between 434 proteins, of which 68 highly interconnected proteins form the core G-protein interactome. Within this core, over half of the interactions comprising two-thirds of the nodes were retested and validated as genuine in planta. Co-expression analysis in combination with phenotyping of loss-of-function mutations in a set of core interactome genes revealed a novel role for G-proteins in regulating cell wall modification.
Arabidopsis G-protein interactome reveals connections to cell wall carbohydrates and morphogenesis.
Klopffleisch, Karsten; Phan, Nguyen; Augustin, Kelsey; Bayne, Robert S; Booker, Katherine S; Botella, Jose R; Carpita, Nicholas C; Carr, Tyrell; Chen, Jin-Gui; Cooke, Thomas Ryan; Frick-Cheng, Arwen; Friedman, Erin J; Fulk, Brandon; Hahn, Michael G; Jiang, Kun; Jorda, Lucia; Kruppe, Lydia; Liu, Chenggang; Lorek, Justine; McCann, Maureen C; Molina, Antonio; Moriyama, Etsuko N; Mukhtar, M Shahid; Mudgil, Yashwanti; Pattathil, Sivakumar; Schwarz, John; Seta, Steven; Tan, Matthew; Temp, Ulrike; Trusov, Yuri; Urano, Daisuke; Welter, Bastian; Yang, Jing; Panstruga, Ralph; Uhrig, Joachim F; Jones, Alan M
2011-09-27
The heterotrimeric G-protein complex is minimally composed of Gα, Gβ, and Gγ subunits. In the classic scenario, the G-protein complex is the nexus in signaling from the plasma membrane, where the heterotrimeric G-protein associates with heptahelical G-protein-coupled receptors (GPCRs), to cytoplasmic target proteins called effectors. Although a number of effectors are known in metazoans and fungi, none of these are predicted to exist in their canonical forms in plants. To identify ab initio plant G-protein effectors and scaffold proteins, we screened a set of proteins from the G-protein complex using two-hybrid complementation in yeast. After deep and exhaustive interrogation, we detected 544 interactions between 434 proteins, of which 68 highly interconnected proteins form the core G-protein interactome. Within this core, over half of the interactions comprising two-thirds of the nodes were retested and validated as genuine in planta. Co-expression analysis in combination with phenotyping of loss-of-function mutations in a set of core interactome genes revealed a novel role for G-proteins in regulating cell wall modification. PMID:21952135
Competitive Binding to Cuprous Ions of Protein and BCA in the Bicinchoninic Acid Protein Assay
Huang, Tao; Long, Mian; Huo, Bo
2010-01-01
Although bicinchoninic acid (BCA) has been widely used to determine protein concentration, the mechanism of interaction between protein, copper ion and BCA in this assay is still not well understood. Using the Micro BCA protein assay kit (Pierce Company), we measured the absorbance at 562 nm of BSA solutions with different concentrations of protein, and also varied the BCA concentration. When the concentration of protein was increased, the absorbance exhibited the known linear and nonlinear increase, and then reached an unexpected plateau followed by a gradual decrease. We introduced a model in which peptide chains competed with BCA for binding to cuprous ions. Formation of the well-known chromogenic complex BCA-Cu1+-BCA competes with the binding of two peptide bonds (NTPB) to the cuprous ion, and two new complexes may exist. A simple equilibrium equation was established to describe the correlations between the substances in solution at equilibrium, and an empirical exponential function was introduced to describe the reduction reaction. Theoretical predictions of absorbance from the model were in good agreement with the measurements, which not only validated the competitive binding model, but also predicted a new complex of BCA-Cu1+-NTPB that might exist in the final solution. This work provides new insight into the chemical basis of the BCA protein assay and might extend the assay to higher protein concentrations. PMID:21625379
Current Understanding of Usher Syndrome Type II
Yang, Jun; Wang, Le; Song, Hongman; Sokolov, Maxim
2012-01-01
Usher syndrome is the most common form of deafness-blindness caused by genetic mutations. To date, three genes have been identified underlying the most prevalent form of Usher syndrome, the type II form (USH2). The proteins encoded by these genes have been demonstrated to form a complex in vivo. This complex is localized mainly at the periciliary membrane complex in photoreceptors and the ankle-link of the stereocilia in hair cells. Many proteins have been found to interact with USH2 proteins in vitro, suggesting that they are potential additional components of this USH2 complex and that the genes encoding these proteins may be candidate USH2 genes. However, further investigations are critical to establish their existence in the USH2 complex in vivo. Based on the predicted functional domains in USH2 proteins, their cellular localizations in photoreceptors and hair cells, the observed phenotypes in USH2 mutant mice, and current knowledge about diseases similar to USH2, putative biological functions of the USH2 complex have been proposed. Finally, therapeutic approaches for this group of diseases are now being actively explored. PMID:22201796
Prediction of binding hot spot residues by using structural and evolutionary parameters
2009-01-01
In this work, we present a method for predicting hot spot residues by using a set of structural and evolutionary parameters. Unlike previous studies, we use a set of parameters which do not depend on the structure of the protein in complex, so that the predictor can also be used when the interface region is unknown. Despite the fact that no information concerning proteins in complex is used for prediction, the application of the method to a compiled dataset described in the literature achieved a performance of 60.4%, as measured by F-Measure, corresponding to a recall of 78.1% and a precision of 49.5%. This result is higher than those reported by previous studies using the same data set. PMID:21637529
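As a quick consistency check on the figures quoted in the abstract above, the F-measure can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the standard balanced F1 definition, which the abstract does not state explicitly. The function name is illustrative only. The result is about 0.606, close to the reported 60.4%. The abstract does not explain the small gap, which could come from rounding or from averaging over subsets.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (49.5% and 78.1%).
f1 = f_measure(precision=0.495, recall=0.781)
print(f"F-measure = {f1:.3f}")
```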
Protein complex prediction for large protein protein interaction networks with the Core&Peel method.
Pellegrini, Marco; Baglioni, Miriam; Geraci, Filippo
2016-11-08
Biological networks play an increasingly important role in the exploration of functional modularity and cellular organization at a systemic level. Quite often the first tools used to analyze these networks are clustering algorithms. We concentrate here on the specific task of predicting protein complexes (PC) in large protein-protein interaction networks (PPIN). Currently, many state-of-the-art algorithms work well for networks of small or moderate size. However, their performance on much larger networks, which are becoming increasingly common in modern proteome-wide studies, needs to be re-assessed. We present a new fast algorithm for clustering large sparse networks: Core&Peel, which runs essentially in time and storage O(a(G)m+n) for a network G of n nodes and m arcs, where a(G) is the arboricity of G (which is roughly proportional to the maximum average degree of any induced subgraph in G). We evaluated Core&Peel on five PPI networks of large size and one of medium size from both yeast and Homo sapiens, comparing its performance against those of ten state-of-the-art methods. We demonstrate that Core&Peel consistently outperforms the ten competitors in its ability to identify known protein complexes and in the functional coherence of its predictions. Our method is remarkably robust, being quite insensitive to the injection of random interactions. Core&Peel is also empirically efficient, attaining the second-best running time over large networks among the tested algorithms. Our algorithm Core&Peel pushes forward the state-of-the-art in PPIN clustering, providing an algorithmic solution with polynomial running time that attains experimentally demonstrable good output quality and speed on challenging large real networks.
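The abstract above does not spell out the steps of Core&Peel, so the following is only background on the generic notion of a graph "core". It is a minimal sketch of textbook k-core decomposition by repeated peeling of a minimum-degree node, not the authors' implementation. The function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} by repeatedly peeling a minimum-degree node."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum degree; adequate for a small sketch.
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Hypothetical toy network: a triangle A-B-C with a pendant node D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
core = core_numbers(toy_edges)
print(core)
```

In this toy graph the triangle nodes end up in the 2-core and the pendant node in the 1-core. A production version would use bucketed degrees to reach linear time rather than the scan used here.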
When fast is better: protein folding fundamentals and mechanisms from ultrafast approaches.
Muñoz, Victor; Cerminara, Michele
2016-09-01
Protein folding research stalled for decades because conventional experiments indicated that proteins fold slowly and in single strokes, whereas theory predicted a complex interplay between dynamics and energetics resulting in myriad microscopic pathways. Ultrafast kinetic methods turned the field upside down by providing the means to probe fundamental aspects of folding, test theoretical predictions and benchmark simulations. Accordingly, experimentalists could measure the timescales for all relevant folding motions, determine the folding speed limit and confirm that folding barriers are entropic bottlenecks. Moreover, a catalogue of proteins that fold extremely fast (microseconds) could be identified. Such fast-folding proteins cross shallow free energy barriers or fold downhill, and thus unfold with minimal co-operativity (gradually). A new generation of thermodynamic methods has exploited this property to map folding landscapes, interaction networks and mechanisms at nearly atomic resolution. In parallel, modern molecular dynamics simulations have finally reached the timescales required to watch fast-folding proteins fold and unfold in silico. All of these findings have buttressed the fundamentals of protein folding predicted by theory, and are now offering the first glimpses at the underlying mechanisms. Fast folding appears to also have functional implications as recent results connect downhill folding with intrinsically disordered proteins, their complex binding modes and ability to moonlight. These connections suggest that the coupling between downhill (un)folding and binding enables such protein domains to operate analogically as conformational rheostats. © 2016 The Author(s).
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.
Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo
2016-01-11
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
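The Q3 and Q8 scores quoted in the abstract above are per-residue accuracies over three and eight secondary structure states. The sketch below shows how both can be computed from one pair of 8-state strings. The 8-to-3 reduction (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, since the abstract does not say which mapping was used. The example strings are made up.

```python
# One common DSSP 8-state to 3-state reduction (assumed, not taken from the abstract).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands
    "T": "C", "S": "C", "L": "C", "-": "C",  # coil and everything else
}


def accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def to_three_state(ss8: str) -> str:
    """Collapse an 8-state string to 3 states using EIGHT_TO_THREE."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


# Hypothetical observed and predicted 8-state strings for a 16-residue fragment.
observed8 = "LHHHHGGTEEEEBSLL"
predicted8 = "LHHHHHHTEEEELSLL"

q8 = accuracy(predicted8, observed8)
q3 = accuracy(to_three_state(predicted8), to_three_state(observed8))
print(f"Q8 = {q8:.4f}, Q3 = {q3:.4f}")
```

Q3 is never lower than Q8 for the same prediction, because errors between states that collapse to the same class (for example G predicted as H) no longer count.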
Surface energetics and protein-protein interactions: analysis and mechanistic implications
Peri, Claudio; Morra, Giulia; Colombo, Giorgio
2016-01-01
Understanding protein-protein interactions (PPI) at the molecular level is a fundamental task in the design of new drugs, the prediction of protein function and the clarification of the mechanisms of (dis)regulation of biochemical pathways. In this study, we use a novel computational approach to investigate the energetics of amino acid networks located on the surface of proteins, isolated and in complex with their respective partners. Interestingly, the analysis of individual proteins identifies patches of surface residues that, when mapped on the structure of their respective complexes, reveal regions of residue-pair couplings that extend across the binding interfaces, forming continuous motifs. An enhanced effect is visible across the proteins of the dataset forming larger quaternary assemblies. The method indicates the presence of energetic signatures in the isolated proteins that are retained in the bound form, which we hypothesize to determine binding orientation upon complex formation. We propose our method, BLUEPRINT, as a complement to different approaches ranging from the ab-initio characterization of PPIs, to protein-protein docking algorithms, for the physico-chemical and functional investigation of protein-protein interactions. PMID:27050828
Protein 8-class secondary structure prediction using conditional neural fields.
Wang, Zhiyong; Zhao, Feng; Peng, Jian; Xu, Jinbo
2011-10-01
Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Cantwell, Caoimhe A; Byrne, Laurann A; Connolly, Cathal D; Hynes, Michael J; McArdle, Patrick; Murphy, Richard A
2017-08-01
The aim of the present work was to establish a reliable analytical method to determine the degree of complexation in commercial metal proteinates used as feed additives in the solid state. Two complementary techniques were developed. Firstly, a quantitative attenuated total reflectance Fourier transform infrared (ATR-FTIR) spectroscopic method investigated modifications in vibrational absorption bands of the ligand on complex formation. Secondly, a powder X-ray diffraction (PXRD) method to quantify the amount of crystalline material in the proteinate product was developed. These methods were developed in tandem and cross-validated with each other. Multivariate analysis (MVA) was used to develop validated calibration and prediction models. The FTIR and PXRD calibrations showed excellent linearity (R² > 0.99). The diagnostic model parameters showed that the FTIR and PXRD methods were robust with a root mean square error of calibration RMSEC ≤3.39% and a root mean square error of prediction RMSEP ≤7.17% respectively. Comparative statistics show excellent agreement between the MVA packages assessed and between the FTIR and PXRD methods. The methods can be used to determine the degree of complexation in complexes of both protein hydrolysates and pure amino acids.
Wu, Min; Kwoh, Chee-Keong; Li, Xiaoli; Zheng, Jie
2014-09-11
The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called "recombination hotspots". Recently, a zinc finger protein PRDM9 was reported to regulate recombination hotspots in human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 is found to be enriched in human hotspots. However, this 13-mer motif only covers a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Therefore, the challenge of discovering other regulators of recombination hotspots becomes significant. Furthermore, recombination is a complex process. Hence, multiple proteins acting as machinery, rather than individual proteins, are more likely to carry out this process in a precise and stable manner. Therefore, the extension of the prediction of individual trans-regulators to protein complexes is also highly desired. In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots based on their preference of binding to hotspots and coldspots. Second, using the above identified genes as seeds, we apply the Random Walk with Restart algorithm (RWR) to propagate their influences to other proteins in protein-protein interaction (PPI) networks. Hence, many proteins without DNA-binding information will also be assigned a score to implicate their roles in recombination hotspots. Third, we construct sub-PPI networks induced by top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks. 
The GO term analysis shows that our prioritizing methods and the RWR algorithm are capable of identifying novel genes associated with recombination hotspots. The trans-regulators predicted by our pipeline are enriched with epigenetic functions (e.g., histone modifications), demonstrating the epigenetic regulatory mechanisms of recombination hotspots. The identified protein complexes also provide us with candidates to further investigate the molecular machineries for recombination hotspots. Moreover, the experimental data and results are available on our web site http://www.ntu.edu.sg/home/zhengjie/data/RecombinationHotspot/NetPipe/.
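The second step of the pipeline described above propagates seed scores through a PPI network with Random Walk with Restart (RWR). The sketch below is a generic RWR iteration on an unweighted, undirected toy graph, written under stated assumptions. The restart probability of 0.3, the uniform seed weights, the function name and the four-protein path network are all illustrative and are not taken from the paper.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.

    adjacency maps each node to the set of its neighbors (undirected, no isolated
    nodes); W spreads a node's probability evenly over its neighbors.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                continue  # an isolated node would leak probability mass
            share = (1.0 - restart) * p[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy PPI network: a path A - B - C - D with A as the only seed.
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(ppi, seeds={"A"})
for protein in sorted(scores, key=scores.get, reverse=True):
    print(protein, round(scores[protein], 4))
```

Proteins closer to the seed receive higher steady-state scores, which is how nodes without direct DNA-binding evidence can still be ranked.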
Towards Inferring Protein Interactions: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Zhang, Ya; Zha, Hongyuan; Chu, Chao-Hsien; Ji, Xiang
2006-12-01
Discovering interacting proteins has been an essential part of functional genomics. However, existing experimental techniques only uncover a small portion of any interactome. Furthermore, these data often have a very high false rate. By conceptualizing the interactions at domain level, we provide a more abstract representation of the interactome, which also facilitates the discovery of unobserved protein-protein interactions. Although several domain-based approaches have been proposed to predict protein-protein interactions, they usually assume that domain interactions are independent of each other for the convenience of computational modeling. A new framework to predict protein interactions is proposed in this paper, where no assumption is made about domain interactions. Protein interactions may be the result of multiple domain interactions which are dependent on each other. A conjunctive normal form representation is used to capture the relationships between protein interactions and domain interactions. The problem of interaction inference is then modeled as a constraint satisfiability problem and solved via linear programming. Experimental results on a combined yeast data set have demonstrated the robustness and the accuracy of the proposed algorithm. Moreover, we also map some predicted interacting domains to three-dimensional structures of protein complexes to show the validity of our predictions.
Gaines, J C; Acebes, S; Virrueta, A; Butler, M; Regan, L; O'Hern, C S
2018-05-01
We compare side chain prediction and packing of core and non-core regions of soluble proteins, protein-protein interfaces, and transmembrane proteins. We first identified or created comparable databases of high-resolution crystal structures of these 3 protein classes. We show that the solvent-inaccessible cores of the 3 classes of proteins are equally densely packed. As a result, the side chains of core residues at protein-protein interfaces and in the membrane-exposed regions of transmembrane proteins can be predicted by the hard-sphere plus stereochemical constraint model with the same high prediction accuracies (>90%) as core residues in soluble proteins. We also find that for all 3 classes of proteins, as one moves away from the solvent-inaccessible core, the packing fraction decreases as the solvent accessibility increases. However, the side chain predictability remains high (80% within 30°) up to a relative solvent accessibility, rSASA≲0.3, for all 3 protein classes. Our results show that ≈40% of the interface regions in protein complexes are "core", that is, densely packed with side chain conformations that can be accurately predicted using the hard-sphere model. We propose packing fraction as a metric that can be used to distinguish real protein-protein interactions from designed, non-binding, decoys. Our results also show that cores of membrane proteins are the same as cores of soluble proteins. Thus, the computational methods we are developing for the analysis of the effect of hydrophobic core mutations in soluble proteins will be equally applicable to analyses of mutations in membrane proteins. © 2018 Wiley Periodicals, Inc.
Subunit mass fingerprinting of mitochondrial complex I.
Morgner, Nina; Zickermann, Volker; Kerscher, Stefan; Wittig, Ilka; Abdrakhmanova, Albina; Barth, Hans-Dieter; Brutschy, Bernhard; Brandt, Ulrich
2008-10-01
We have employed laser induced liquid bead ion desorption (LILBID) mass spectrometry to determine the total mass and to study the subunit composition of respiratory chain complex I from Yarrowia lipolytica. Using 5-10 pmol of purified complex I, we could assign all 40 known subunits of this membrane bound multiprotein complex to peaks in LILBID subunit fingerprint spectra by comparing predicted protein masses to observed ion masses. Notably, even the highly hydrophobic subunits encoded by the mitochondrial genome were easily detectable. Moreover, the LILBID approach allowed us to spot and correct several errors in the genome-derived protein sequences of complex I subunits. Typically, the masses of the individual subunits as determined by LILBID mass spectrometry were within 100 Da of the predicted values. For the first time, we demonstrate that LILBID spectrometry can be successfully applied to a complex I band eluted from a blue-native polyacrylamide gel, making small amounts of large multiprotein complexes accessible for subunit mass fingerprint analysis even if they are membrane bound. Thus, the LILBID subunit mass fingerprint method will be of great value for efficient proteomic analysis of complex I and its assembly intermediates, as well as of other water soluble and membrane bound multiprotein complexes.
Prediction of TF target sites based on atomistic models of protein-DNA complexes
Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno
2008-01-01
Background. The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results. Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion. Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Cardone, Antonio; Pant, Harish; Hassan, Sergio A.
2013-01-01
Weak and ultra-weak protein-protein associations play a role in molecular recognition, and can drive spontaneous self-assembly and aggregation. Such interactions are difficult to detect experimentally, and are a challenge to the force field and sampling technique. A method is proposed to identify low-population protein-protein binding modes in aqueous solution. The method is designed to identify preferential first-encounter complexes from which the final complex(es) at equilibrium evolve. A continuum model is used to represent the effects of the solvent, which accounts for short- and long-range effects of water exclusion and for liquid-structure forces at protein/liquid interfaces. These effects control the behavior of proteins in close proximity and are optimized based on binding enthalpy data and simulations. An algorithm is described to construct a biasing function for self-adaptive configurational-bias Monte Carlo of a set of interacting proteins. The function allows mixing large and local changes in the spatial distribution of proteins, thereby enhancing sampling of relevant microstates. The method is applied to three binary systems. Generalization to multiprotein complexes is discussed. PMID:24044772
Ma, Jianzhu; Wang, Sheng
2015-01-01
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possibilities of protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for the prediction. Method. We present a method, AcconPred, for predicting solvent accessibility and contact number simultaneously, which is based on a shared weight multitask learning framework under the CNF (conditional neural fields) model. The multitask learning framework on a collection of related tasks provides more accurate prediction than the framework trained only on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels. Results. Trained on 5729 monomeric soluble globular protein datasets, AcconPred could reach 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on the 105 CASP11 domain datasets for solvent accessibility, AcconPred could reach 0.64 accuracy, which outperforms existing methods. PMID:26339631
Prediction of the translocon-mediated membrane insertion free energies of protein sequences.
Park, Yungki; Helms, Volkhard
2008-05-15
Helical membrane proteins (HMPs) play crucial roles in a variety of cellular processes. Unlike water-soluble proteins, HMPs need not only to fold but also get inserted into the membrane to be fully functional. This process of membrane insertion is mediated by the translocon complex. Thus, it is of great interest to develop computational methods for predicting the translocon-mediated membrane insertion free energies of protein sequences. We have developed Membrane Insertion (MINS), a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark test gives a correlation coefficient of 0.74 between predicted and observed free energies for 357 known cases, which corresponds to a mean unsigned error of 0.41 kcal/mol. These results are significantly better than those obtained by traditional hydropathy analysis. Moreover, the ability of MINS to reasonably predict membrane insertion free energies of protein sequences allows for effective identification of transmembrane (TM) segments. Subsequently, MINS was applied to predict the membrane insertion free energies of 316 TM segments found in known structures. An in-depth analysis of the predicted free energies reveals a number of interesting findings about the biogenesis and structural stability of HMPs. A web server for MINS is available at http://service.bioinformatik.uni-saarland.de/mins
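The two benchmark figures quoted for MINS, a correlation coefficient and a mean unsigned error, can be computed as in the minimal sketch below. It assumes the correlation is the Pearson coefficient, which the abstract does not state; the function names and the toy values are illustrative and are not taken from the paper.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def mean_unsigned_error(predicted, observed):
    """Average absolute difference, in the units of the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Toy free energies in kcal/mol (made up for illustration only).
predicted = [1.0, 2.0, 3.0]
observed = [1.5, 1.0, 3.0]
print(pearson_r(predicted, observed), mean_unsigned_error(predicted, observed))
```

The two numbers answer different questions: the correlation measures how well the ranking and trend are reproduced, while the mean unsigned error measures the typical size of the deviation in energy units.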
Efficient Relaxation of Protein-Protein Interfaces by Discrete Molecular Dynamics Simulations.
Emperador, Agusti; Solernou, Albert; Sfriso, Pedro; Pons, Carles; Gelpi, Josep Lluis; Fernandez-Recio, Juan; Orozco, Modesto
2013-02-12
Protein-protein interactions are responsible for the transfer of information inside the cell and represent one of the most interesting research fields in structural biology. Unfortunately, after decades of intense research, experimental approaches still have difficulties in providing 3D structures for the hundreds of thousands of interactions formed between the different proteins in a living organism. The use of theoretical approaches like docking aims to complement experimental efforts to represent the structure of the protein interactome. However, we cannot ignore that current methods have limitations due to problems of sampling of the protein-protein conformational space and the lack of accuracy of available force fields. Cases that are especially difficult for prediction are those in which complex formation implies a non-negligible change in the conformation of the interacting proteins, i.e., those cases where protein flexibility plays a key role in protein-protein docking. In this work, we present a new approach to treat flexibility in docking by global structural relaxation based on ultrafast discrete molecular dynamics. On a standard benchmark of protein complexes, the method provides a general improvement over the results obtained by rigid docking. The method is especially efficient in cases with large conformational changes upon binding, in which structure relaxation with discrete molecular dynamics leads to a predictive success rate double that obtained with state-of-the-art rigid-body docking.
Brender, Jeffrey R.; Zhang, Yang
2015-01-01
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the differences between the interface profile score of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through the random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533
Hierarchical Ensemble Methods for Protein Function Prediction
2014-01-01
Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954
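The idea of assembling per-term predictions while respecting the class hierarchy can be illustrated with one simple top-down consistency step: a term's score is capped by the scores of its ancestors, so a protein is never predicted for a specific term more strongly than for the general terms above it. This is a sketch under stated assumptions, not a specific method from the review; the term names, the scores and the `enforce_hierarchy` helper are hypothetical.

```python
def enforce_hierarchy(scores, parents):
    """Return scores where no term exceeds any of its ancestors.

    scores  -- dict mapping term -> score from that term's own classifier
    parents -- dict mapping term -> list of parent terms (assumed acyclic)
    """
    corrected = {}

    def resolve(term):
        # Memoised top-down pass: a term's corrected score is the minimum of
        # its raw score and the corrected scores of all its parents.
        if term in corrected:
            return corrected[term]
        value = scores[term]
        for parent in parents.get(term, ()):
            value = min(value, resolve(parent))
        corrected[term] = value
        return value

    for term in scores:
        resolve(term)
    return corrected


# Hypothetical three-term chain: root -> binding -> ATP binding.
raw = {"root": 0.9, "binding": 0.4, "ATP binding": 0.7}
hierarchy = {"binding": ["root"], "ATP binding": ["binding"]}
print(enforce_hierarchy(raw, hierarchy))
```

In the example the "ATP binding" score is lowered to 0.4 because its parent "binding" only scored 0.4. A flat method would have reported the inconsistent pair unchanged.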
Rahaman, Obaidur; Estrada, Trilce P.; Doren, Douglas J.; Taufer, Michela; Brooks, Charles L.; Armen, Roger S.
2011-01-01
The performance of several two-step scoring approaches for molecular docking were assessed for their ability to predict binding geometries and free energies. Two new scoring functions designed for “step 2 discrimination” were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function S1 was proposed by considering only “interacting” ligand atoms as the “effective size” of the ligand, and extended to an empirical regression-based pair potential S2. The S1 and S2 scoring schemes were trained and five-fold cross validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new dataset (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating the discriminative power (DP) of the scoring functions to identify native poses. The parameters for the LIE scoring function with the optimal discriminative power (DP) for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ dataset where the number of ligand heavy atoms ranged from 17 to 35. 
This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts. PMID:21644546
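The ligand efficiency mentioned above normalizes binding free energy by ligand size. The sketch below assumes the commonly used definition, the negative binding free energy divided by the number of heavy atoms; the abstract does not spell out the exact formula the authors used, and the function name and example values are illustrative.

```python
def ligand_efficiency(delta_g_kcal_per_mol, n_heavy_atoms):
    """Ligand efficiency in kcal/mol per heavy atom (assumed: -dG / N_heavy)."""
    if n_heavy_atoms <= 0:
        raise ValueError("number of heavy atoms must be positive")
    return -delta_g_kcal_per_mol / n_heavy_atoms


# Two hypothetical ligands with the same predicted binding free energy:
# the smaller one uses its atoms more efficiently.
print(ligand_efficiency(-8.5, 17))  # ligand at the lower end of the 17-35 range
print(ligand_efficiency(-8.5, 35))  # ligand at the upper end of the range
```

Dividing by the heavy-atom count is what separates scoring performance from molecular weight effects: a scoring function that merely rewards larger ligands gains nothing on this measure.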
Rahaman, Obaidur; Estrada, Trilce P; Doren, Douglas J; Taufer, Michela; Brooks, Charles L; Armen, Roger S
2011-09-26
The performances of several two-step scoring approaches for molecular docking were assessed for their ability to predict binding geometries and free energies. Two new scoring functions designed for "step 2 discrimination" were proposed and compared to our CHARMM implementation of the linear interaction energy (LIE) approach using the Generalized-Born with Molecular Volume (GBMV) implicit solvation model. A scoring function S1 was proposed by considering only "interacting" ligand atoms as the "effective size" of the ligand and extended to an empirical regression-based pair potential S2. The S1 and S2 scoring schemes were trained and 5-fold cross-validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB). The regression-based parameters for S1 and S2 also demonstrated reasonable transferability in the CSARdock 2010 benchmark using a new data set (NRC HiQ) of diverse protein-ligand complexes. The ability of the scoring functions to accurately predict ligand geometry was evaluated by calculating the discriminative power (DP) of the scoring functions to identify native poses. The parameters for the LIE scoring function with the optimal discriminative power (DP) for geometry (step 1 discrimination) were found to be very similar to the best-fit parameters for binding free energy over a large number of protein-ligand complexes (step 2 discrimination). Reasonable performance of the scoring functions in enrichment of active compounds in four different protein target classes established that the parameters for S1 and S2 provided reasonable accuracy and transferability. Additional analysis was performed to definitively separate scoring function performance from molecular weight effects. This analysis included the prediction of ligand binding efficiencies for a subset of the CSARdock NRC HiQ data set where the number of ligand heavy atoms ranged from 17 to 35. 
This range of ligand heavy atoms is where improved accuracy of predicted ligand efficiencies is most relevant to real-world drug design efforts.
Serricchio, Mauro; Vissa, Adriano; Kim, Peter K; Yip, Christopher M; McQuibban, G Angus
2018-04-01
The mitochondrial glycerophospholipid cardiolipin plays important roles in mitochondrial biology. Most notably, cardiolipin directly binds to mitochondrial proteins and helps assemble and stabilize mitochondrial multi-protein complexes. Despite their importance for mitochondrial health, how the proteins involved in cardiolipin biosynthesis are organized and embedded in mitochondrial membranes has not been investigated in detail. Here we show that human PGS1 and CLS1 are constituents of large protein complexes. We show that PGS1 forms oligomers and associates with CLS1 and PTPMT1. Using super-resolution microscopy, we observed well-organized nanoscale structures formed by PGS1. Together with the observation that cardiolipin and CLS1 are not required for PGS1 to assemble in the complex we predict the presence of a PGS1-centered cardiolipin-synthesizing scaffold within the mitochondrial inner membrane. Using an unbiased proteomic approach we found that PGS1 and CLS1 interact with multiple cardiolipin-binding mitochondrial membrane proteins, including prohibitins, stomatin-like protein 2 and the MICOS components MIC60 and MIC19. We further mapped the protein-protein interaction sites between PGS1 and itself, CLS1, MIC60 and PHB. Overall, this study provides evidence for the presence of a cardiolipin synthesis structure that transiently interacts with cardiolipin-dependent protein complexes. Copyright © 2018 Elsevier B.V. All rights reserved.
Computational prediction of protein-protein interactions in Leishmania predicted proteomes.
Rezende, Antonio M; Folador, Edson L; Resende, Daniela de M; Ruiz, Jeronimo C
2012-01-01
The trypanosomatid parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite years of study and genome availability, an effective vaccine has not yet been developed, and the chemotherapy is highly toxic. Therefore, it is clear that only interdisciplinary, integrated studies will succeed in finding new targets for the development of vaccines and drugs. An essential part of this rationale is related to the study of protein-protein interaction (PPI) networks, which can provide a better understanding of complex protein interactions in biological systems. Thus, we modeled PPIs for trypanosomatids through computational methods, using sequence comparison against public databases of protein or domain interactions for interaction prediction (Interolog Mapping), and developed a dedicated combined system score to address the robustness of the predictions. The confidence of the network prediction approach was evaluated using gold standard positive and negative datasets, and the AUC value obtained was 0.94. As a result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum, respectively. For each predicted network the top 20 proteins were ranked by MCC topological index. In addition, information related to immunological potential, degree of protein sequence conservation among orthologs and degree of identity compared to proteins of potential parasite hosts was integrated. This information integration provides a better understanding and usefulness of the predicted networks that can be valuable to select new potential biological targets for drug and vaccine development. Network modularity analysis, which is key when one is interested in destabilizing the PPIs for drug or vaccine purposes, was performed along with multiple alignments of the predicted PPIs, revealing patterns associated with protein turnover. 
In addition, around 50% of the hypothetical proteins present in the networks received some degree of functional annotation, which represents an important contribution since approximately 60% of the Leishmania predicted proteomes have no predicted function.
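Interolog mapping, as used above, transfers a known interaction between two template proteins to their sequence-similar counterparts in the target organism. A minimal sketch of that transfer step follows. It assumes the sequence comparison has already produced a template-to-target ortholog table; the identifiers and the `map_interologs` helper are hypothetical, and the authors' combined confidence score is not reproduced.

```python
from itertools import product


def map_interologs(template_interactions, orthologs):
    """Predict target interactions from template interactions.

    template_interactions -- iterable of (protein_a, protein_b) template pairs
    orthologs             -- dict mapping template protein -> list of target proteins
    Returns a set of unordered target pairs, stored as sorted tuples.
    """
    predicted = set()
    for a, b in template_interactions:
        # Every target ortholog of a is paired with every target ortholog of b;
        # a template protein without orthologs contributes nothing.
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(tuple(sorted((x, y))))
    return predicted


# Hypothetical template database entries and ortholog assignments.
templates = [("P1", "P2"), ("P2", "P3")]
ortholog_table = {"P1": ["LmA"], "P2": ["LmB", "LmC"]}
print(sorted(map_interologs(templates, ortholog_table)))
```

Because P3 has no ortholog in the target proteome, the second template interaction yields no prediction. In practice each transferred pair would also carry a score reflecting the strength of the sequence matches.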
Pan, Yuliang; Wang, Zixiang; Zhan, Weihua; Deng, Lei
2018-05-01
Identifying RNA-binding residues, especially energetically favored hot spots, can provide valuable clues for understanding the mechanisms and functional importance of protein-RNA interactions. Yet, the limited availability of experimentally recognized energy hot spots in protein-RNA crystal structures leads to difficulties in developing empirical identification approaches. Computational prediction of RNA-binding hot spot residues is still in its infancy. Here, we describe a computational method, PrabHot (Prediction of protein-RNA binding hot spots), that can effectively detect hot spot residues on protein-RNA binding interfaces using an ensemble of conceptually different machine learning classifiers. Residue interaction network features and new solvent exposure characteristics are combined and selected for classification with the Boruta algorithm. In particular, two new reference datasets (benchmark and independent) have been generated containing 107 hot spots from 47 known protein-RNA complex structures. In 10-fold cross-validation on the training dataset, PrabHot achieves promising performance with an AUC score of 0.86 and a sensitivity of 0.78, which are significantly better than those of the pioneering RNA-binding hot spot prediction method HotSPRing. We also demonstrate the capability of our proposed method on the independent test dataset and gain a competitive advantage as a result. The PrabHot webserver is freely available at http://denglab.org/PrabHot/. Contact: leideng@csu.edu.cn. Supplementary data are available at Bioinformatics online.
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provides new and relevant knowledge about the conformation of amino acids in proteins. Only a few probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimentally determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins and molecular folding, and can help solve complex problems such as protein structure prediction and protein design, among others. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias.
New strategy for protein interactions and application to structure-based drug design
NASA Astrophysics Data System (ADS)
Zou, Xiaoqin
One of the greatest challenges in computational biophysics is to predict interactions between biological molecules, which play critical roles in biological processes and rational design of therapeutic drugs. Biomolecular interactions involve delicate interplay between multiple interactions, including electrostatic interactions, van der Waals interactions, solvent effect, and conformational entropic effect. Accurate determination of these complex and subtle interactions is challenging. Moreover, a biological molecule such as a protein usually consists of thousands of atoms, and thus occupies a huge conformational space. The large degrees of freedom pose further challenges for accurate prediction of biomolecular interactions. Here, I will present our development of physics-based theory and computational modeling on protein interactions with other molecules. The major strategy is to extract microscopic energetics from the information embedded in the experimentally-determined structures of protein complexes. I will also present applications of the methods to structure-based therapeutic design. Supported by NSF CAREER Award DBI-0953839, NIH R01GM109980, and the American Heart Association (Midwest Affiliate) [13GRNT16990076].
Le Meur, Nolwenn; Gentleman, Robert
2008-01-01
Background: Synthetic lethality defines a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development as synthetic lethal pairs could be used to selectively kill cancer cells, but leave normal cells relatively unharmed. A challenge is to assess genome-wide experimental data and integrate the results to better understand the underlying biological processes. We propose statistical and computational tools that can be used to find relationships between synthetic lethality and cellular organizational units. Results: In Saccharomyces cerevisiae, we identified multi-protein complexes and pairs of multi-protein complexes that share an unusually high number of synthetic genetic interactions. As previously predicted, we found that synthetic lethality can arise from subunits of an essential multi-protein complex or between pairs of multi-protein complexes. Finally, using multi-protein complexes allowed us to take into account the pleiotropic nature of the gene products. Conclusions: Modeling synthetic lethality using current estimates of the yeast interactome is an efficient approach to disentangle some of the complex molecular interactions that drive a cell. Our model in conjunction with applied statistical methods and computational methods provides new tools to better characterize synthetic genetic interactions. PMID:18789146
Electrostatic design of protein-protein association rates.
Schreiber, Gideon; Shaul, Yossi; Gottschalk, Kay E
2006-01-01
De novo design and redesign of proteins and protein complexes have made promising progress in recent years. Here, we give an overview of how to use available computer-based tools to design proteins to bind faster and tighter to their protein-complex partner by electrostatic optimization between the two proteins. Electrostatic optimization is possible because of the simple relation between the Debye-Huckel energy of interaction between a pair of proteins and their rate of association. This can be used for rapid, structure-based calculations of the electrostatic attraction between the two proteins in the complex. Using these principles, we developed two computer programs that predict the change in k(on), and as such the affinity, on introducing charged mutations. The two programs have a web interface that is available at
Ripoche, Hugues; Laine, Elodie; Ceres, Nicoletta; Carbone, Alessandra
2017-01-04
The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET 2 at large-scale on more than 20 000 chains. JET 2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET 2 analyses. Predictions were evaluated on more than 15 000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET 2 on all interfaces are: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interactions modulation and interaction surface redesign. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
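The four figures reported for JET 2 can be related to the counts of a binary confusion matrix, with each residue classed as interface or non-interface. The sketch below assumes the abbreviations carry their usual meanings (Sen = sensitivity, PPV = positive predictive value, Spe = specificity, Acc = accuracy) and that the values are percentages; the counts in the example are made up and are not the JET 2 evaluation data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, PPV, specificity and accuracy as percentages.

    tp / fn -- interface residues predicted as interface / missed
    tn / fp -- non-interface residues predicted as non-interface / as interface
    """
    total = tp + fp + tn + fn
    return {
        "Sen": 100.0 * tp / (tp + fn),   # fraction of true interface residues found
        "PPV": 100.0 * tp / (tp + fp),   # fraction of predictions that are correct
        "Spe": 100.0 * tn / (tn + fp),   # fraction of non-interface residues rejected
        "Acc": 100.0 * (tp + tn) / total,
    }


# Made-up counts for a single protein surface.
print(binary_metrics(tp=50, fp=50, tn=200, fn=50))
```

Because non-interface residues usually outnumber interface residues, accuracy tends to track specificity. This is why the sensitivity and PPV are reported alongside it.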
Chen, Fu; Sun, Huiyong; Wang, Junmei; Zhu, Feng; Liu, Hui; Wang, Zhe; Lei, Tailong; Li, Youyong; Hou, Tingjun
2018-06-21
Molecular docking provides a computationally efficient way to predict the atomic structural details of protein-RNA interactions (PRI), but accurate prediction of the three-dimensional structures and binding affinities for PRI is still notoriously difficult, partly due to the unreliability of the existing scoring functions for PRI. MM/PBSA and MM/GBSA are more theoretically rigorous than most scoring functions for protein-RNA docking, but their prediction performance for protein-RNA systems remains unclear. Here, we systemically evaluated the capability of MM/PBSA and MM/GBSA to predict the binding affinities and recognize the near-native binding structures for protein-RNA systems with different solvent models and interior dielectric constants (ε_in). For predicting the binding affinities, the predictions given by MM/GBSA based on the minimized structures in explicit solvent and the GBGBn1 model with ε_in = 2 yielded the highest correlation with the experimental data. Moreover, the MM/GBSA calculations based on the minimized structures in implicit solvent and the GBGBn1 model distinguished the near-native binding structures within the top 10 decoys for 118 out of the 149 protein-RNA systems (79.2%). This performance is better than all docking scoring functions studied here. Therefore, the MM/GBSA rescoring is an efficient way to improve the prediction capability of scoring functions for protein-RNA systems. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
PCoM-DB Update: A Protein Co-Migration Database for Photosynthetic Organisms.
Takabayashi, Atsushi; Takabayashi, Saeka; Takahashi, Kaori; Watanabe, Mai; Uchida, Hiroko; Murakami, Akio; Fujita, Tomomichi; Ikeuchi, Masahiko; Tanaka, Ayumi
2017-01-01
The identification of protein complexes is important for the understanding of protein structure and function and the regulation of cellular processes. We used blue-native PAGE and tandem mass spectrometry to identify protein complexes systematically, and built a web database, the protein co-migration database (PCoM-DB, http://pcomdb.lowtem.hokudai.ac.jp/proteins/top), to provide prediction tools for protein complexes. PCoM-DB provides migration profiles for any given protein of interest, and allows users to compare them with migration profiles of other proteins, showing the oligomeric states of proteins and thus identifying potential interaction partners. The initial version of PCoM-DB (launched in January 2013) included protein complex data for Synechocystis whole cells and Arabidopsis thaliana thylakoid membranes. Here we report PCoM-DB version 2.0, which includes new data sets and analytical tools. Additional data are included from whole cells of the pelagic marine picocyanobacterium Prochlorococcus marinus, the thermophilic cyanobacterium Thermosynechococcus elongatus, the unicellular green alga Chlamydomonas reinhardtii and the bryophyte Physcomitrella patens. The Arabidopsis protein data now include data for intact mitochondria, intact chloroplasts, chloroplast stroma and chloroplast envelopes. The new tools comprise a multiple-protein search form and a heat map viewer for protein migration profiles. Users can compare migration profiles of a protein of interest among different organelles or compare migration profiles among different proteins within the same sample. For Arabidopsis proteins, users can compare migration profiles of a protein of interest with putative homologous proteins from non-Arabidopsis organisms. The updated PCoM-DB will help researchers find novel protein complexes and estimate their evolutionary changes in the green lineage. © The Author 2017. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. 
Granular support vector machines with association rules mining for protein homology prediction.
Tang, Yuchun; Jin, Bo; Zhang, Yan-Qing
2005-01-01
Protein homology prediction between protein sequences is one of critical problems in computational biology. Such a complex classification problem is common in medical or biological information processing applications. How to build a model with superior generalization capability from training samples is an essential issue for mining knowledge to accurately predict/classify unseen new samples and to effectively support human experts to make correct decisions. A new learning model called granular support vector machines (GSVM) is proposed based on our previous work. GSVM systematically and formally combines the principles from statistical learning theory and granular computing theory and thus provides an interesting new mechanism to address complex classification problems. It works by building a sequence of information granules and then building support vector machines (SVM) in some of these information granules on demand. A good granulation method to find suitable granules is crucial for modeling a GSVM with good performance. In this paper, we also propose an association rules-based granulation method. For the granules induced by association rules with high enough confidence and significant support, we leave them as they are because of their high "purity" and significant effect on simplifying the classification task. For every other granule, a SVM is modeled to discriminate the corresponding data. In this way, a complex classification problem is divided into multiple smaller problems so that the learning task is simplified. The proposed algorithm, here named GSVM-AR, is compared with SVM by KDDCUP04 protein homology prediction data. The experimental results show that finding the splitting hyperplane is not a trivial task (we should be careful to select the association rules to avoid overfitting) and GSVM-AR does show significant improvement compared to building one single SVM in the whole feature space. 
Another advantage is that GSVM-AR has good utility because it is easy to implement. More importantly and more interestingly, GSVM provides a new mechanism to address complex classification problems.
Year 2 Report: Protein Function Prediction Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C E
2012-04-27
Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.
Song, Mi-Ryoung; Sun, Yunfu; Bryson, Ami; Gill, Gordon N.; Evans, Sylvia M.; Pfaff, Samuel L.
2009-01-01
Summary: LIM transcription factors bind to nuclear LIM interactor (Ldb/NLI/Clim) in specific ratios to form higher-order complexes that regulate gene expression. Here we examined how the dosage of LIM homeodomain proteins Isl1 and Isl2 and LIM-only protein Lmo4 influences the assembly and function of complexes involved in the generation of spinal motor neurons (MNs) and V2a interneurons (INs). Reducing the levels of Islet proteins using a graded series of mutations favored V2a IN differentiation at the expense of MN formation. Although LIM-only proteins (LMOs) are predicted to antagonize the function of Islet proteins, we found that the presence or absence of Lmo4 had little influence on MN or V2a IN specification. We did find, however, that the loss of MNs resulting from reduced Islet levels was rescued by eliminating Lmo4, unmasking a functional interaction between these proteins. Our findings demonstrate that MN and V2a IN fates are specified by distinct complexes that are sensitive to the relative stoichiometries of the constituent factors and we present a model to explain how LIM domain proteins modulate these complexes and, thereby, this binary-cell-fate decision. PMID:19666821
Compressed learning and its applications to subcellular localization.
Zheng, Zhong-Long; Guo, Li; Jia, Jiong; Xie, Chen-Mao; Zeng, Wen-Cai; Yang, Jie
2011-09-01
One of the main challenges in biological applications is to predict protein subcellular localization automatically and accurately. To this end, a wide variety of machine learning methods have been proposed in recent years. Most of them focus on finding the optimal classification scheme, and few of them take into account simplifying the complexity of biological systems. Traditionally, such bio-data are analyzed by first performing feature selection before classification. Motivated by compressed sensing (CS) theory, we propose a methodology that performs compressed learning with a sparseness criterion such that feature selection and dimension reduction are merged into one analysis. The proposed methodology decreases the complexity of the biological system while increasing protein subcellular localization accuracy. Experimental results are encouraging, indicating that the aforementioned sparse methods are promising in dealing with complicated biological problems, such as predicting the subcellular localization of Gram-negative bacterial proteins.
Shao, Jinzhen; Zhang, Yubo; Yu, Jianlan; Guo, Lin; Ding, Yi
2011-01-01
Thylakoid membrane complexes of rice (Oryza sativa L.) play crucial roles in growth and crop production. Understanding protein interactions within these complexes would provide new insights into photosynthesis. Here, a new "Double-Strips BN/SDS-PAGE" method was employed to separate thylakoid membrane complexes in order to increase the protein abundance on 2D-gels and to facilitate the identification of hydrophobic transmembrane proteins. A total of 58 protein spots could be observed, and the subunit constitution of these complexes was exhibited on 2D-gels. The generality of this new approach was confirmed using thylakoid membranes from spinach (Spinacia oleracea) and pumpkin (Cucurbita spp.). Furthermore, the proteins separated from rice thylakoid membrane were identified by mass spectrometry (MS). The stromal ridge proteins PsaD and PsaE were identified in both the holo- and core-PSI complexes of rice. Using molecular dynamics simulation to explore the recognition mechanism of these subunits, we showed that salt bridge interactions between residues R19 of PsaC and E168 of PsaD as well as R75 of PsaC and E91 of PsaD played important roles in the stability of the complex. This stromal ridge subunit interaction was also supported by the subsequent analysis of the binding free energy, the intramolecular distances and the intramolecular energy.
Large-scale De Novo Prediction of Physical Protein-Protein Association*
Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C.; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas
2011-01-01
Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org. PMID:21836163
Jiménez-García, Brian; Pons, Carles; Fernández-Recio, Juan
2013-07-01
pyDockWEB is a web server for the rigid-body docking prediction of protein-protein complex structures using a new version of the pyDock scoring algorithm. We use here a new custom parallel FTDock implementation, with adjusted grid size for optimal FFT calculations, and a new version of pyDock, which dramatically speeds up calculations while keeping the same predictive accuracy. Given the 3D coordinates of two interacting proteins, pyDockWEB returns the best docking orientations as scored mainly by electrostatics and desolvation energy. The server does not require registration by the user and is freely accessible for academics at http://life.bsc.es/servlet/pydock. Supplementary data are available at Bioinformatics online.
Lapek, John D; Greninger, Patricia; Morris, Robert; Amzallag, Arnaud; Pruteanu-Malinici, Iulian; Benes, Cyril H; Haas, Wilhelm
2017-10-01
The formation of protein complexes and the co-regulation of the cellular concentrations of proteins are essential mechanisms for cellular signaling and for maintaining homeostasis. Here we use isobaric-labeling multiplexed proteomics to analyze protein co-regulation and show that this allows the identification of protein-protein associations with high accuracy. We apply this 'interactome mapping by high-throughput quantitative proteome analysis' (IMAHP) method to a panel of 41 breast cancer cell lines and show that deviations of the observed protein co-regulations in specific cell lines from the consensus network affects cellular fitness. Furthermore, these aberrant interactions serve as biomarkers that predict the drug sensitivity of cell lines in screens across 195 drugs. We expect that IMAHP can be broadly used to gain insight into how changing landscapes of protein-protein associations affect the phenotype of biological systems.
Benzekry, Sebastian; Tuszynski, Jack A; Rietman, Edward A; Lakka Klement, Giannoula
2015-05-28
The ever-increasing expanse of online bioinformatics data is enabling new ways not only to explore the visualization of these data, but also to apply novel mathematical methods to extract meaningful information for clinically relevant analysis of pathways and treatment decisions. One of the methods used for computing topological characteristics of a space at different spatial resolutions is persistent homology. This concept can also be applied to network theory, and more specifically to protein-protein interaction networks, where the number of rings in an individual cancer network represents a measure of complexity. We observed a linear correlation of R = -0.55 between persistent homology and 5-year survival of patients with a variety of cancers. This relationship was used to predict the proteins within a protein-protein interaction network with the most impact on cancer progression. By re-computing the persistent homology after computationally removing an individual node (protein) from the protein-protein interaction network, we were able to evaluate whether such an inhibition would lead to improvement in patient survival. The power of this approach lay in its ability to identify the effects of inhibition of multiple proteins and in the ability to expose whether the effect of a single inhibition may be amplified by inhibition of other proteins. More importantly, we illustrate specific examples of persistent homology calculations, which correctly predict the survival benefit observed in clinical trials using inhibitors of the identified molecular target. We propose that computational approaches such as persistent homology may be used in the future for selection of molecular therapies in the clinic. The technique uses a mathematical algorithm to evaluate the node (protein) whose inhibition has the highest potential to reduce network complexity.
The greater the drop in persistent homology, the greater the reduction in network complexity, and thus the larger the potential for survival benefit. We hope that the use of advanced mathematics in medicine will provide timely information about the best drug combination for patients, and avoid the expense associated with an unsuccessful clinical trial, where drug(s) did not show a survival benefit.
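The node-removal idea in this abstract can be illustrated with a much simpler quantity than full persistent homology. The sketch below assumes that "number of rings" can be approximated by the cycle rank of the graph (edges minus nodes plus connected components); this is a simplification chosen for illustration, not the authors' pipeline, and the function names and the toy network are hypothetical.

```python
from collections import defaultdict


def cycle_rank(edges):
    """Number of independent cycles in an undirected graph: E - V + C."""
    nodes = set()
    adjacency = defaultdict(set)
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        nodes.update((u, v))
        adjacency[u].add(v)
        adjacency[v].add(u)
        unique_edges.add(frozenset((u, v)))
    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return len(unique_edges) - len(nodes) + components


def rank_nodes_by_cycle_drop(edges):
    """Rank nodes by how much removing each one lowers the cycle rank."""
    baseline = cycle_rank(edges)
    nodes = {n for edge in edges for n in edge}
    drops = {}
    for node in nodes:
        remaining = [(u, v) for u, v in edges if node not in (u, v)]
        drops[node] = baseline - cycle_rank(remaining)
    # Largest drop first; ties broken alphabetically for reproducibility.
    return sorted(drops.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: two triangles sharing node "A", plus a tail.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("A", "D"), ("D", "E"), ("E", "A"), ("E", "F")]
print(cycle_rank(toy_edges))
print(rank_nodes_by_cycle_drop(toy_edges))
```

In the toy network the shared node sits on both rings, so removing it eliminates both, which mirrors the abstract's notion of a protein whose inhibition most reduces network complexity.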
2010-01-01
Atomistic Molecular Dynamics provides powerful and flexible tools for the prediction and analysis of molecular and macromolecular systems. Specifically, it provides a means by which we can measure theoretically that which cannot be measured experimentally: the dynamic time-evolution of complex systems comprising atoms and molecules. It is particularly suitable for the simulation and analysis of the otherwise inaccessible details of MHC-peptide interaction and, on a larger scale, the simulation of the immune synapse. Progress has been relatively tentative yet the emergence of truly high-performance computing and the development of coarse-grained simulation now offers us the hope of accurately predicting thermodynamic parameters and of simulating not merely a handful of proteins but larger, longer simulations comprising thousands of protein molecules and the cellular scale structures they form. We exemplify this within the context of immunoinformatics. PMID:21067546
Xia, Bing; Mamonov, Artem; Leysen, Seppe; Allen, Karen N; Strelkov, Sergei V; Paschalidis, Ioannis Ch; Vajda, Sandor; Kozakov, Dima
2015-07-30
The protein-protein docking server ClusPro is used by thousands of laboratories, and models built by the server have been reported in over 300 publications. Although the structures generated by the docking include near-native ones for many proteins, selecting the best model is difficult due to the uncertainty in scoring. Small angle X-ray scattering (SAXS) is an experimental technique for obtaining low resolution structural information in solution. While not sufficient on its own to uniquely predict complex structures, accounting for SAXS data improves the ranking of models and facilitates the identification of the most accurate structure. Although SAXS profiles are currently available only for a small number of complexes, due to its simplicity the method is becoming increasingly popular. Since combining docking with SAXS experiments will provide a viable strategy for fairly high-throughput determination of protein complex structures, the option of using SAXS restraints is added to the ClusPro server. © 2015 Wiley Periodicals, Inc.
Empirical scoring functions for advanced protein-ligand docking with PLANTS.
Korb, Oliver; Stützle, Thomas; Exner, Thomas E
2009-01-01
In this paper we present two empirical scoring functions, PLANTS(CHEMPLP) and PLANTS(PLP), designed for our docking algorithm PLANTS (Protein-Ligand ANT System), which is based on ant colony optimization (ACO). They are related, regarding their functional form, to parts of already published scoring functions and force fields. The parametrization procedure described here was able to identify several parameter settings showing an excellent performance for the task of pose prediction on two test sets comprising 298 complexes in total. Up to 87% of the complexes of the Astex diverse set and 77% of the CCDC/Astex clean list(nc) (noncovalently bound complexes of the clean list) could be reproduced with root-mean-square deviations of less than 2 Å with respect to the experimentally determined structures. A comparison with the state-of-the-art docking tool GOLD clearly shows that this is, especially for the druglike Astex diverse set, an improvement in pose prediction performance. Additionally, optimized parameter settings for the search algorithm were identified, which can be used to balance pose prediction reliability and search speed.
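The success criterion quoted above (a pose counts as reproduced when its root-mean-square deviation from the experimental structure is below 2 Å) can be sketched as follows. This is a minimal illustration that assumes both poses list the same ligand atoms in the same order and in the same receptor frame, with no superposition or symmetry correction; the function names are hypothetical and are not part of PLANTS.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


def is_reproduced(predicted, reference, cutoff=2.0):
    """True when the predicted pose lies within the RMSD cutoff of the reference."""
    return rmsd(predicted, reference) < cutoff


# Hypothetical two-atom ligand, shifted by 1 Å along z in the predicted pose.
reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]
print(rmsd(predicted_pose, reference_pose), is_reproduced(predicted_pose, reference_pose))
```

The success rate over a test set is then the fraction of complexes for which `is_reproduced` holds for the top-ranked pose.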
3D DOSY-TROSY to determine the translational diffusion coefficient of large protein complexes.
Didenko, Tatiana; Boelens, Rolf; Rüdiger, Stefan G D
2011-01-01
The translational diffusion coefficient is a sensitive parameter to probe conformational changes in proteins and protein-protein interactions. Pulsed-field gradient NMR spectroscopy allows one to measure the translational diffusion with high accuracy. Two-dimensional (2D) heteronuclear NMR spectroscopy combined with diffusion-ordered spectroscopy (DOSY) provides improved resolution and therefore selectivity when compared with a conventional 1D readout. Here, we show that a combination of selective isotope labelling, 2D ¹H-¹³C methyl-TROSY (transverse relaxation-optimised spectroscopy) and DOSY allows one to study diffusion properties of large protein complexes. We propose that a 3D DOSY-heteronuclear multiple quantum coherence (HMQC) pulse sequence, that uses the TROSY effect of the HMQC sequence for ¹³C methyl-labelled proteins, is highly suitable for measuring the diffusion coefficient of large proteins. We used the 20 kDa co-chaperone p23 as model system to test this 3D DOSY-TROSY technique under various conditions. We determined the diffusion coefficient of p23 in viscous solutions, mimicking large complexes of up to 200 kDa. We found the experimental data to be in excellent agreement with theoretical predictions. To demonstrate the use for complex formation, we applied this technique to record the formation of a complex of p23 with the molecular chaperone Hsp90, which is around 200 kDa. We anticipate that 3D DOSY-TROSY will be a useful tool to study conformational changes in large protein complexes.
Mi, Tian; Merlin, Jerlin Camilus; Deverasetty, Sandeep; Gryk, Michael R; Bill, Travis J; Brooks, Andrew W; Lee, Logan Y; Rathnayake, Viraj; Ross, Christian A; Sargeant, David P; Strong, Christy L; Watts, Paula; Rajasekaran, Sanguthevar; Schiller, Martin R
2012-01-01
Minimotif Miner (MnM available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database which has now grown 60-fold to approximately 300,000 minimotifs. Since short minimotifs are by their nature not very complex we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.
Predicting the Dynamics of Protein Abundance
Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael
2014-01-01
Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency. 
The software and data used in this research are available at http://bioinf.scmb.uq.edu.au/proteinabundance/. PMID:24532840
Jiang, Xiaoying; Wei, Rong; Zhang, Tongliang; Gu, Quan
2008-01-01
The function of a protein is closely correlated with its subcellular location. Prediction of the subcellular location of apoptosis proteins is an important research area in the post-genetic era because knowledge of apoptosis proteins is useful for understanding the mechanism of programmed cell death. Compared with the conventional amino acid composition (AAC), the Pseudo Amino Acid composition (PseAA) as originally introduced by Chou can incorporate much more information of a protein sequence so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, a novel approach is presented to predict apoptosis protein subcellular location solely from sequence based on the concept of Chou's PseAA composition. The concept of approximate entropy (ApEn), which is a parameter denoting the complexity of a time series, is used to construct PseAA composition as additional features. A fuzzy K-nearest neighbor (FKNN) classifier is selected as the prediction engine. The particle swarm optimization (PSO) algorithm is adopted for optimizing the weight factors, which are important in PseAA composition. Two datasets are used to validate the performance of the proposed approach, which incorporate six subcellular locations and four subcellular locations, respectively. The results obtained by the jackknife test are quite encouraging. They indicate that the ApEn of a protein sequence could effectively represent the information of apoptosis protein subcellular locations. The approach can at least play a complementary role to many of the existing methods, and might become a potentially useful tool for protein function prediction. The software in Matlab is freely available by contacting the corresponding author.
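Approximate entropy, the complexity measure named above, can be sketched in a few lines. The code follows the standard definition (ApEn = Φ(m) − Φ(m+1), with self-matches counted) for a generic numeric series. How the authors convert an amino acid sequence into such a series is not stated in the abstract, so the input here is simply a list of numbers, and the defaults for `m` and `r` are illustrative assumptions.

```python
import math


def _phi(series, m, r):
    """Average log-frequency with which length-m templates recur within tolerance r."""
    count = len(series) - m + 1
    templates = [series[i:i + m] for i in range(count)]
    total = 0.0
    for template in templates:
        # Chebyshev distance between templates; the self-match keeps the log finite.
        matches = sum(
            1 for other in templates
            if max(abs(a - b) for a, b in zip(template, other)) <= r
        )
        total += math.log(matches / count)
    return total / count


def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a numeric series."""
    if len(series) < m + 1:
        raise ValueError("series is too short for the chosen template length")
    return _phi(series, m, r) - _phi(series, m + 1, r)


constant = [1.0] * 30
periodic = [0.0, 1.0] * 20
print(approximate_entropy(constant), approximate_entropy(periodic))
```

A constant series gives exactly zero and a strictly alternating series gives a value very close to zero, while irregular series give larger values; this is the sense in which ApEn denotes complexity.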
Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.
Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G
2016-01-01
While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged, however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes. © 2015 The Protein Society.
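The screened electrostatic term described above can be written as a pairwise Debye-Hückel energy, a Coulomb interaction damped by an exponential factor. The sketch below assumes the common form E = K·q_i·q_j/(ε·r)·exp(−r/λ_D); the Coulomb conversion constant, the dielectric constant of 80 and the 10 Å screening length are illustrative assumptions and are not taken from the AWSEM parametrization.

```python
import math

# Coulomb constant in kcal·Å/(mol·e²); a commonly used conversion factor (assumed here).
COULOMB_KCAL_ANGSTROM = 332.06


def debye_huckel_energy(q_i, q_j, distance, dielectric=80.0, debye_length=10.0):
    """Screened Coulomb energy (kcal/mol) between two point charges.

    q_i, q_j     : charges in units of the elementary charge
    distance     : separation in Å (must be positive)
    dielectric   : relative permittivity of the medium (assumed value)
    debye_length : screening length in Å (assumed value)
    """
    if distance <= 0.0:
        raise ValueError("distance must be positive")
    coulomb = COULOMB_KCAL_ANGSTROM * q_i * q_j / (dielectric * distance)
    return coulomb * math.exp(-distance / debye_length)


# Opposite charges attract (negative energy); like charges repel (positive energy).
print(debye_huckel_energy(+1.0, -1.0, 10.0))
print(debye_huckel_energy(+1.0, +1.0, 10.0))
```

Lengthening `debye_length` weakens the screening and makes the interaction longer ranged, which is one way to vary the effective electrostatic strength in such a model.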
Russell, Anthony G; Watanabe, Yoh-ichi; Charette, J Michael; Gray, Michael W
2005-01-01
Box C/D ribonucleoprotein (RNP) particles mediate O2'-methylation of rRNA and other cellular RNA species. In higher eukaryotic taxa, these RNPs are more complex than their archaeal counterparts, containing four core protein components (Snu13p, Nop56p, Nop58p and fibrillarin) compared with three in Archaea. This increase in complexity raises questions about the evolutionary emergence of the eukaryote-specific proteins and structural conservation in these RNPs throughout the eukaryotic domain. In protists, the primarily unicellular organisms comprising the bulk of eukaryotic diversity, the protein composition of box C/D RNPs has not yet been extensively explored. This study describes the complete gene, cDNA and protein sequences of the fibrillarin homolog from the protozoon Euglena gracilis, the first such information to be obtained for a nucleolus-localized protein in this organism. The E.gracilis fibrillarin gene contains a mixture of intron types exhibiting markedly different sizes. In contrast to most other E.gracilis mRNAs characterized to date, the fibrillarin mRNA lacks a spliced leader (SL) sequence. The predicted fibrillarin protein sequence itself is unusual in that it contains a glycine-lysine (GK)-rich domain at its N-terminus rather than the glycine-arginine-rich (GAR) domain found in most other eukaryotic fibrillarins. In an evolutionarily diverse collection of protists that includes E.gracilis, we have also identified putative homologs of the other core protein components of box C/D RNPs, thereby providing evidence that the protein composition seen in the higher eukaryotic complexes was established very early in eukaryotic cell evolution.
Sarkar, Debasree; Patra, Piya; Ghosh, Abhirupa; Saha, Sudipto
2016-01-01
A considerable proportion of protein-protein interactions (PPIs) in the cell are estimated to be mediated by very short peptide segments that approximately conform to specific sequence patterns known as linear motifs (LMs), often present in the disordered regions of eukaryotic proteins. These peptides have been found to interact with low affinity and are able to bind to multiple interactors, thus playing an important role in the PPI networks involving date hubs. In this work, PPI data and a de novo motif identification method (MEME) were used to identify such peptides in three cancer-associated hub proteins: MYC, APC and MDM2. The peptides corresponding to the significant LMs identified for each hub protein were aligned, and the overlapping regions across these peptides were termed overlapping linear peptides (OLPs). These OLPs were thus predicted to be responsible for multiple PPIs of the corresponding hub proteins and a scoring system was developed to rank them. We predicted six OLPs in MYC and five OLPs in MDM2 that scored higher than OLP predictions from randomly generated protein sets. Two OLP sequences from the C-terminal of MYC were predicted to bind with FBXW7, a component of an E3 ubiquitin-protein ligase complex involved in proteasomal degradation of MYC. Similarly, we identified peptides in the C-terminal of MDM2 interacting with FKBP3, which has a specific role in auto-ubiquitinylation of MDM2. The peptide sequences predicted in MYC and MDM2 look promising for designing orthosteric inhibitors against possible disease-associated PPIs. Since these OLPs can interact with other proteins as well, these inhibitors should be specific to the targeted interactor to prevent undesired side-effects. This computational framework has been designed to predict and rank the peptide regions that may mediate multiple PPIs and can be applied to other disease-associated date hub proteins for prediction of novel therapeutic targets of small molecule PPI modulators.
A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine.
Jain, Dharm Skandh; Gupte, Sanket Rajan; Aduri, Raviprasad
2018-06-22
RNA protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of the predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on the high-resolution structural data of RNA protein complexes. The minimum structural unit consisting of five residues is used as the descriptor. Comparative analysis of existing methods shows the consistently higher performance of our method irrespective of the length of RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA as well as TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in RPI networks of four different organisms. The robustness of this method will provide a way for predicting RPI networks of yet unknown interactions for both long noncoding RNA and microRNA.
Cang, Zixuan; Wei, Guo-Wei
2018-02-01
Protein-ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein-ligand binding affinities is vital to rational drug design and the understanding of protein-ligand binding and binding-induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH) or multicomponent persistent homology to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on 2 large data sets indicate that the proposed topology-based machine-learning paradigm outperforms other existing methods in protein-ligand binding affinity predictions. ESPH reveals protein-ligand binding mechanisms that cannot be attained from other conventional techniques. The present approach reveals that protein-ligand hydrophobic interactions are extended to 40 Å away from the binding site, which has a significant ramification for drug and protein design. Copyright © 2017 John Wiley & Sons, Ltd.
Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian
2013-01-01
Quantitative structure-activity relationship (QSAR), a regression modeling methodology that establishes statistical correlation between structure feature and apparent behavior for a series of congeneric molecules quantitatively, has been widely used to evaluate the activity, toxicity and property of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising to see that such useful technique has only very limited applications to biomacromolecules, albeit the solved 3D atom-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than that of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.
HDOCK: a web server for protein–protein and protein–DNA/RNA docking based on a hybrid strategy
Yan, Yumeng; Zhang, Di; Zhou, Pei; Li, Botong
2017-01-01
Abstract Protein–protein and protein–DNA/RNA interactions play a fundamental role in a variety of biological processes. Determining the complex structures of these interactions is valuable, in which molecular docking has played an important role. To automatically make use of the binding information from the PDB in docking, here we have presented HDOCK, a novel web server of our hybrid docking algorithm of template-based modeling and free docking, in which cases with misleading templates can be rescued by the free docking protocol. The server supports protein–protein and protein–DNA/RNA docking and accepts both sequence and structure inputs for proteins. The docking process is fast and consumes about 10–20 min for a docking run. Tested on the cases with weakly homologous complexes of <30% sequence identity from five docking benchmarks, the HDOCK pipeline tied with template-based modeling on the protein–protein and protein–DNA benchmarks and performed better than template-based modeling on the three protein–RNA benchmarks when the top 10 predictions were considered. The performance of HDOCK became better when more predictions were considered. Combining the results of HDOCK and template-based modeling by ranking first of the template-based model further improved the predictive power of the server. The HDOCK web server is available at http://hdock.phys.hust.edu.cn/. PMID:28521030
Detecting complexes from edge-weighted PPI networks via genes expression analysis.
Zhang, Zehua; Song, Jian; Tang, Jijun; Xu, Xinying; Guo, Fei
2018-04-24
Identifying complexes from PPI networks has become a key problem in elucidating protein functions and identifying signal and biological processes in a cell. Proteins that bind as complexes play important roles in life activity. Accurate determination of complexes in PPI networks is crucial for understanding principles of cellular organization. We propose a novel method to identify complexes on PPI networks, based on different co-expression information. First, we use the Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes on PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify complexes predicted by the Markov Cluster Algorithm. To evaluate our method, we test on two experimental yeast PPI networks. On the DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On the MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared to existing methods, our method improves the Precision value by at least 0.1752, the F-Measure value by at least 0.0448, and the Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes on PPI networks, with the prediction quality improved in terms of evaluation criteria.
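The clustering step named in the abstract above can be illustrated with a minimal, generic Markov Cluster (MCL) sketch on a weighted graph. This is not the authors' edge-weighting scheme or their filtering step; the function name `mcl`, the inflation value, and the example weights are assumptions chosen only for illustration.

```python
def mcl(n, weighted_edges, inflation=2.0, iterations=50, tol=1e-6):
    """Generic Markov clustering on an undirected weighted graph with n nodes.

    weighted_edges: iterable of (i, j, weight); the weights are hypothetical
    stand-ins for, e.g., co-expression-derived edge weights.
    Returns a set of frozensets, one per cluster.
    """
    # Build the weighted adjacency matrix and add self-loops.
    m = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        m[i][j] = w
        m[j][i] = w
    for i in range(n):
        m[i][i] = 1.0

    def normalize_columns(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            s = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= s
        return mat

    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        m = normalize_columns([[x ** inflation for x in row] for row in expanded])

    # Rows with a nonzero diagonal entry are attractors; each attractor row
    # lists the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > tol:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > tol))
    return clusters


# Two triangles with no edges between them form two clusters.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
print(mcl(6, edges))
```

A fixed iteration count is used instead of a convergence check to keep the sketch short; a production implementation would typically use sparse matrices and stop once the matrix no longer changes.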
Predicting helix–helix interactions from residue contacts in membrane proteins
Lo, Allan; Chiu, Yi-Yuan; Rødland, Einar Andreas; Lyu, Ping-Chiang; Sung, Ting-Yi; Hsu, Wen-Lian
2009-01-01
Motivation: Helix–helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. Results: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins. Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/ Contact: tsung@iis.sinica.edu.tw Supplementary information:Supplementary data are available at Bioinformatics online. PMID:19244388
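The final inference step described above (a helix pair is called interacting when at least three contacts are predicted between the two helices) can be sketched as follows. The data layout, function names, and example helix labels are assumptions for illustration, not the TMhit implementation.

```python
from collections import Counter


def infer_helix_interactions(predicted_contacts, min_contacts=3):
    """predicted_contacts: iterable of (helix_a, helix_b), one entry per
    predicted residue contact. Returns the set of helix pairs supported by
    at least min_contacts predicted contacts."""
    counts = Counter(frozenset((a, b)) for a, b in predicted_contacts if a != b)
    return {pair for pair, c in counts.items() if c >= min_contacts}


def binary_metrics(predicted, actual, all_pairs):
    """Accuracy, sensitivity and specificity over a universe of helix pairs.
    Assumes at least one actual positive and one actual negative pair."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(all_pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(all_pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical example: three contacts between H1 and H2, two between H2 and H3.
contacts = [("H1", "H2")] * 3 + [("H2", "H3")] * 2
predicted = infer_helix_interactions(contacts)
all_pairs = {frozenset(p) for p in [("H1", "H2"), ("H2", "H3"), ("H1", "H3")]}
actual = {frozenset(("H1", "H2")), frozenset(("H2", "H3"))}
print(predicted, binary_metrics(predicted, actual, all_pairs))
```

Raising `min_contacts` trades sensitivity for specificity, which is consistent with the high specificity and lower sensitivity reported in the abstract.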
Energy Landscape and Transition State of Protein-Protein Association
NASA Astrophysics Data System (ADS)
Alsallaq, Ramzi; Zhou, Huan-Xiang
2006-11-01
Formation of a stereospecific protein complex is favored by specific interactions between two proteins but disfavored by the loss of translational and rotational freedom. Echoing the protein folding process, we have previously proposed a transition state for protein-protein association. Here we clarify the specification of the transition state by working with two toy models for protein association. The models demonstrate a sharp transition between the bound state, with numerous short-range interactions but restricted translational and rotational freedom, and the unbound state, with at most a small number of interactions but expanded configurational freedom. This transition sets the outer boundary of the bound state as well as the transition state for association. The energy landscape is funnel-like, with the deep well of the bound state surrounded by a broad shallow basin. This formalism of protein-protein association is applied to four protein-protein complexes, and is found to give accurate predictions for the effects of charge mutations and ionic strength on the association rates.
Lessons in molecular recognition. 2. Assessing and improving cross-docking accuracy.
Sutherland, Jeffrey J; Nandigam, Ravi K; Erickson, Jon A; Vieth, Michal
2007-01-01
Docking methods are used to predict the manner in which a ligand binds to a protein receptor. Many studies have assessed the success rate of programs in self-docking tests, whereby a ligand is docked into the protein structure from which it was extracted. Cross-docking, or using a protein structure from a complex containing a different ligand, provides a more realistic assessment of a docking program's ability to reproduce X-ray results. In this work, cross-docking was performed with CDocker, Fred, and Rocs using multiple X-ray structures for eight proteins (two kinases, one nuclear hormone receptor, one serine protease, two metalloproteases, and two phosphodiesterases). While average cross-docking accuracy is not encouraging, it is shown that using the protein structure from the complex that contains the bound ligand most similar to the docked ligand increases docking accuracy for all methods ("similarity selection"). Identifying the most successful protein conformer ("best selection") and similarity selection substantially reduce the difference between self-docking and average cross-docking accuracy. We identify universal predictors of docking accuracy (i.e., showing consistent behavior across most protein-method combinations), and show that models for predicting docking accuracy built using these parameters can be used to select the most appropriate docking method.
Docking and scoring protein complexes: CAPRI 3rd Edition.
Lensink, Marc F; Méndez, Raúl; Wodak, Shoshana J
2007-12-01
The performance of methods for predicting protein-protein interactions at the atomic scale is assessed by evaluating blind predictions performed during 2005-2007 as part of Rounds 6-12 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). These Rounds also included a new scoring experiment, where a larger set of models contributed by the predictors was made available to groups developing scoring functions. These groups scored the uploaded set and submitted their own best models for assessment. The structures of nine protein complexes including one homodimer were used as targets. These targets represent biologically relevant interactions involved in gene expression, signal transduction, RNA, or protein processing and membrane maintenance. For all the targets except one, predictions started from the experimentally determined structures of the free (unbound) components or from models derived by homology, making it mandatory for docking methods to model the conformational changes that often accompany association. In total, 63 groups and eight automatic servers, a substantial increase from previous years, submitted docking predictions, of which 1994 were evaluated here. Fifteen groups submitted 305 models for five targets in the scoring experiment. Assessment of the predictions reveals that 31 different groups produced models of acceptable and medium accuracy-but only one high accuracy submission-for all the targets, except the homodimer. In the latter, none of the docking procedures reproduced the large conformational adjustment required for correct assembly, underscoring yet again that handling protein flexibility remains a major challenge. In the scoring experiment, a large fraction of the groups attained the set goal of singling out the correct association modes from incorrect solutions in the limited ensembles of contributed models. 
But in general they seemed unable to identify the best models, indicating that current scoring methods are probably not sensitive enough. With the increased focus on protein assemblies, in particular by structural genomics efforts, the growing community of CAPRI predictors is engaged more actively than ever in the development of better scoring functions and means of modeling conformational flexibility, which hold promise for much progress in the future. (c) 2007 Wiley-Liss, Inc.
Kastritis, Panagiotis L; Rodrigues, João P G L M; Folkers, Gert E; Boelens, Rolf; Bonvin, Alexandre M J J
2014-07-15
Protein-protein complexes orchestrate most cellular processes such as transcription, signal transduction and apoptosis. The factors governing their affinity remain elusive however, especially when it comes to describing dissociation rates (koff). Here we demonstrate that, next to direct contributions from the interface, the non-interacting surface (NIS) also plays an important role in binding affinity, especially polar and charged residues. Their percentage on the NIS is conserved over orthologous complexes indicating an evolutionary selection pressure. Their effect on binding affinity can be explained by long-range electrostatic contributions and surface-solvent interactions that are known to determine the local frustration of the protein complex surface. Including these in a simple model significantly improves the affinity prediction of protein complexes from structural models. The impact of mutations outside the interacting surface on binding affinity is supported by experimental alanine scanning mutagenesis data. These results enable the development of more sophisticated and integrated biophysical models of binding affinity and open new directions in experimental control and modulation of biomolecular interactions. Copyright © 2014. Published by Elsevier Ltd.
On the binding affinity of macromolecular interactions: daring to ask why proteins interact
Kastritis, Panagiotis L.; Bonvin, Alexandre M. J. J.
2013-01-01
Interactions between proteins are orchestrated in a precise and time-dependent manner, underlying cellular function. The binding affinity, defined as the strength of these interactions, is translated into physico-chemical terms in the dissociation constant (Kd), the latter being an experimental measure that determines whether an interaction will be formed in solution or not. Predicting binding affinity from structural models has been a matter of active research for more than 40 years because of its fundamental role in drug development. However, all available approaches are incapable of predicting the binding affinity of protein–protein complexes from coordinates alone. Here, we examine both theoretical and experimental limitations that complicate the derivation of structure–affinity relationships. Most work so far has concentrated on binary interactions. Systems of increased complexity are far from being understood. The main physico-chemical measure that relates to binding affinity is the buried surface area, but it does not hold for flexible complexes. For the latter, there must be a significant entropic contribution that will have to be approximated in the future. We foresee that any theoretical modelling of these interactions will have to follow an integrative approach considering the biology, chemistry and physics that underlie protein–protein recognition. PMID:23235262
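The link between the dissociation constant and binding free energy mentioned above is the standard thermodynamic relation ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. A minimal sketch follows; the function name and the example Kd value are illustrative assumptions, not values from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from Kd in mol/L,
    using dG = R * T * ln(Kd / 1 M)."""
    return R_KCAL * temperature_k * math.log(kd_molar)


# A hypothetical nanomolar binder at 25 degrees C.
print(round(binding_free_energy(1e-9), 2))
```

Each tenfold change in Kd shifts ΔG by about 1.36 kcal/mol at 25 °C, which is why small errors in predicted free energy translate into large errors in predicted affinity.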
NASA Astrophysics Data System (ADS)
Eid, Sameh; Saleh, Noureldin; Zalewski, Adam; Vedani, Angelo
2014-12-01
Carbohydrates play a key role in a variety of physiological and pathological processes and, hence, represent a rich source for the development of novel therapeutic agents. Being able to predict binding mode and binding affinity is an essential, yet lacking, aspect of the structure-based design of carbohydrate-based ligands. We assembled a diverse data set comprising 273 carbohydrate-protein crystal structures with known binding affinity and evaluated the prediction accuracy of a large collection of well-established scoring and free-energy functions, as well as combinations thereof. Unfortunately, the tested functions were not capable of reproducing binding affinities in the studied complexes. To simplify the complex free-energy surface of carbohydrate-protein systems, we classified the studied proteins according to the topology and solvent exposure of the carbohydrate-binding site into five distinct categories. A free-energy model based on the proposed classification scheme reproduced binding affinities in the carbohydrate data set with an r² of 0.71 and root-mean-squared-error of 1.25 kcal/mol (N = 236). The improvement in model performance underlines the significance of the differences in the local micro-environments of carbohydrate-binding sites and demonstrates the usefulness of calibrating free-energy functions individually according to binding-site topology and solvent exposure.
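The two quality measures quoted above can be computed as in the sketch below. The abstract does not say how r² was defined; the squared Pearson correlation is assumed here, and the example energies are made up for illustration.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equal-length sequences."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Squared Pearson correlation coefficient (an assumed definition)."""
    n = len(predicted)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Hypothetical binding free energies in kcal/mol.
observed = [-5.0, -7.0, -9.0]
predicted = [-4.0, -6.0, -8.0]
print(r_squared(predicted, observed), rmse(predicted, observed))
```

The example shows why both numbers are reported together: a constant offset of 1 kcal/mol leaves the correlation perfect while the RMSE is 1 kcal/mol.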
Shave, Steven; Auer, Manfred
2013-12-23
Combinatorial chemical libraries produced on solid support offer fast and cost-effective access to a large number of unique compounds. If such libraries are screened directly on-bead, the speed at which chemical space can be explored by chemists is much greater than that addressable using solution based synthesis and screening methods. Solution based screening has a large supporting body of software such as structure-based virtual screening tools which enable the prediction of protein-ligand complexes. Use of these techniques to predict the protein bound complexes of compounds synthesized on solid support neglects to take into account the conjugation site on the small molecule ligand. This may invalidate predicted binding modes, because the linker may clash with protein atoms. We present CSBB-ConeExclusion, a methodology and computer program which provides a measure of the applicability of solution dockings to solid support. Output is given in the form of statistics for each docking pose, a unique 2D visualization method which can be used to determine applicability at a glance, and automatically generated PyMol scripts allowing visualization of protein atom incursion into a defined exclusion volume. As an example, CSBB-ConeExclusion is then used to determine the optimum attachment point for a purine library targeting cyclin-dependent kinase 2 (CDK2).
Systematic identification of proteins that elicit drug side effects
Kuhn, Michael; Al Banchaabouchi, Mumna; Campillos, Monica; Jensen, Lars Juhl; Gross, Cornelius; Gavin, Anne-Claude; Bork, Peer
2013-01-01
Side effect similarities of drugs have recently been employed to predict new drug targets, and networks of side effects and targets have been used to better understand the mechanism of action of drugs. Here, we report a large-scale analysis to systematically predict and characterize proteins that cause drug side effects. We integrated phenotypic data obtained during clinical trials with known drug–target relations to identify overrepresented protein–side effect combinations. Using independent data, we confirm that most of these overrepresentations point to proteins which, when perturbed, cause side effects. Of 1428 side effects studied, 732 were predicted to be predominantly caused by individual proteins, at least 137 of them backed by existing pharmacological or phenotypic data. We prove this concept in vivo by confirming our prediction that activation of the serotonin 7 receptor (HTR7) is responsible for hyperesthesia in mice, which, in turn, can be prevented by a drug that selectively inhibits HTR7. Taken together, we show that a large fraction of complex drug side effects are mediated by individual proteins and create a reference for such relations. PMID:23632385
Montaño, Sarita; Orozco, Esther; Correa-Basurto, José; Bello, Martiniano; Chávez-Munguía, Bibiana; Betanzos, Abigail
2017-02-01
EhCPADH is a protein complex involved in the virulence of Entamoeba histolytica, the protozoan responsible for human amebiasis. It is formed by the EhCP112 cysteine protease and the EhADH adhesin. To explore the molecular basis of the complex formation, three-dimensional models were built for both proteins and molecular dynamics simulations (MDS) and docking calculations were performed. Results predicted that the pEhCP112 proenzyme and the mEhCP112 mature enzyme were globular and peripheral membrane proteins. Interestingly, in pEhCP112, the propeptide appeared hiding the catalytic site (C167, H329, N348); while in mEhCP112, this site was exposed and its residues were found structurally closer than in pEhCP112. EhADH emerged as an extended peripheral membrane protein with high fluctuation in Bro1 and V shape domains. 500 ns-long MDS and protein-protein docking predictions evidenced different heterodimeric complexes with the lowest free energy. pEhCP112 interacted with EhADH by the propeptide and C-terminal regions and mEhCP112 by the C-terminal through hydrogen bonds. In contrast, EhADH bound to mEhCP112 by 442-479 residues, adjacent to the target cell-adherence region (480-600 residues), and by the Bro1 domain (9-349 residues). Calculations of the effective binding free energy and per residue free energy decomposition showed that EhADH binds to mEhCP112 with a higher binding energy than to pEhCP112, mainly through van der Waals interactions and the nonpolar part of solvation energy. The EhADH and EhCP112 structural relationship was validated in trophozoites by immunofluorescence, TEM, and immunoprecipitation assays. Experimental findings agreed fairly well with in silico results.
OST-HTH: a novel predicted RNA-binding domain
2010-01-01
Background: The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria. Results: Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid- interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Conclusions: Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNAse belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs. Reviewers: This article was reviewed by Sandor Pongor and Arcady Mushegian. PMID:20302647
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Cryo-electron microscopy study of bacteriophage T4 displaying anthrax toxin proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fokine, Andrei; Bowman, Valorie D.; Battisti, Anthony J.
2007-10-25
The bacteriophage T4 capsid contains two accessory surface proteins, the small outer capsid protein (Soc, 870 copies) and the highly antigenic outer capsid protein (Hoc, 155 copies). As these are dispensable for capsid formation, they can be used for displaying proteins and macromolecular complexes on the T4 capsid surface. Anthrax toxin components were attached to the T4 capsid as a fusion protein of the N-terminal domain of the anthrax lethal factor (LFn) with Soc. The LFn-Soc fusion protein was complexed in vitro with Hoc⁻Soc⁻ T4 phage. Subsequently, cleaved anthrax protective antigen heptamers (PA63)₇ were attached to the exposed LFn domains. A cryo-electron microscopy study of the decorated T4 particles shows the complex of PA63 heptamers with LFn-Soc on the phage surface. Although the cryo-electron microscopy reconstruction is unable to differentiate on its own between different proposed models of the anthrax toxin, the density is consistent with a model that had predicted the orientation and position of three LFn molecules bound to one PA63 heptamer.
Modeling complexes of modeled proteins.
Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A
2017-03-01
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into the acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc.
BiGGER: a new (soft) docking algorithm for predicting protein interactions.
Palma, P N; Krippahl, L; Wampler, J E; Moura, J J
2000-06-01
A new computationally efficient and automated "soft docking" algorithm is described to assist the prediction of the mode of binding between two proteins, using the three-dimensional structures of the unbound molecules. The method is implemented in a software package called BiGGER (Bimolecular Complex Generation with Global Evaluation and Ranking) and works in two sequential steps: first, the complete 6-dimensional binding spaces of both molecules is systematically searched. A population of candidate protein-protein docked geometries is thus generated and selected on the basis of the geometric complementarity and amino acid pairwise affinities between the two molecular surfaces. Most of the conformational changes observed during protein association are treated in an implicit way and test results are equally satisfactory, regardless of starting from the bound or the unbound forms of known structures of the interacting proteins. In contrast to other methods, the entire molecular surfaces are searched during the simulation, using absolutely no additional information regarding the binding sites. In a second step, an interaction scoring function is used to rank the putative docked structures. The function incorporates interaction terms that are thought to be relevant to the stabilization of protein complexes. These include: geometric complementarity of the surfaces, explicit electrostatic interactions, desolvation energy, and pairwise propensities of the amino acid side chains to contact across the molecular interface. The relative functional contribution of each of these interaction terms to the global scoring function has been empirically adjusted through a neural network optimizer using a learning set of 25 protein-protein complexes of known crystallographic structures. 
In 22 out of 25 protein-protein complexes tested, near-native docked geometries were found with Cα RMS deviations ≤ 4.0 Å from the experimental structures, of which 14 were found within the 20 top ranking solutions. The program works on widely available personal computers and takes 2 to 8 hours of CPU time to run any of the docking tests herein presented. Finally, the value and limitations of the method for the study of macromolecular interactions, not yet revealed by experimental techniques, are discussed.
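The near-native criterion quoted above rests on a Cα root-mean-square deviation. A minimal sketch of that measure follows, assuming the two coordinate sets list the same Cα atoms in the same order and have already been placed in a common reference frame; no superposition is performed here, and the function names and coordinates are illustrative rather than taken from BiGGER.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units (e.g. angstroms) between two equal-length
    lists of (x, y, z) tuples. Assumes matched atoms and a shared frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_near_native(rmsd, cutoff=4.0):
    """Apply the 4.0 angstrom cutoff quoted in the abstract."""
    return rmsd <= cutoff


# Hypothetical coordinates: the second set is the first shifted by (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
value = ca_rmsd(reference, docked)
print(value, is_near_native(value))
```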
WScore: A Flexible and Accurate Treatment of Explicit Water Molecules in Ligand-Receptor Docking.
Murphy, Robert B; Repasky, Matthew P; Greenwood, Jeremy R; Tubert-Brohman, Ivan; Jerome, Steven; Annabhimoju, Ramakrishna; Boyles, Nicholas A; Schmitz, Christopher D; Abel, Robert; Farid, Ramy; Friesner, Richard A
2016-05-12
We have developed a new methodology for protein-ligand docking and scoring, WScore, incorporating a flexible description of explicit water molecules. The locations and thermodynamics of the waters are derived from a WaterMap molecular dynamics simulation. The water structure is employed to provide an atomic level description of ligand and protein desolvation. WScore also contains a detailed model for localized ligand and protein strain energy and integrates an MM-GBSA scoring component with these terms to assess delocalized strain of the complex. Ensemble docking is used to take into account induced fit effects on the receptor conformation, and protein reorganization free energies are assigned via fitting to experimental data. The performance of the method is evaluated for pose prediction, rank ordering of self-docked complexes, and enrichment in virtual screening, using a large data set of PDB complexes and compared with the Glide SP and Glide XP models; significant improvements are obtained.
Ringer, Ashley L.; Senenko, Anastasia; Sherrill, C. David
2007-01-01
S/π interactions are prevalent in biochemistry and play an important role in protein folding and stabilization. Geometries of cysteine/aromatic interactions found in crystal structures from the Brookhaven Protein Data Bank (PDB) are analyzed and compared with the equilibrium configurations predicted by high-level quantum mechanical results for the H2S–benzene complex. A correlation is observed between the energetically favorable configurations on the quantum mechanical potential energy surface of the H2S–benzene model and the cysteine/aromatic configurations most frequently found in crystal structures of the PDB. In contrast to some previous PDB analyses, configurations with the sulfur over the aromatic ring are found to be the most important. Our results suggest that accurate quantum computations on models of noncovalent interactions may be helpful in understanding the structures of proteins and other complex systems. PMID:17766371
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Kinetic rate constant prediction supports the conformational selection mechanism of protein binding.
Moal, Iain H; Bates, Paul A
2012-01-01
The prediction of protein-protein kinetic rate constants provides a fundamental test of our understanding of molecular recognition, and will play an important role in the modeling of complex biological systems. In this paper, a feature selection and regression algorithm is applied to mine a large set of molecular descriptors and construct simple models for association and dissociation rate constants using empirical data. Using separate test data for validation, the predicted rate constants can be combined to calculate binding affinity with accuracy matching that of state-of-the-art empirical free energy functions. The models show that the rate of association is linearly related to the proportion of unbound proteins in the bound conformational ensemble relative to the unbound conformational ensemble, indicating that the binding partners must adopt a geometry close to that of the bound state prior to binding. Mirroring the conformational selection and population shift mechanism of protein binding, the models provide a strong separate line of evidence for the preponderance of this mechanism in protein-protein binding, complementing structural and theoretical studies.
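The statement that predicted rate constants can be combined to calculate binding affinity rests on the standard relations K_d = k_off / k_on and ΔG = RT ln(K_d) for a 1 M standard state. The following is a minimal sketch of that arithmetic only. The function names and the example rate constants are illustrative assumptions and are not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(k_on, k_off):
    """Return K_d in M from k_on in 1/(M*s) and k_off in 1/s."""
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def binding_free_energy(k_on, k_off, temperature=298.15):
    """Return the standard binding free energy in kcal/mol (1 M standard state)."""
    return R_KCAL * temperature * math.log(dissociation_constant(k_on, k_off))


# Hypothetical rate constants, chosen only for illustration.
k_on, k_off = 1.0e6, 1.0e-3
kd = dissociation_constant(k_on, k_off)
dg = binding_free_energy(k_on, k_off)
print(f"K_d = {kd:.1e} M, dG = {dg:.2f} kcal/mol")
```

A more negative ΔG corresponds to tighter binding, so an error in either predicted rate constant propagates logarithmically into the affinity estimate.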
Intrinsically Disordered Proteins and the Origins of Multicellular Organisms
NASA Astrophysics Data System (ADS)
Dunker, A. Keith
In simple multicellular organisms all of the cells are in direct contact with the surrounding milieu, whereas in complex multicellular organisms some cells are completely surrounded by other cells. Current phylogenetic trees indicate that complex multicellular organisms evolved independently from unicellular ancestors about 10 times, and only among the eukaryotes, including once for animals, twice each for green, red, and brown algae, and thrice for fungi. Given these multiple independent evolutionary lineages, we asked two questions: (1) Which molecular functions underpinned the evolution of multicellular organisms? (2) Which of these molecular functions depend on intrinsically disordered proteins (IDPs)? Compared to unicellularity, multicellularity requires the advent of molecules for cellular adhesion, for cell-cell communication and for developmental programs. In addition, the developmental programs need to be regulated over space and time. Finally, each multicellular organism has cell-specific biochemistry and physiology. Thus, the evolution of complex multicellular organisms from unicellular ancestors required five new classes of functions. To answer the second question we used Key-words in Swiss Protein ranked for associations with predictions of protein structure or disorder. With a Z-score of 18.8 compared to random-function proteins, differentiation was the biological process most strongly associated with IDPs. As expected from this result, large numbers of individual proteins associated with differentiation exhibit substantial regions of predicted disorder. For the animals for which there is the most readily available data, all five of the underpinning molecular functions for multicellularity were found to depend critically on IDP-based mechanisms, and other evidence supports these ideas.
While the data are more sparse, IDPs seem to similarly underlie the five new classes of functions for plants and fungi as well, suggesting that IDPs were indeed crucial for the evolution of complex multicellular organisms. These new findings necessitate a rethinking of the gene regulatory network models currently used to explain cellular differentiation and the evolution of complex multicellular organisms.
Dual Coordination of Post Translational Modifications in Human Protein Networks
Woodsmith, Jonathan; Kamburov, Atanas; Stelzl, Ulrich
2013-01-01
Post-translational modifications (PTMs) regulate protein activity, stability and interaction profiles and are critical for cellular functioning. Further regulation is gained through PTM interplay whereby modifications modulate the occurrence of other PTMs or act in combination. Integration of global acetylation, ubiquitination and tyrosine or serine/threonine phosphorylation datasets with protein interaction data identified hundreds of protein complexes that selectively accumulate each PTM, indicating coordinated targeting of specific molecular functions. A second layer of PTM coordination exists in these complexes, mediated by PTM integration (PTMi) spots. PTMi spots represent very dense modification patterns in disordered protein regions and showed an equally high mutation rate as functional protein domains in cancer, inferring equivocal importance for cellular functioning. Systematic PTMi spot identification highlighted more than 300 candidate proteins for combinatorial PTM regulation. This study reveals two global PTM coordination mechanisms and emphasizes dataset integration as requisite in proteomic PTM studies to better predict modification impact on cellular signaling. PMID:23505349
Surfing on Protein Waves: Proteophoresis as a Mechanism for Bacterial Genome Partitioning
NASA Astrophysics Data System (ADS)
Walter, J.-C.; Dorignac, J.; Lorman, V.; Rech, J.; Bouet, J.-Y.; Nollmann, M.; Palmeri, J.; Parmeggiani, A.; Geniet, F.
2017-07-01
Efficient bacterial chromosome segregation typically requires the coordinated action of a three-component machinery, fueled by adenosine triphosphate, called the partition complex. We present a phenomenological model accounting for the dynamic activity of this system that is also relevant for the physics of catalytic particles in active environments. The model is obtained by coupling simple linear reaction-diffusion equations with a proteophoresis, or "volumetric" chemophoresis, force field that arises from protein-protein interactions and provides a physically viable mechanism for complex translocation. This minimal description captures most known experimental observations: dynamic oscillations of complex components, complex separation, and subsequent symmetrical positioning. The predictions of our model are in phenomenological agreement with and provide substantial insight into recent experiments. From a nonlinear physics view point, this system explores the active separation of matter at micrometric scales with a dynamical instability between static positioning and traveling wave regimes triggered by the dynamical spontaneous breaking of rotational symmetry.
García-Jiménez, Beatriz; Pons, Tirso; Sanchis, Araceli; Valencia, Alfonso
2014-01-01
Biological pathways are important elements of systems biology and in the past decade, an increasing number of pathway databases have been set up to document the growing understanding of complex cellular processes. Although more genome-sequence data are becoming available, a large fraction of it remains functionally uncharacterized. Thus, it is important to be able to predict the mapping of poorly annotated proteins to original pathway models. We have developed a Relational Learning-based Extension (RLE) system to investigate pathway membership through a function prediction approach that mainly relies on combinations of simple properties attributed to each protein. RLE searches for proteins with molecular similarities to specific pathway components. Using RLE, we associated 383 uncharacterized proteins to 28 pre-defined human Reactome pathways, demonstrating relative confidence after proper evaluation. Indeed, in specific cases manual inspection of the database annotations and the related literature supported the proposed classifications. Examples of possible additional components of the Electron transport system, Telomere maintenance and Integrin cell surface interactions pathways are discussed in detail. All the predicted human proteins for Reactome releases 30 (2009) and 40 (2012) are available at http://rle.bioinfo.cnio.es.
Kryshtafovych, Andriy; Moult, John; Bartual, Sergio G.; Bazan, J. Fernando; Berman, Helen; Casteel, Darren E.; Christodoulou, Evangelos; Everett, John K.; Hausmann, Jens; Heidebrecht, Tatjana; Hills, Tanya; Hui, Raymond; Hunt, John F.; Jayaraman, Seetharaman; Joachimiak, Andrzej; Kennedy, Michael A.; Kim, Choel; Lingel, Andreas; Michalska, Karolina; Montelione, Gaetano T.; Otero, José M.; Perrakis, Anastassis; Pizarro, Juan C.; van Raaij, Mark J.; Ramelot, Theresa A.; Rousseau, Francois; Tong, Liang; Wernimont, Amy K.; Young, Jasmine; Schwede, Torsten
2011-01-01
One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, a so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae. PMID:22020785
Automated de novo phasing and model building of coiled-coil proteins.
Rämisch, Sebastian; Lizatović, Robert; André, Ingemar
2015-03-01
Models generated by de novo structure prediction can be very useful starting points for molecular replacement for systems where suitable structural homologues cannot be readily identified. Protein-protein complexes and de novo-designed proteins are examples of systems that can be challenging to phase. In this study, the potential of de novo models of protein complexes for use as starting points for molecular replacement is investigated. The approach is demonstrated using homomeric coiled-coil proteins, which are excellent model systems for oligomeric systems. Despite the stereotypical fold of coiled coils, initial phase estimation can be difficult and many structures have to be solved with experimental phasing. A method was developed for automatic structure determination of homomeric coiled coils from X-ray diffraction data. In a benchmark set of 24 coiled coils, ranging from dimers to pentamers with resolutions down to 2.5 Å, 22 systems were automatically solved, 11 of which had previously been solved by experimental phasing. The generated models contained 71-103% of the residues present in the deposited structures, had the correct sequence and had free R values that deviated on average by 0.01 from those of the respective reference structures. The electron-density maps were of sufficient quality that only minor manual editing was necessary to produce final structures. The method, named CCsolve, combines methods for de novo structure prediction, initial phase estimation and automated model building into one pipeline. CCsolve is robust against errors in the initial models and can readily be modified to make use of alternative crystallographic software. The results demonstrate the feasibility of de novo phasing of protein-protein complexes, an approach that could also be employed for other small systems beyond coiled coils.
A Unified Conformational Selection and Induced Fit Approach to Protein-Peptide Docking
Trellet, Mikael; Melquiond, Adrien S. J.; Bonvin, Alexandre M. J. J.
2013-01-01
Protein-peptide interactions are vital for the cell. They mediate, inhibit or serve as structural components in nearly 40% of all macromolecular interactions, and are often associated with diseases, making them interesting leads for protein drug design. In recent years, large-scale technologies have enabled exhaustive studies on the peptide recognition preferences for a number of peptide-binding domain families. Yet, the paucity of data regarding their molecular binding mechanisms together with their inherent flexibility makes the structural prediction of protein-peptide interactions very challenging. This leaves flexible docking as one of the few amenable computational techniques to model these complexes. We present here an ensemble, flexible protein-peptide docking protocol that combines conformational selection and induced fit mechanisms. Starting from an ensemble of three peptide conformations (extended, α-helix, polyproline-II), flexible docking with HADDOCK generates 79.4% of high quality models for bound/unbound and 69.4% for unbound/unbound docking when tested against the largest protein-peptide complexes benchmark dataset available to date. Conformational selection at the rigid-body docking stage successfully recovers the most relevant conformation for a given protein-peptide complex and the subsequent flexible refinement further improves the interface by up to 4.5 Å interface RMSD. Cluster-based scoring of the models results in a selection of near-native solutions in the top three for ∼75% of the successfully predicted cases. This unified conformational selection and induced fit approach to protein-peptide docking should open the route to the modeling of challenging systems such as disorder-order transitions taking place upon binding, significantly expanding the applicability limit of biomolecular interaction modeling by docking. PMID:23516555
Effect of Treatment of Cystic Fibrosis Pulmonary Exacerbations on Systemic Inflammation
Thompson, Valeria; Chmiel, James F.; Montgomery, Gregory S.; Nasr, Samya Z.; Perkett, Elizabeth; Saavedra, Milene T.; Slovis, Bonnie; Anthony, Margaret M.; Emmett, Peggy; Heltshe, Sonya L.
2015-01-01
Rationale: In cystic fibrosis (CF), pulmonary exacerbations present an opportunity to define the effect of antibiotic therapy on systemic measures of inflammation. Objectives: Investigate whether plasma inflammatory proteins demonstrate and predict a clinical response to antibiotic therapy and determine which proteins are associated with measures of clinical improvement. Methods: In this multicenter study, a panel of 15 plasma proteins was measured at the onset and end of treatment for pulmonary exacerbation and at a clinically stable visit in patients with CF who were 10 years of age or older. Measurements and Main Results: Significant reductions in 10 plasma proteins were observed in 103 patients who had paired blood collections during antibiotic treatment for pulmonary exacerbations. Plasma C-reactive protein, serum amyloid A, calprotectin, and neutrophil elastase antiprotease complexes correlated most strongly with clinical measures at exacerbation onset. Reductions in C-reactive protein, serum amyloid A, IL-1ra, and haptoglobin were most associated with improvements in lung function with antibiotic therapy. Having higher IL-6, IL-8, and α1-antitrypsin (α1AT) levels at exacerbation onset were associated with an increased risk of being a nonresponder (i.e., failing to recover to baseline FEV1). Baseline IL-8, neutrophil elastase antiprotease complexes, and α1AT along with changes in several plasma proteins with antibiotic treatment, in combination with FEV1 at exacerbation onset, were predictive of being a treatment responder. Conclusions: Circulating inflammatory proteins demonstrate and predict a response to treatment of CF pulmonary exacerbations. A systemic biomarker panel could speed up drug discovery, leading to a quicker, more efficient drug development process for the CF community. PMID:25714657
Arcon, Juan Pablo; Defelipe, Lucas A; Modenutti, Carlos P; López, Elias D; Alvarez-Garcia, Daniel; Barril, Xavier; Turjanski, Adrián G; Martí, Marcelo A
2017-04-24
One of the most important biological processes at the molecular level is the formation of protein-ligand complexes. Therefore, determining their structure and underlying key interactions is of paramount relevance and has direct applications in drug development. Because of its low cost relative to its experimental sibling, molecular dynamics (MD) simulations in the presence of different solvent probes mimicking specific types of interactions have been increasingly used to analyze protein binding sites and reveal protein-ligand interaction hot spots. However, a systematic comparison of different probes and their real predictive power from a quantitative and thermodynamic point of view is still missing. In the present work, we have performed MD simulations of 18 different proteins in pure water as well as water mixtures of ethanol, acetamide, acetonitrile and methylammonium acetate, leading to a total of 5.4 μs simulation time. For each system, we determined the corresponding solvent sites, defined as space regions adjacent to the protein surface where the probability of finding a probe atom is higher than that in the bulk solvent. Finally, we compared the identified solvent sites with 121 different protein-ligand complexes and used them to perform molecular docking and ligand binding free energy estimates. Our results show that combining solely water and ethanol sites allows sampling over 70% of all possible protein-ligand interactions, especially those that coincide with ligand-based pharmacophoric points. Most important, we also show how the solvent sites can be used to significantly improve ligand docking in terms of both accuracy and precision, and that accurate predictions of ligand binding free energies, along with relative ranking of ligand affinity, can be performed.
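The definition of a solvent site used above (a region where the probability of finding a probe atom exceeds that in bulk solvent) can be illustrated with a simple voxel count over trajectory frames. This is a minimal sketch under stated assumptions: the cubic grid, the `bulk_density` value, the `min_ratio` threshold and the toy frames are all hypothetical choices for illustration, not the procedure or parameters used in the paper.

```python
from collections import Counter


def solvent_sites(frames, voxel=1.0, bulk_density=0.01, min_ratio=2.0):
    """Return {voxel_index: enrichment} for voxels enriched in probe atoms.

    frames: list of frames, each a list of (x, y, z) probe-atom positions in Å.
    bulk_density: probe atoms per cubic Å in bulk solvent (assumed known).
    min_ratio: minimum occupancy relative to bulk for a voxel to count as a site.
    """
    counts = Counter()
    for frame in frames:
        for x, y, z in frame:
            counts[(int(x // voxel), int(y // voxel), int(z // voxel))] += 1
    # Number of probe atoms expected in one voxel over all frames in bulk solvent.
    expected = bulk_density * voxel ** 3 * len(frames)
    return {idx: n / expected for idx, n in counts.items() if n / expected >= min_ratio}


# Toy trajectory: a probe atom sits in one voxel in every second frame,
# and wanders through distinct voxels in the remaining frames.
frames = [[(0.5, 0.5, 0.5)] if i % 2 == 0 else [(5.5 + i, 0.5, 0.5)]
          for i in range(100)]
sites = solvent_sites(frames)
print(sites)
```

In practice the positions would be taken after aligning every frame to a common protein reference, so that voxels refer to fixed locations on the protein surface.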
Protein-Protein Docking with F2Dock 2.0 and GB-Rerank
Chowdhury, Rezaul; Rasheed, Muhibur; Keidel, Donald; Moussalem, Maysam; Olson, Arthur; Sanner, Michel; Bajaj, Chandrajit
2013-01-01
Motivation: Computational simulation of protein-protein docking can expedite the process of molecular modeling and drug discovery. This paper reports on our new F2 Dock protocol which improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge-based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique. Our algorithms are based on highly efficient data structures including the dynamic packing grids and octrees which significantly speed up the computations and also provide guaranteed bounds on approximation error. Results: The improved affinity functions show superior performance compared to their traditional counterparts in finding correct docking poses at higher ranks. We found that the new filters and the GBSA based reranking individually and in combination significantly improve the accuracy of docking predictions with only minor increase in computation time. We compared F2 Dock 2.0 with ZDock 3.0.2 and found improvements over it: among 176 complexes in ZLab Benchmark 4.0, F2 Dock 2.0 finds a near-native solution as the top prediction for 22 complexes, whereas ZDock 3.0.2 does so for 13 complexes. F2 Dock 2.0 finds a near-native solution within the top 1000 predictions for 106 complexes as opposed to 104 complexes for ZDock 3.0.2. However, there are 17 complexes where F2 Dock 2.0 finds a solution but ZDock 3.0.2 does not, and 15 complexes where the reverse holds, which indicates that the two docking protocols can also complement each other. Availability: The docking protocol has been implemented as a server with a graphical client (TexMol) which allows the user to manage multiple docking jobs, and visualize the docked poses and interfaces. Both the server and client are available for download.
Server: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dock.shtml. Client: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dockclient.shtml. PMID:23483883
Sasse, Alexander; de Vries, Sjoerd J; Schindler, Christina E M; de Beauchêne, Isaure Chauvot; Zacharias, Martin
2017-01-01
Protein-protein docking protocols aim to predict the structures of protein-protein complexes based on the structure of individual partners. Docking protocols usually include several steps of sampling, clustering, refinement and re-scoring. The scoring step is one of the bottlenecks in the performance of many state-of-the-art protocols. The performance of scoring functions depends on the quality of the generated structures and its coupling to the sampling algorithm. A tool kit, GRADSCOPT (GRid Accelerated Directly SCoring OPTimizing), was designed to allow rapid development and optimization of different knowledge-based scoring potentials for specific objectives in protein-protein docking. Different atomistic and coarse-grained potentials can be created by a grid-accelerated directly scoring dependent Monte-Carlo annealing or by a linear regression optimization. We demonstrate that the scoring functions generated by our approach are similar to or even outperform state-of-the-art scoring functions for predicting near-native solutions. Of additional importance, we find that potentials specifically trained to identify the native bound complex perform rather poorly on identifying acceptable or medium quality (near-native) solutions. In contrast, atomistic long-range contact potentials can increase the average fraction of near-native poses by up to a factor of 2.5 in the best scored 1% of decoys (compared to existing scoring), emphasizing the need for specific docking potentials for different steps in the docking protocol.
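The evaluation metric mentioned above, the fraction of near-native poses among the best-scored 1% of decoys, can be computed as in the following minimal sketch. It assumes that a lower score is better and uses made-up rankings constructed so that the improvement comes out at 2.5. The function names and all data are hypothetical and do not come from GRADSCOPT.

```python
def near_native_fraction(scores, is_near_native, top_fraction=0.01):
    """Fraction of near-native decoys among the best-scored top_fraction.

    Lower scores are treated as better (an assumption of this sketch).
    """
    if not scores or len(scores) != len(is_near_native):
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_top = max(1, round(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sum(1 for i in ranked[:n_top] if is_near_native[i]) / n_top


def scores_from_order(order):
    """Build a score list in which decoy order[k] receives score k."""
    scores = [0.0] * len(order)
    for rank, idx in enumerate(order):
        scores[idx] = float(rank)
    return scores


# Toy set of 1000 decoys; the first 50 are labeled near-native.
n = 1000
labels = [i < 50 for i in range(n)]
# Hypothetical rankings: the reference scoring places 2 near-native decoys in
# its top 10, the retrained potential places 5 there.
old_order = [0, 1] + list(range(50, n)) + list(range(2, 50))
new_order = [0, 1, 2, 3, 4] + list(range(50, n)) + list(range(5, 50))

old_fraction = near_native_fraction(scores_from_order(old_order), labels)
new_fraction = near_native_fraction(scores_from_order(new_order), labels)
improvement = new_fraction / old_fraction
print(old_fraction, new_fraction, improvement)
```

Averaging this ratio over many complexes gives an improvement factor of the kind quoted in the abstract.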
Probing binding hot spots at protein-RNA recognition sites.
Barik, Amita; Nithin, Chandran; Karampudi, Naga Bhushana Rao; Mukherjee, Sunandan; Bahadur, Ranjit Prasad
2016-01-29
We use evolutionary conservation derived from structure alignment of polypeptide sequences along with structural and physicochemical attributes of protein-RNA interfaces to probe the binding hot spots at protein-RNA recognition sites. We find that the degree of conservation varies across the RNA binding proteins; some evolve rapidly compared to others. Additionally, irrespective of the structural class of the complexes, residues at the RNA binding sites are evolutionarily better conserved than those at the solvent exposed surfaces. For recognitions involving duplex RNA, residues interacting with the major groove are better conserved than those interacting with the minor groove. We identify multi-interface residues participating simultaneously in protein-protein and protein-RNA interfaces in complexes where more than one polypeptide is involved in RNA recognition, and show that they are better conserved compared to any other RNA binding residues. We find that the residues at water preservation sites are better conserved than those at hydrated or at dehydrated sites. Finally, we develop a Random Forests model using structural and physicochemical attributes for predicting binding hot spots. The model accurately predicts 80% of the instances of experimental ΔΔG values in a particular class, and provides a stepping-stone towards the engineering of protein-RNA recognition sites with desired affinity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Binding free energy analysis of protein-protein docking model structures by evERdock.
Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio
2018-03-14
To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal mode analysis for entropy calculation.
A tool for calculating binding-site residues on proteins from PDB structures.
Hu, Jing; Yan, Changhui
2009-08-03
In research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Thus, researchers must find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that contain the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The developed tool is very useful for research on protein binding site analysis and prediction.
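The strategy described above, computing binding-site residues from each complex and then combining them across complexes, can be sketched as follows. This is not TCBRP's implementation. It assumes a simple distance-cutoff definition of a binding-site residue, a hypothetical 4.0 Å default, and residue identifiers that have already been mapped onto a common numbering (the sequence alignment step the abstract calls tedious is not shown).

```python
import math


def binding_site_residues(protein_atoms, partner_atoms, cutoff=4.0):
    """Residues with at least one atom within `cutoff` Å of any partner atom.

    protein_atoms: list of (residue_id, (x, y, z)) for the protein of interest.
    partner_atoms: list of (x, y, z) for the interacting partner.
    """
    site = set()
    for res_id, pos in protein_atoms:
        if res_id in site:
            continue  # residue already flagged by another of its atoms
        if any(math.dist(pos, p) <= cutoff for p in partner_atoms):
            site.add(res_id)
    return site


def combine_binding_sites(complexes, cutoff=4.0):
    """Union of binding-site residues over several complexes of one protein."""
    combined = set()
    for protein_atoms, partner_atoms in complexes:
        combined |= binding_site_residues(protein_atoms, partner_atoms, cutoff)
    return combined


# Toy data: one atom per residue, two complexes with different partners.
protein = [(10, (0.0, 0.0, 0.0)), (11, (10.0, 0.0, 0.0)), (12, (20.0, 0.0, 0.0))]
complex_a = (protein, [(1.0, 0.0, 0.0)])   # partner sits next to residue 10
complex_b = (protein, [(21.0, 0.0, 0.0)])  # partner sits next to residue 12
all_sites = combine_binding_sites([complex_a, complex_b])
print(sorted(all_sites))
```

Exposing `cutoff` as a parameter mirrors the abstract's point that the definition of a binding-site residue should be adjustable by the user.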
Mode localization in the cooperative dynamics of protein recognition
NASA Astrophysics Data System (ADS)
Copperman, J.; Guenza, M. G.
2016-07-01
The biological function of proteins is encoded in their structure and expressed through the mediation of their dynamics. This paper presents a study on the correlation between local fluctuations, binding, and biological function for two sample proteins, starting from the Langevin Equation for Protein Dynamics (LE4PD). The LE4PD is a microscopic and residue-specific coarse-grained approach to protein dynamics, which starts from the static structural ensemble of a protein and predicts the dynamics analytically. It has been shown to be accurate in its prediction of NMR relaxation experiments and Debye-Waller factors. The LE4PD is solved in a set of diffusive modes which span a vast range of time scales of the protein dynamics, and provides a detailed picture of the mode-dependent localization of the fluctuation as a function of the primary structure of the protein. To investigate the dynamics of protein complexes, the theory is implemented here to treat the coarse-grained dynamics of interacting macromolecules. As an example, calculations of the dynamics of monomeric and dimerized HIV protease and the free Insulin Growth Factor II Receptor (IGF2R) domain 11 and its IGF2R:IGF2 complex are presented. Either simulation-derived or experimentally measured NMR conformers are used as input structural ensembles to the theory. The picture that emerges suggests a dynamical heterogeneous protein where biologically active regions provide energetically comparable conformational states that are trapped by a reacting partner in agreement with the conformation-selection mechanism of binding.
Smaczniak, Cezary; Muiño, Jose M; Chen, Dijun; Angenent, Gerco C; Kaufmann, Kerstin
2017-08-01
Floral organ identities in plants are specified by the combinatorial action of homeotic master regulatory transcription factors. However, how these factors achieve their regulatory specificities is still largely unclear. Genome-wide in vivo DNA binding data show that homeotic MADS domain proteins recognize partly distinct genomic regions, suggesting that DNA binding specificity contributes to functional differences of homeotic protein complexes. We used in vitro systematic evolution of ligands by exponential enrichment followed by high-throughput DNA sequencing (SELEX-seq) on several floral MADS domain protein homo- and heterodimers to measure their DNA binding specificities. We show that specification of reproductive organs is associated with distinct binding preferences of a complex formed by SEPALLATA3 and AGAMOUS. Binding specificity is further modulated by different binding site spacing preferences. Combination of SELEX-seq and genome-wide DNA binding data allows differentiation between targets in specification of reproductive versus perianth organs in the flower. We validate the importance of DNA binding specificity for organ-specific gene regulation by modulating promoter activity through targeted mutagenesis. Our study shows that intrafamily protein interactions affect DNA binding specificity of floral MADS domain proteins. Differential DNA binding of MADS domain protein complexes plays a role in the specificity of target gene regulation. © 2017 American Society of Plant Biologists. All rights reserved.
Protein-protein interface analysis and hot spots identification for chemical ligand design.
Chen, Jing; Ma, Xiaomin; Yuan, Yaxia; Pei, Jianfeng; Lai, Luhua
2014-01-01
Rational design for chemical compounds targeting protein-protein interactions has grown from a dream to reality after a decade of efforts. There are an increasing number of successful examples, though major challenges remain in the field. In this paper, we will first give a brief review of the available methods that can be used to analyze protein-protein interface and predict hot spots for chemical ligand design. New developments of binding sites detection, ligandability and hot spots prediction from the author's group will also be described. Pocket V.3 is an improved program for identifying hot spots in protein-protein interface using only an apo protein structure. It has been developed based on Pocket V.2 that can derive receptor-based pharmacophore model for ligand binding cavity. Given similarities and differences between the essence of pharmacophore and hot spots for guiding design of chemical compounds, not only energetic but also spatial properties of protein-protein interface are used in Pocket V.3 for dealing with protein-protein interface. In order to illustrate the capability of Pocket V.3, two datasets have been used. One is taken from ASEdb and BID having experimental alanine scanning results for testing hot spots prediction. The other is taken from the 2P2I database containing complex structures of protein-ligand binding at the original protein-protein interface for testing hot spots application in ligand design.
NASA Astrophysics Data System (ADS)
Keskin, Ozlem; Ma, Buyong; Rogale, Kristina; Gunasekaran, K.; Nussinov, Ruth
2005-06-01
Understanding and ultimately predicting protein associations is immensely important for functional genomics and drug design. Here, we propose that binding sites have preferred organizations. First, the hot spots cluster within densely packed 'hot regions'. Within these regions, they form networks of interactions. Thus, hot spots located within a hot region contribute cooperatively to the stability of the complex. However, the contributions of separate, independent hot regions are additive. Moreover, hot spots are often already pre-organized in the unbound (free) protein states. Describing a binding site through independent local hot regions has implications for binding site definition, design and parametrization for prediction. The compactness and cooperativity emphasize the similarity between binding and folding. This proposition is grounded in computation and experiment. It explains why summation of the interactions may over-estimate the stability of the complex. Furthermore, statistically, charge-charge coupling of the hot spots is disfavored. However, since within the highly packed regions the solvent is screened, the electrostatic contributions are strengthened. Thus, we propose a new description of protein binding sites: a site consists of (one or a few) self-contained cooperative regions. Since the residue hot spots are those conserved by evolution, proteins binding multiple partners at the same sites are expected to use all or some combination of these regions.
Suzuki, Nao; Zara, Jane; Sato, Takaaki; Ong, Edgar; Bakhiet, Nouna; Oshima, Robert G.; Watson, Kellie L.; Fukuda, Michiko N.
1998-01-01
Trophinin and tastin form a cell adhesion molecule complex that potentially mediates an initial attachment of the blastocyst to uterine epithelial cells at the time of implantation. Trophinin and tastin, however, do not directly bind to each other, suggesting the presence of an intermediary protein. The present study identifies a cytoplasmic protein, named bystin, that directly binds trophinin and tastin. Bystin consists of 306 amino acid residues and is predicted to contain tyrosine, serine, and threonine residues in contexts conforming to motifs for phosphorylation by protein kinases. Database searches revealed a 53% identity of the predicted peptide sequence with the Drosophila bys (mrr) gene. Direct protein–protein interactions of trophinin, tastin, and bystin analyzed by yeast two-hybrid assays and by in vitro protein binding assays indicated that binding between bystin and trophinin and between bystin and tastin is enhanced when cytokeratin 8 and 18 are present as the third molecule. Immunocytochemistry of bystin showed that bystin colocalizes with trophinin, tastin, and cytokeratins in a human trophoblastic teratocarcinoma cell, HT-H. It is therefore possible that these molecules form a complex and thus are involved in the process of embryo implantation. PMID:9560222
Prediction of purification of biopharmaceuticals with molecular dynamics
NASA Astrophysics Data System (ADS)
Ustach, Vincent; Faller, Roland
Purification of biopharmaceuticals remains the most expensive part of protein-based drug production. In ion exchange chromatography (IEX), prediction of the elution ionic strength of host cell and target proteins has the potential to reduce the parameter space for scale-up of protein production. The complex shape and charge distribution of proteins and pores complicate predictions of the interactions in these systems. All-atom molecular dynamics methods are beyond computational limits for mass transport regimes. We present a coarse-grained model for proteins for prediction of elution pH and ionic strength. By extending the raspberry model for colloid particles to surface shapes and charge distributions of proteins, we can reproduce the behavior of proteins in IEX. The average charge states of titratable amino acid residues at relevant pH values are determined by extrapolation from all-atom molecular dynamics at pH 7. The pH-specific all-atom electrostatic field is then mapped onto the coarse-grained surface beads of the raspberry particle. The hydrodynamics are reproduced with the lattice-Boltzmann scheme. This combination of methods allows very long simulation times. The model is being validated for known elution procedures by comparing the data with experiments. Defense Threat Reduction Agency (Grant Number HDTRA1-15-1-0054).
Jothi, Raja; Cherukuri, Praveen F.; Tasneem, Asba; Przytycka, Teresa M.
2006-01-01
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the noninteracting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites. PMID:16949097
Zhang, Huiling; Huang, Qingsheng; Bei, Zhendong; Wei, Yanjie; Floudas, Christodoulos A
2016-03-01
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. © 2016 Wiley Periodicals, Inc.
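The evaluation quantities quoted in this abstract can be made concrete with a short sketch. It assumes that "accuracy" means TP/(TP+FP) and "coverage" means TP/(TP+FN), which is the usual convention in contact prediction but is not spelled out in the abstract. The function names and toy coordinates are illustrative.

```python
from math import dist, sqrt

def native_contacts(ca_coords, cutoff=14.0, min_separation=6):
    """Residue pairs (i, j), j - i >= min_separation, whose C-alpha atoms
    lie within `cutoff` angstroms (the 14 A definition from the abstract)."""
    n = len(ca_coords)
    return {(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if dist(ca_coords[i], ca_coords[j]) <= cutoff}

def contact_metrics(predicted, native, n_residues, min_separation=6):
    """Accuracy (assumed = precision), coverage, specificity and MCC."""
    # Number of residue pairs that satisfy the sequence-separation filter.
    total = sum(n_residues - d for d in range(min_separation, n_residues))
    tp = len(predicted & native)
    fp = len(predicted - native)
    fn = len(native - predicted)
    tn = total - tp - fp - fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": tp / (tp + fp) if tp + fp else 0.0,
        "coverage": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Toy chain: 10 residues on a straight line, 2.2 A apart (not a real protein).
coords = [(2.2 * i, 0.0, 0.0) for i in range(10)]
native = native_contacts(coords)
predicted = {(0, 6), (0, 9)}
metrics = contact_metrics(predicted, native, n_residues=len(coords))
```

Because non-contacts vastly outnumber contacts, specificity stays near 1 even for weak predictors, which is consistent with the abstract reporting specificities above 99% alongside much lower coverage and MCC.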
Gardères, Johan; Domart-Coulon, Isabelle; Marie, Arul; Hamer, Bojan; Batel, Renato; Müller, Werner E G; Bourguet-Kondracki, Marie-Lise
2016-10-01
Carbohydrate-binding proteins were purified from the marine calcareous sponge Clathrina clathrus via affinity chromatography on lactose and N-acetyl glucosamine-agarose resins. Proteomic analysis of acrylamide gel-separated protein subunits obtained in reducing conditions identified several candidate lectins. Based on amino-acid sequence similarity, two peptides displayed homology with the jack bean lectin Concanavalin A, including a conserved domain shared by proteins in the L-type lectin superfamily. An N-acetyl glucosamine-binding protein complex, named clathrilectin, was further purified via gel filtration chromatography, bioguided with a diagnostic rabbit erythrocyte haemagglutination assay, and its activity was found to be calcium dependent. Clathrilectin, a protein complex of 3200 kDa estimated by gel filtration, is composed of monomers with apparent molecular masses of 208 and 180 kDa estimated on 10% SDS-PAGE. Nine internal peptides were identified using proteomic analyses, and compared to protein libraries from the demosponge Amphimedon queenslandica and a calcareous sponge Sycon sp. from the Adriatic Sea. Clathrilectin is the first lectin isolated from a calcareous sponge and displays homologies with predicted sponge proteins potentially involved in cell aggregation and interaction with bacteria.
Mishra, Vinita; Pathak, Chandramani
2018-05-29
Toll-like receptor 4 (TLR4) is a member of the Toll-like receptor (TLR) family that serves as a receptor for bacterial lipopolysaccharide (LPS). TLR4 alone cannot recognize LPS without the aid of the co-receptor myeloid differentiation factor-2 (MD-2). Binding of LPS to TLR4 forms an LPS-TLR4-MD-2 complex and directs downstream signaling for activation of immune response, inflammation and NF-κB activation. Activation of TLR4 signaling is associated with various pathophysiological consequences. Therefore, targeting protein-protein interaction (PPI) in TLR4-MD-2 complex formation could be an attractive therapeutic approach for targeting inflammatory disorders. The present study aimed to identify small molecule PPI inhibitors (SMPPIIs) using a pharmacophore mapping-based approach to computational drug discovery. Here, we retrieved information about the hot spot residues and their pharmacophoric features at both the primary (TLR4-MD-2) and dimerization (MD-2-TLR4*) protein-protein interaction interfaces in the TLR4-MD-2 homo-dimer complex using in silico methods. Promising candidates that may restrict the TLR4-MD-2 protein-protein interaction were identified after virtual screening. In silico off-target profiling of the virtually screened compounds revealed other possible molecular targets. Two of the virtually screened compounds (C11 and C15) were predicted to have an inhibitory concentration in the μM range after HYDE assessment. A molecular dynamics simulation study performed for these two compounds in complex with the target protein confirms the stability of the complex. After virtual high-throughput screening we found selective hTLR4-MD-2 inhibitors, which may have therapeutic potential to target chronic inflammatory diseases.
On the Distribution of Protein Refractive Index Increments
Zhao, Huaying; Brown, Patrick H.; Schuck, Peter
2011-01-01
The protein refractive index increment, dn/dc, is an important parameter underlying the concentration determination and the biophysical characterization of proteins and protein complexes in many techniques. In this study, we examine the widely used assumption that most proteins have dn/dc values in a very narrow range, and reappraise the prediction of dn/dc of unmodified proteins based on their amino acid composition. Applying this approach in large scale to the entire set of known and predicted human proteins, we obtain, for the first time, to our knowledge, an estimate of the full distribution of protein dn/dc values. The distribution is close to Gaussian with a mean of 0.190 ml/g (for unmodified proteins at 589 nm) and a standard deviation of 0.003 ml/g. However, small proteins <10 kDa exhibit a larger spread, and almost 3000 proteins have values deviating by more than two standard deviations from the mean. Due to the widespread availability of protein sequences and the potential for outliers, the compositional prediction should be convenient and provide greater accuracy than an average consensus value for all proteins. We discuss how this approach should be particularly valuable for certain protein classes where a high dn/dc is coincidental to structural features, or may be functionally relevant such as in proteins of the eye. PMID:21539801
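As a rough illustration of a compositional prediction and of the two-standard-deviation outlier criterion mentioned in this abstract, the sketch below assumes that a protein's dn/dc is the mass-weighted average of per-residue increments. The abstract does not state the formula, and the per-residue masses and increments used here are invented placeholders, not published values; only the mean (0.190 ml/g) and standard deviation (0.003 ml/g) come from the text.

```python
MEAN_DNDC = 0.190  # ml/g, distribution mean reported in the abstract
SD_DNDC = 0.003    # ml/g, standard deviation reported in the abstract

def composition_dndc(sequence, residue_mass, residue_dndc):
    """Mass-weighted average of per-residue increments (assumed formula).

    `residue_mass` and `residue_dndc` are caller-supplied lookup tables;
    real values would have to come from the literature.
    """
    total_mass = sum(residue_mass[aa] for aa in sequence)
    weighted = sum(residue_mass[aa] * residue_dndc[aa] for aa in sequence)
    return weighted / total_mass

def is_outlier(dndc, n_sd=2.0):
    """True if dn/dc lies more than n_sd standard deviations from the mean."""
    return abs(dndc - MEAN_DNDC) > n_sd * SD_DNDC

# Placeholder tables for a two-letter toy alphabet (not real amino acids).
toy_mass = {"X": 100.0, "Y": 200.0}
toy_dndc = {"X": 0.180, "Y": 0.200}

value = composition_dndc("XY", toy_mass, toy_dndc)  # (18 + 40) / 300
low, high = MEAN_DNDC - 2 * SD_DNDC, MEAN_DNDC + 2 * SD_DNDC
```

With the reported mean and standard deviation, the two-standard-deviation band runs from 0.184 to 0.196 ml/g; proteins predicted outside that band are the outliers the abstract refers to.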
Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang
2011-06-20
Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing an enzyme into the EC hierarchy is crucial to understanding its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. One is to introduce efficient sequence encoding methods for representing given proteins. The second is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF obtains the accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98 when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for the enzyme function prediction community.
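The conjoint triad feature mentioned in this abstract can be sketched as follows. The seven-class amino acid grouping below is the one commonly used in the conjoint triad literature and is an assumption here, since the abstract does not list the classes; the simple frequency normalization is likewise a choice made for this sketch rather than the paper's.

```python
from itertools import product

# Assumed seven-class grouping (commonly used; not stated in the abstract).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 ordered triads of classes, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))

def conjoint_triad(sequence):
    """Return a 343-dimensional vector of triad frequencies.

    Each window of three consecutive residues is mapped to its triple of
    class indices; windows containing an unknown residue are skipped.
    """
    counts = dict.fromkeys(TRIADS, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        window = seq[i:i + 3]
        if all(aa in AA_TO_CLASS for aa in window):
            counts[tuple(AA_TO_CLASS[aa] for aa in window)] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRIADS]

vector = conjoint_triad("AGVKD")  # windows: AGV, GVK, VKD
```

Unlike plain amino acid composition, this encoding keeps information about which residue classes occur next to each other, which is the property the abstract credits for the improvement over the AAC feature.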
Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases.
Berger, Seth I; Posner, Jeremy M; Ma'ayan, Avi
2007-10-04
In recent years, mammalian protein-protein interaction network databases have been developed. The interactions in these databases are either extracted manually from low-throughput experimental biomedical research literature, extracted automatically from literature using techniques such as natural language processing (NLP), generated experimentally using high-throughput methods such as yeast-2-hybrid screens, or interactions are predicted using an assortment of computational approaches. Genes or proteins identified as significantly changing in proteomic experiments, or identified as susceptibility disease genes in genomic studies, can be placed in the context of protein interaction networks in order to assign these genes and proteins to pathways and protein complexes. Genes2Networks is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic linkable three color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks is powerful web-based software that can help experimental biologists to interpret lists of genes and proteins such as those commonly produced through genomic and proteomic experiments, as well as lists of genes and proteins associated with disease processes. This system can be used to find relationships between genes and proteins from seed lists, and predict additional genes or proteins that may play key roles in common pathways or protein complexes.
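The idea of connecting a seed list through intermediate nodes can be sketched with a breadth-first search. This is a simplified illustration and not the Genes2Networks algorithm: it follows only one shortest path per seed pair, applies no statistical ranking, and the node names, edge list and `max_edges` parameter are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency map from a list of (a, b) interactions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal, max_edges):
    """Breadth-first search for one shortest path of at most max_edges edges."""
    prev = {start: None}
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        if depth[node] == max_edges:
            continue  # do not expand beyond the allowed path length
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in prev:
                prev[neighbour] = node
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return None

def connecting_intermediates(graph, seeds, max_edges=2):
    """Non-seed nodes lying on a short path between any two seeds."""
    seeds = set(seeds)
    found = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b, max_edges)
        if path:
            found.update(n for n in path[1:-1] if n not in seeds)
    return sorted(found)

# Hypothetical interaction list: A-X-B and B-Y-Z-C.
graph = build_graph([("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "Z"), ("Z", "C")])
two_step = connecting_intermediates(graph, ["A", "B", "C"], max_edges=2)
three_step = connecting_intermediates(graph, ["A", "B", "C"], max_edges=3)
```

Raising the allowed path length pulls in more intermediates, which is why a real system needs filtering and a significance test to avoid reporting highly connected nodes that link almost any seed list.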
Hoang, Margaret L; Leon, Ronald P; Pessoa-Brandao, Luis; Hunt, Sonia; Raghuraman, M K; Fangman, Walton L; Brewer, Bonita J; Sclafani, Robert A
2007-11-01
Eukaryotic chromosomal replication is a complicated process with many origins firing at different efficiencies and times during S phase. Prereplication complexes are assembled on all origins in G(1) phase, and yet only a subset of complexes is activated during S phase by DDK (for Dbf4-dependent kinase) (Cdc7-Dbf4). The yeast mcm5-bob1 (P83L) mutation bypasses DDK but results in reduced intrinsic firing efficiency at 11 endogenous origins and at origins located on minichromosomes. Origin efficiency may result from Mcm5 protein assuming an altered conformation, as predicted from the atomic structure of an archaeal MCM (for minichromosome maintenance) homologue. Similarly, an intragenic mutation in a residue predicted to interact with P83L suppresses the mcm5-bob1 bypass phenotype. We propose DDK phosphorylation of the MCM complex normally results in a single, highly active conformation of Mcm5, whereas the mcm5-bob1 mutation produces a number of conformations, only one of which is permissive for origin activation. Random adoption of these alternate states by the mcm5-bob1 protein can explain both how origin firing occurs independently of DDK and why origin efficiency is reduced. Because similar mutations in mcm2 and mcm4 cannot bypass DDK, Mcm5 protein may be a unique Mcm protein that is the final target of DDK regulation.
Zhang, Yi; Nikolovski, Nino; Sorieul, Mathias; Vellosillo, Tamara; McFarlane, Heather E; Dupree, Ray; Kesten, Christopher; Schneider, René; Driemeier, Carlos; Lathe, Rahul; Lampugnani, Edwin; Yu, Xiaolan; Ivakov, Alexander; Doblin, Monika S; Mortimer, Jenny C; Brown, Steven P; Persson, Staffan; Dupree, Paul
2016-06-09
As the most abundant biopolymer on Earth, cellulose is a key structural component of the plant cell wall. Cellulose is produced at the plasma membrane by cellulose synthase (CesA) complexes (CSCs), which are assembled in the endomembrane system and trafficked to the plasma membrane. While several proteins that affect CesA activity have been identified, components that regulate CSC assembly and trafficking remain unknown. Here we show that STELLO1 and 2 are Golgi-localized proteins that can interact with CesAs and control cellulose quantity. In the absence of STELLO function, the spatial distribution within the Golgi, secretion and activity of the CSCs are impaired indicating a central role of the STELLO proteins in CSC assembly. Point mutations in the predicted catalytic domains of the STELLO proteins indicate that they are glycosyltransferases facing the Golgi lumen. Hence, we have uncovered proteins that regulate CSC assembly in the plant Golgi apparatus.
Analysis of deep learning methods for blind protein contact prediction in CASP12.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2018-03-01
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
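The "top L/5" accuracy quoted in this abstract can be sketched as below. The sequence-separation ranges are assumptions based on the usual CASP convention (medium-range: 12 to 23 residues apart, long-range: 24 or more), since the abstract does not define them; the function name and the toy scores are illustrative.

```python
def top_fraction_accuracy(scores, native, length, min_sep, max_sep=None, fraction=5):
    """Fraction of the top L/fraction predicted pairs that are native contacts.

    scores: dict mapping (i, j) with i < j to a predicted contact probability.
    native: set of (i, j) pairs that are contacts in the experimental structure.
    """
    candidates = [
        (prob, pair) for pair, prob in scores.items()
        if pair[1] - pair[0] >= min_sep
        and (max_sep is None or pair[1] - pair[0] <= max_sep)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:max(1, length // fraction)]
    if not top:
        return 0.0
    # If fewer than L/fraction pairs are available, all of them are scored.
    return sum(pair in native for _, pair in top) / len(top)

# Toy prediction for a protein of length L = 30, so L/5 = 6 pairs are scored.
L = 30
scores = {
    (0, 24): 0.9, (0, 25): 0.8, (1, 26): 0.7, (2, 27): 0.6,
    (3, 28): 0.5, (4, 29): 0.4, (5, 29): 0.3,   # long-range pairs
    (0, 12): 0.95, (3, 20): 0.2,                # medium-range pairs
}
native = {(0, 24), (1, 26), (4, 29), (5, 29), (0, 12), (3, 20)}

long_range = top_fraction_accuracy(scores, native, L, min_sep=24)
medium_range = top_fraction_accuracy(scores, native, L, min_sep=12, max_sep=23)
```

Scoring only the L/5 highest-ranked pairs rewards a predictor for being confident about the right contacts, rather than for covering every contact in the structure.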
Discovering amino acid patterns on binding sites in protein complexes
Kuo, Huang-Cheng; Ong, Ping-Lin; Lin, Jung-Chang; Huang, Jen-Peng
2011-01-01
Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover the association relationship among AAs on binding sites. Such knowledge of binding sites is very helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes which have protein-protein recognition. The association rule mining technique is used to discover geographically adjacent amino acids on a binding site of a protein complex. When mining, instead of treating all AAs of binding sites as a transaction, we geographically partition AAs of binding sites in a protein complex. AAs in a partition are treated as a transaction. For the partition process, AAs on a binding site are projected from three-dimensional to two-dimensional. And then, assisted with a circular grid, AAs on the binding site are placed into grid cells. A circular grid has ten rings: a central ring, the second ring with 6 sectors, the third ring with 12 sectors, and later rings are added to four sectors in order. As for the radius of each ring, we examined the complexes and found that 10Å is a suitable range, which can be set by the user. After placing these recognition complexes on the circular grid, we obtain mining records (i.e. transactions) from each sector. A sector is regarded as a record. Finally, we use the association rule to mine these records for frequent AA patterns. If the support of an AA pattern is larger than the predetermined minimum support (i.e. threshold), it is called a frequent pattern. With these discovered patterns, we offer the biologists a novel point of view, which will improve the prediction accuracy of protein-protein recognition. In our experiments, we produced the AA patterns by data mining. As a result, we found that arginine (arg) most frequently appears on the binding sites of two proteins in the recognition protein complexes, while cysteine (cys) appears the fewest. 
In addition, if we discriminate the shape of binding sites between concave and convex further, we discover that patterns {arg, glu, asp} and {arg, ser, asp} on the concave shape of binding sites in a protein more frequently (i.e. higher probability) make contact with {lys} or {arg} on the convex shape of binding sites in another protein. Thus, we can confidently achieve a rate of at least 78%. On the other hand {val, gly, lys} on the convex surface of binding sites in proteins is more frequently in contact with {asp} on the concave site of another protein, and the confidence achieved is over 81%. Applying data mining in biology can reveal more facts that may otherwise be ignored or not easily discovered by the naked eye. Furthermore, we can discover more relationships among AAs on binding sites by appropriately rotating these residues on binding sites from a three-dimension to two-dimension perspective. We designed a circular grid to deposit the data, which total to 463 records consisting of AAs. Then we used the association rules to mine these records for discovering relationships. The proposed method in this paper provides an insight into the characteristics of binding sites for recognition complexes. PMID:21464838
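The support and confidence measures used in this abstract can be illustrated with a small sketch in which each grid sector is one transaction, as the authors describe. The four transactions below are invented for illustration and are not taken from the 463 records in the study; the brute-force enumeration is a stand-in for a proper association rule miner.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

def frequent_itemsets(transactions, min_support, size):
    """All itemsets of a given size whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    return [combo for combo in combinations(items, size)
            if support(combo, transactions) >= min_support]

# Invented sector contents; each frozenset is one transaction.
sectors = [
    frozenset({"arg", "glu", "asp"}),
    frozenset({"arg", "asp"}),
    frozenset({"arg", "ser"}),
    frozenset({"lys", "gly"}),
]

pair_support = support({"arg", "asp"}, sectors)
rule_confidence = confidence({"arg"}, {"asp"}, sectors)
frequent_pairs = frequent_itemsets(sectors, min_support=0.5, size=2)
```

In these terms, the 78% and 81% figures in the abstract read most naturally as rule confidences: how often a pattern on one surface is found in contact with the stated residue on the partner surface.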
Camproux, A C; Tufféry, P
2005-08-05
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
A TALE-inspired computational screen for proteins that contain approximate tandem repeats
Krwawicz, Joanna
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
Computational Analysis of the CB1 Carboxyl-terminus in the Receptor-G Protein Complex
Shim, Joong-Youn; Khurana, Leepakshi; Kendall, Debra A.
2016-01-01
Despite the important role of the carboxyl-terminus (Ct) of the activated brain cannabinoid receptor one (CB1) in the regulation of G protein signaling, a structural understanding of interactions with G proteins is lacking. This is largely due to the highly flexible nature of the CB1 Ct that dynamically adapts its conformation to the presence of G proteins. In the present study, we explored how the CB1 Ct can interact with the G protein by building on our prior modeling of the CB1-Gi complex (Shim J-Y, Ahn KH, Kendall DA. The Journal of Biological Chemistry 2013;288:32449-32465) to incorporate a complete CB1 Ct (Glu416Ct–Leu472Ct). Based upon the structural constraints from NMR studies, we employed ROSETTA to predict tertiary folds, ZDOCK to predict docking orientation, and molecular dynamics (MD) simulations to obtain two distinct plausible models of CB1 Ct in the CB1-Gi complex. The resulting models were consistent with the NMR-determined helical structure (H9) in the middle region of the CB1 Ct. The CB1 Ct directly interacted with both Gα and Gβ and stabilized the receptor at the Gi interface. The results of site-directed mutagenesis studies of Glu416Ct, Asp423Ct, Asp428Ct, and Arg444Ct of CB1 Ct suggested that the CB1 Ct can influence receptor-G protein coupling by stabilizing the receptor at the Gi interface. This research provided, for the first time, models of the CB1 Ct in contact with the G protein. PMID:26994549
Ghosh, Semanti; Bagchi, Angshuman
2018-04-26
Sulfur metabolism is one of the oldest known biochemical processes. Chemotrophic or phototrophic proteobacteria, through the dissimilatory pathway, use sulfate, sulfide, sulfite, thiosulfate or elementary sulfur by either reductive or oxidative mechanisms. During anoxygenic photosynthesis, anaerobic sulfur oxidizer Allochromatium vinosum forms sulfur globules that are further oxidized by dsr operon. One of the key redox enzymes in reductive or oxidative sulfur metabolic pathways is the DsrAB protein complex. However, there are practically no reports to elucidate the molecular mechanism of the sulfur oxidation process by the DsrAB protein complex from sulfur oxidizer Allochromatium vinosum. In the present context, we tried to analyze the structural details of the DsrAB protein complex from sulfur oxidizer Allochromatium vinosum by molecular dynamics simulations. The molecular dynamics simulation results revealed the various types of molecular interactions between DsrA and DsrB proteins during the formation of DsrAB protein complex. We, for the first time, predicted the mode of binding interactions between the co-factor and DsrAB protein complex from Allochromatium vinosum. We also compared the binding interfaces of DsrAB from sulfur oxidizer Allochromatium vinosum and sulfate reducer Desulfovibrio vulgaris. This study is the first to provide a comparative aspect of binding modes of sulfur oxidizer Allochromatium vinosum and sulfate reducer Desulfovibrio vulgaris.
Cooper, Gareth R; Moir, Anne
2011-05-01
The paradigm gerA operon is required for endospore germination in response to L-alanine as the sole germinant, and the three protein products, GerAA, GerAB, and GerAC are predicted to form a receptor complex in the spore inner membrane. GerAB shows homology to the amino acid-polyamine-organocation (APC) family of single-component transporters and is predicted to be an integral membrane protein with 10 membrane-spanning helices. Site-directed mutations were introduced into the gerAB gene at its natural location on the chromosome. Alterations to some charged or potential helix-breaking residues within membrane spans affected receptor function dramatically. In some cases, this is likely to reflect the complete loss of the GerA receptor complex, as judged by the absence of the germinant receptor protein GerAC, which suggests that the altered GerAB protein itself may be unstable or that the altered structure destabilizes the complex. Mutants that have a null phenotype for L-alanine germination but retain GerAC protein at near-normal levels are more likely to define amino acid residues of functional, rather than structural, importance. Single-amino-acid substitutions in each of the GerAB and GerAA proteins can prevent incorporation of GerAC protein into the spore; this provides strong evidence that the proteins within a specific receptor interact and that these interactions are required for receptor assembly. The lipoprotein nature of the GerAC receptor subunit is also important; an amino acid change in the prelipoprotein signal sequence in the gerAC1 mutant results in the absence of GerAC protein from the spore.
Proteins of the Glycine Decarboxylase Complex in the Hydrogenosome of Trichomonas vaginalis†
Mukherjee, Mandira; Brown, Mark T.; McArthur, Andrew G.; Johnson, Patricia J.
2006-01-01
Trichomonas vaginalis is a unicellular eukaryote that lacks mitochondria and contains a specialized organelle, the hydrogenosome, involved in carbohydrate metabolism and iron-sulfur cluster assembly. We report the identification of two glycine cleavage H proteins and a dihydrolipoamide dehydrogenase (L protein) of the glycine decarboxylase complex in T. vaginalis with predicted N-terminal hydrogenosomal presequences. Immunofluorescence analyses reveal that both H and L proteins are localized in hydrogenosomes, providing the first evidence for amino acid metabolism in this organelle. All three proteins were expressed in Escherichia coli and purified to homogeneity. The experimental Km of L protein for the two H proteins were 2.6 μM and 3.7 μM, consistent with both H proteins serving as substrates of L protein. Analyses using purified hydrogenosomes showed that endogenous H proteins exist as monomers and endogenous L protein as a homodimer in their native states. Phylogenetic analyses of L proteins revealed that the T. vaginalis homologue shares a common ancestry with dihydrolipoamide dehydrogenases from the firmicute bacteria, indicating its acquisition via a horizontal gene transfer event independent of the origins of mitochondria and hydrogenosomes. PMID:17158739
Rose, Annkatrin; Schraegle, Shannon J; Stahlberg, Eric A; Meier, Iris
2005-11-16
Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implicated in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell.
Rose, Annkatrin; Schraegle, Shannon J; Stahlberg, Eric A; Meier, Iris
2005-01-01
Background: Long alpha-helical coiled-coil proteins are involved in diverse organizational and regulatory processes in eukaryotic cells. They provide cables and networks in the cyto- and nucleoskeleton, molecular scaffolds that organize membrane systems and tissues, motors, levers, rotating arms, and possibly springs. Mutations in long coiled-coil proteins have been implicated in a growing number of human diseases. Using the coiled-coil prediction program MultiCoil, we have previously identified all long coiled-coil proteins from the model plant Arabidopsis thaliana and have established a searchable Arabidopsis coiled-coil protein database. Results: Here, we have identified all proteins with long coiled-coil domains from 21 additional fully sequenced genomes. Because regions predicted to form coiled-coils interfere with sequence homology determination, we have developed a sequence comparison and clustering strategy based on masking predicted coiled-coil domains. Comparing and grouping all long coiled-coil proteins from 22 genomes, the kingdom-specificity of coiled-coil protein families was determined. At the same time, a number of proteins with unknown function could be grouped with already characterized proteins from other organisms. Conclusion: MultiCoil predicts proteins with extended coiled-coil domains (more than 250 amino acids) to be largely absent from bacterial genomes, but present in archaea and eukaryotes. The structural maintenance of chromosomes proteins and their relatives are the only long coiled-coil protein family clearly conserved throughout all kingdoms, indicating their ancient nature. Motor proteins, membrane tethering and vesicle transport proteins are the dominant eukaryote-specific long coiled-coil proteins, suggesting that coiled-coil proteins have gained functions in the increasingly complex processes of subcellular infrastructure maintenance and trafficking control of the eukaryotic cell. PMID:16288662
Genome-wide protein-protein interactions and protein function exploration in cyanobacteria
Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu
2015-01-01
Genome-wide network analysis is well established for studying proteins of unknown function. Here, we explored protein functions and biological mechanisms based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of the literature for direct/indirect evidence, and “interologs” in conservation. Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins covering over 90% of the genome) to generate the PPI network. Based on the PPI network, Gene Ontology (GO) terms were assigned to proteins of unknown function. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of the underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033
Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J.
2018-01-01
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future. PMID:29538331
Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J
2018-03-14
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.
FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.
El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant
2016-01-01
A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. 
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
Knowledge-based fragment binding prediction.
Tang, Grace W; Altman, Russ B
2014-04-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
Knowledge-based Fragment Binding Prediction
Tang, Grace W.; Altman, Russ B.
2014-01-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971
Ceruloplasmin: Macromolecular Assemblies with Iron-Containing Acute Phase Proteins
Samygina, Valeriya R.; Sokolov, Alexey V.; Bourenkov, Gleb; Petoukhov, Maxim V.; Pulina, Maria O.; Zakharova, Elena T.; Vasilyev, Vadim B.; Bartunik, Hans; Svergun, Dmitri I.
2013-01-01
Copper-containing ferroxidase ceruloplasmin (Cp) forms binary and ternary complexes with cationic proteins lactoferrin (Lf) and myeloperoxidase (Mpo) during inflammation. We present an X-ray crystal structure of a 2Cp-Mpo complex at 4.7 Å resolution. This structure allows one to identify major protein–protein interaction areas and provides an explanation for a competitive inhibition of Mpo by Cp and for the activation of p-phenylenediamine oxidation by Mpo. Small angle X-ray scattering was employed to construct low-resolution models of the Cp-Lf complex and, for the first time, of the ternary 2Cp-2Lf-Mpo complex in solution. The SAXS-based model of Cp-Lf supports the predicted 1∶1 stoichiometry of the complex and demonstrates that both lobes of Lf contact domains 1 and 6 of Cp. The 2Cp-2Lf-Mpo SAXS model reveals the absence of interaction between Mpo and Lf in the ternary complex, so Cp can serve as a mediator of protein interactions in complex architecture. Mpo protects antioxidant properties of Cp by isolating its sensitive loop from proteases. The latter is important for incorporation of Fe3+ into Lf, which activates ferroxidase activity of Cp and precludes oxidation of Cp substrates. Our models provide the structural basis for possible regulatory role of these complexes in preventing iron-induced oxidative damage. PMID:23843990
Recovery of known T-cell epitopes by computational scanning of a viral genome
NASA Astrophysics Data System (ADS)
Logean, Antoine; Rognan, Didier
2002-04-01
A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins, from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide (8-, 9- and 10-mers) complexes are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the protein target. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very few starting experimental data, EpiDock can be used: (i) to predict potential T-cell epitopes from viral genomes (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.
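The scanning step described above (enumerate every 8-, 9- and 10-mer of a target sequence, score each one, keep the best-scoring fraction) can be sketched as follows. This is an illustrative sketch only: `toy_score` is a hypothetical stand-in and is not the Fresno free energy function, no homology modelling is performed, and the function names, example sequence and default `fraction` are assumptions.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def enumerate_peptides(sequence: str,
                       lengths: Sequence[int] = (8, 9, 10)) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, peptide) for every window of each requested length."""
    for k in lengths:
        for start in range(len(sequence) - k + 1):
            yield start, sequence[start:start + k]


def top_fraction(sequence: str,
                 score: Callable[[str], float],
                 fraction: float = 0.1,
                 lengths: Sequence[int] = (9,)) -> List[Tuple[float, int, str]]:
    """Rank all peptides by score (lower = predicted tighter binding)
    and return the best `fraction` of them, at least one."""
    scored = [(score(pep), start, pep)
              for start, pep in enumerate_peptides(sequence, lengths)]
    if not scored:
        return []
    scored.sort(key=lambda item: item[0])
    keep = max(1, round(len(scored) * fraction))
    return scored[:keep]


def toy_score(peptide: str) -> float:
    """Hypothetical placeholder score: rewards Leu at position 2 and
    Val/Leu at the C-terminus. NOT a binding free energy."""
    return -float((peptide[1] == "L") + (peptide[-1] in "VL"))


# Hypothetical 28-residue sequence, used only to exercise the code.
example = "MLLAVLYCLLWSFQTSAGHFPRACVSSK"
for score_value, start, pep in top_fraction(example, toy_score):
    print(start, pep, score_value)
```

A real scan would replace `toy_score` with a structure-based energy estimate and would run the per-peptide scoring in parallel, which is what makes genome-scale scanning feasible.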
Kröner, Frieder; Elsäßer, Dennis; Hubbuch, Jürgen
2013-11-29
The accelerating growth of the market for biopharmaceutical proteins, the market entry of biosimilars and the growing interest in new, more complex molecules constantly pose new challenges for bioseparation process development. In the presented work we demonstrate the application of a multidimensional, analytical separation approach to obtain the relevant physicochemical parameters of single proteins in a complex mixture for in silico chromatographic process development. A complete cell lysate containing a low titre target protein was first fractionated by multiple linear salt gradient anion exchange chromatography (AEC) with varying gradient length. The collected fractions were subsequently analysed by high-throughput capillary gel electrophoresis (HT-CGE) after being desalted and concentrated. From the obtained data of the 2D-separation the retention-volumes and the concentration of the single proteins were determined. The retention-volumes of the single proteins were used to calculate the related steric-mass action model parameters. In a final evaluation experiment the received parameters were successfully applied to predict the retention behaviour of the single proteins in salt gradient AEC.
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
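Phylogenetic profiling, one of the genomic context methods named above, compares the presence/absence pattern of two proteins' homologs across a set of reference genomes. The sketch below uses the Jaccard index as the similarity measure; that choice, the function name and the toy profiles are assumptions for illustration, and the abstract does not state which measure the authors used.

```python
from typing import Sequence


def profile_similarity(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Jaccard similarity of two binary phylogenetic profiles.

    Each profile holds one entry per reference genome:
    1 if a homolog is present in that genome, 0 if absent.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six reference genomes.
protein_x = [1, 1, 0, 1, 0, 1]
protein_y = [1, 1, 0, 1, 0, 0]
protein_z = [0, 0, 1, 0, 1, 0]

print(profile_similarity(protein_x, protein_y))  # similar patterns
print(profile_similarity(protein_x, protein_z))  # complementary patterns
```

Because each reference genome contributes one position to the profile, the choice and number of reference genomes directly changes the similarity values, which is the effect the study examines.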
Ganguly, Debabani; Zhang, Weihong; Chen, Jianhan
2013-01-01
Achieving facile specific recognition is essential for intrinsically disordered proteins (IDPs) that are involved in cellular signaling and regulation. Consideration of the physical time scales of protein folding and diffusion-limited protein-protein encounter has suggested that the frequent requirement of protein folding for specific IDP recognition could lead to kinetic bottlenecks. How IDPs overcome such potential kinetic bottlenecks to viably function in signaling and regulation in general is poorly understood. Our recent computational and experimental study of cell-cycle regulator p27 (Ganguly et al., J. Mol. Biol. (2012)) demonstrated that long-range electrostatic forces exerted on enriched charges of IDPs could accelerate protein-protein encounter via “electrostatic steering” and at the same time promote “folding-competent” encounter topologies to enhance the efficiency of IDP folding upon encounter. Here, we further investigated the coupled binding and folding mechanisms and the roles of electrostatic forces in the formation of three IDP complexes with more complex folded topologies. The surface electrostatic potentials of these complexes lack prominent features like those observed for the p27/Cdk2/cyclin A complex to directly suggest the ability of electrostatic forces to facilitate folding upon encounter. Nonetheless, similar electrostatically accelerated encounter and folding mechanisms were consistently predicted for all three complexes using topology-based coarse-grained simulations. Together with our previous analysis of charge distributions in known IDP complexes, our results support a prevalent role of electrostatic interactions in promoting efficient coupled binding and folding for facile specific recognition. 
These results also suggest that there is likely a co-evolution of IDP folded topology, charge characteristics, and coupled binding and folding mechanisms, driven at least partially by the need to achieve fast association kinetics for cellular signaling and regulation. PMID:24278008
NASA Astrophysics Data System (ADS)
Norinder, Ulf
1990-12-01
An experimental design based 3-D QSAR analysis using a combination of principal component and PLS analysis is presented and applied to human corticosteroid-binding globulin complexes. The predictive capability of the created model is good. The technique can also be used as guidance when selecting new compounds to be investigated.
NASA Astrophysics Data System (ADS)
Wang, Yu; Guo, Yanzhi; Kuang, Qifan; Pu, Xuemei; Ji, Yue; Zhang, Zhihang; Li, Menglong
2015-04-01
The assessment of binding affinity between ligands and the target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of the binding affinity with promising results, but most of them were developed as all-purpose models despite the specific functions of different protein families, since proteins from different function families always have different structures and physicochemical features. In this study, we proposed a random forest method to predict the protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were implemented separately for each protein family dataset, which indicated that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. As a comparison, two generic models including diverse protein families were also built. The evaluation results show that models on family-specific datasets have superior performance to those on the generic datasets, and the Pearson and Spearman correlation coefficients (Rp and Rs) on the test sets are 0.740, 0.874, 0.735 and 0.697, 0.853, 0.723 for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with the other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way of predicting the affinity of one particular protein family.
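The Pearson (Rp) and Spearman (Rs) coefficients quoted in the abstract above can be computed from paired predicted and measured affinities. The following is a minimal sketch in plain Python, not the authors' code; the function names and the example values are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical measured vs. predicted affinities (illustrative values only).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 4.0, 9.0, 16.0, 25.0]
rp = pearson(measured, predicted)
rs = spearman(measured, predicted)
```

Because Spearman uses only ranks, a monotonic but nonlinear relationship such as the one in the example gives Rs = 1 while Rp stays below 1, which is why the two values are usually reported together.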
Kaur, Parminder; Kiselar, Janna; Yang, Sichun; Chance, Mark R.
2015-01-01
Hydroxyl radical footprinting based MS for protein structure assessment has the goal of understanding ligand induced conformational changes and macromolecular interactions, for example, protein tertiary and quaternary structure, but the structural resolution provided by typical peptide-level quantification is limiting. In this work, we present experimental strategies using tandem-MS fragmentation to increase the spatial resolution of the technique to the single residue level to provide a high precision tool for molecular biophysics research. Overall, in this study we demonstrated an eightfold increase in structural resolution compared with peptide level assessments. In addition, to provide a quantitative analysis of residue based solvent accessibility and protein topography as a basis for high-resolution structure prediction, we illustrate strategies of data transformation using the relative reactivity of side chains as a normalization strategy and predict side-chain surface area from the footprinting data. We tested the methods by examination of Ca2+-calmodulin, showing highly significant correlations between surface area and side-chain contact predictions for individual side chains and the crystal structure. Tandem ion based hydroxyl radical footprinting-MS provides quantitative high-resolution protein topology information in solution that can fill existing gaps in structure determination for large proteins and macromolecular complexes. PMID:25687570
Ferreira da Costa, Joana; Silva, David; Caamaño, Olga; Brea, José M; Loza, Maria Isabel; Munteanu, Cristian R; Pazos, Alejandro; García-Mera, Xerardo; González-Díaz, Humbert
2018-06-25
Predicting drug-protein interactions (DPIs) for target proteins involved in dopamine pathways is a very important goal in medicinal chemistry. We can tackle this problem using Molecular Docking or Machine Learning (ML) models for one specific protein. Unfortunately, these models fail to account for large and complex big data sets of preclinical assays reported in public databases. This includes multiple conditions of assays, such as different experimental parameters, biological assays, target proteins, cell lines, organism of the target, or organism of assay. On the other hand, perturbation theory (PT) models allow us to predict the properties of a query compound or molecular system in experimental assays with multiple boundary conditions based on a previously known case of reference. In this work, we report the first PTML (PT + ML) study of a large ChEMBL data set of preclinical assays of compounds targeting dopamine pathway proteins. The best PTML model found predicts 50000 cases with accuracy of 70-91% in training and external validation series. We also compared the linear PTML model with alternative PTML models trained with multiple nonlinear methods (artificial neural network (ANN), Random Forest, Deep Learning, etc.). Some of the nonlinear methods outperform the linear model but at the cost of a notable increment of the complexity of the model. We illustrated the practical use of the new model with a proof-of-concept theoretical-experimental study. We reported for the first time the organic synthesis, chemical characterization, and pharmacological assay of a new series of l-prolyl-l-leucyl-glycinamide (PLG) peptidomimetic compounds. In addition, we performed a molecular docking study for some of these compounds with the software Vina AutoDock. The work ends with a PTML model predictive study of the outcomes of the new compounds in a large number of assays. 
Therefore, this study offers a new computational methodology for predicting the outcome for any compound in new assays. This PTML method focuses on the prediction with a simple linear model of multiple pharmacological parameters (IC50, EC50, Ki, etc.) for compounds in assays involving different cell lines used, organisms of the protein target, or organism of assay for proteins in the dopamine pathway.
NASA Astrophysics Data System (ADS)
Kim, Duckhoe; Sahin, Ozgur
2015-03-01
Scanning probe microscopes can be used to image and chemically characterize surfaces down to the atomic scale. However, the localized tip-sample interactions in scanning probe microscopes limit high-resolution images to the topmost atomic layer of surfaces, and characterizing the inner structures of materials and biomolecules is a challenge for such instruments. Here, we show that an atomic force microscope can be used to image and three-dimensionally reconstruct chemical groups inside a protein complex. We use short single-stranded DNAs as imaging labels that are linked to target regions inside a protein complex, and T-shaped atomic force microscope cantilevers functionalized with complementary probe DNAs allow the labels to be located with sequence specificity and subnanometre resolution. After measuring pairwise distances between labels, we reconstruct the three-dimensional structure formed by the target chemical groups within the protein complex using simple geometric calculations. Experiments with the biotin-streptavidin complex show that the predicted three-dimensional loci of the carboxylic acid groups of biotins are within 2 Å of their respective loci in the corresponding crystal structure, suggesting that scanning probe microscopes could complement existing structural biological techniques in solving structures that are difficult to study due to their size and complexity.
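The "simple geometric calculations" mentioned above for turning pairwise label distances into three-dimensional positions can be illustrated with a small trilateration sketch. This is not the authors' procedure: the function name is hypothetical, and it assumes noise-free distances and that the first three labels are not collinear. The result is defined only up to rotation, translation and mirror reflection.

```python
import math

def embed_from_distances(d):
    """Place points in 3-D from a full pairwise distance matrix d.

    Point 0 is put at the origin, point 1 on the x-axis, point 2 in the
    xy-plane; every further point is located from its distances to
    points 0, 1 and 2. Assumes points 0-2 are not collinear.
    """
    n = len(d)
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    x2 = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2 * d01)
    y2 = math.sqrt(max(d02 ** 2 - x2 ** 2, 0.0))
    pts = [(0.0, 0.0, 0.0), (d01, 0.0, 0.0), (x2, y2, 0.0)]
    for k in range(3, n):
        d0, d1, d2 = d[0][k], d[1][k], d[2][k]
        x = (d01 ** 2 + d0 ** 2 - d1 ** 2) / (2 * d01)
        y = (d0 ** 2 - d2 ** 2 + d02 ** 2 - 2 * x2 * x) / (2 * y2)
        z = math.sqrt(max(d0 ** 2 - x ** 2 - y ** 2, 0.0))
        if k > 3:
            # The sign of z is ambiguous; pick the mirror image that
            # better reproduces the measured distance to point 3.
            up = math.dist((x, y, z), pts[3])
            down = math.dist((x, y, -z), pts[3])
            if abs(down - d[3][k]) < abs(up - d[3][k]):
                z = -z
        pts.append((x, y, z))
    return pts

# Illustrative check with made-up coordinates (arbitrary length units).
true_pts = [(0, 0, 0), (3, 0, 0), (1, 2, 0), (1, 1, 2), (2, 1, -1)]
dist = [[math.dist(p, q) for q in true_pts] for p in true_pts]
recovered = embed_from_distances(dist)
```

With real measurements the distances carry error, so a least-squares refinement over all pairs would normally follow; that step is omitted here.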
DOE Office of Scientific and Technical Information (OSTI.GOV)
Changotra, Harish; Turk, Susan M.; Artigues, Antonio
The Epstein–Barr virus glycoprotein complex gMgN has been implicated in assembly and release of fully enveloped virus, although the precise role that it plays has not been elucidated. We report here that the long predicted cytoplasmic tail of gM is not required for complex formation and that it interacts with the cellular protein p32, which has been reported to be involved in nuclear egress of human cytomegalovirus and herpes simplex virus. Although redistribution of p32 and colocalization with gM was not observed in virus infected cells, knockdown of p32 expression by siRNA or lentivirus-delivered shRNA recapitulated the phenotype of a virus lacking expression of gNgM. A proportion of virus released from cells sedimented with characteristics of virus lacking an intact envelope and there was an increase in virus trapped in nuclear condensed chromatin. The observations suggest the possibility that p32 may also be involved in nuclear egress of Epstein–Barr virus. - Highlights: • The predicted cytoplasmic tail of gM is not required to complex with gN. • Cellular p32 can interact with the predicted cytoplasmic tail of EBV gM. • Knockdown of p32 recapitulates the phenotype of virus lacking the gNgM complex.
Lau, Julia B; Stork, Simone; Moog, Daniel; Sommer, Maik S; Maier, Uwe G
2015-05-01
Nuclear-encoded pre-proteins being imported into complex plastids of red algal origin have to cross up to five membranes. Thereby, transport across the second outermost or periplastidal membrane (PPM) is facilitated by SELMA (symbiont-specific ERAD-like machinery), an endoplasmic reticulum-associated degradation (ERAD)-derived machinery. Core components of SELMA are enzymes involved in ubiquitination (E1-E3), a Cdc48 ATPase complex and Derlin proteins. These components are present in all investigated organisms with four membrane-bound complex plastids of red algal origin, suggesting a ubiquitin-dependent translocation process of substrates mechanistically similar to the process of retro-translocation in ERAD. Even if, according to the current model, translocation via SELMA does not end up in the classical poly-ubiquitination, transient mono-/oligo-ubiquitination of pre-proteins might be required for the mechanism of translocation. We investigated the import mechanism of SELMA and were able to show that protein transport across the PPM depends on lysines in the N-terminal but not in the C-terminal part of pre-proteins. These lysines are predicted to be targets of ubiquitination during the translocation process. As proteins lacking the N-terminal lysines get stuck in the PPM, a 'frozen intermediate' of the translocation process could be envisioned and initially characterized. © 2015 John Wiley & Sons Ltd.
Chen, Langdong; Cao, Yan; Zhang, Hai; Lv, Diya; Zhao, Yahong; Liu, Yanjun; Ye, Guan; Chai, Yifeng
2018-01-31
Yangxinshi tablet (YXST) is an effective treatment for heart failure and myocardial infarction; it consists of 13 herbal medicines formulated according to traditional Chinese Medicine (TCM) practices. It has been used for the treatment of cardiovascular disease for many years in China. In this study, a network pharmacology-based strategy was used to elucidate the mechanism of action of YXST for the treatment of heart failure. Cardiovascular disease-related protein target and compound databases were constructed for YXST. A molecular docking platform was used to predict the protein targets of YXST. The affinity between proteins and ingredients was determined using surface plasmon resonance (SPR) assays. The action modes between targets and representative ingredients were calculated using Glide docking, and the related pathways were predicted using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. A protein target database containing 924 proteins was constructed; 179 compounds in YXST were identified, and 48 compounds with high relevance to the proteins were defined as representative ingredients. Thirty-four protein targets of the 48 representative ingredients were analyzed and classified into two categories: immune and cardiovascular systems. The SPR assay and molecular docking partly validated the interplay between protein targets and representative ingredients. Moreover, 28 pathways related to heart failure were identified, which provided directions for further research on YXST. This study demonstrated that the cardiovascular protective effect of YXST mainly involved the immune and cardiovascular systems. Through the research strategy based on network pharmacology, we analyzed the complex system of YXST and found 48 representative compounds, 34 proteins and 28 related pathways of YXST, which could help us understand the underlying mechanism of YXST's anti-heart failure effect.
The network-based investigation could help researchers simplify the complex system of YXST. It may also offer a feasible approach to decipher the chemical and pharmacological bases of other TCM formulas. Copyright © 2018 Elsevier B.V. All rights reserved.
Low abundance of the matrix arm of complex I in mitochondria predicts longevity in mice
Miwa, Satomi; Jow, Howsun; Baty, Karen; Johnson, Amy; Czapiewski, Rafal; Saretzki, Gabriele; Treumann, Achim; von Zglinicki, Thomas
2014-01-01
Mitochondrial function is an important determinant of the ageing process; however, the mitochondrial properties that enable longevity are not well understood. Here we show that optimal assembly of mitochondrial complex I predicts longevity in mice. Using an unbiased high-coverage high-confidence approach, we demonstrate that electron transport chain proteins, especially the matrix arm subunits of complex I, are decreased in young long-living mice, which is associated with improved complex I assembly, higher complex I-linked state 3 oxygen consumption rates and decreased superoxide production, whereas the opposite is seen in old mice. Disruption of complex I assembly reduces oxidative metabolism with concomitant increase in mitochondrial superoxide production. This is rescued by knockdown of the mitochondrial chaperone, prohibitin. Disrupted complex I assembly causes premature senescence in primary cells. We propose that lower abundance of free catalytic complex I components supports complex I assembly, efficacy of substrate utilization and minimal ROS production, enabling enhanced longevity. PMID:24815183
Agius, Rudi; Torchala, Mieczyslaw; Moal, Iain H.; Fernández-Recio, Juan; Bates, Paul A.
2013-01-01
Predicting the effects of mutations on the kinetic rate constants of protein-protein interactions is central to both the modeling of complex diseases and the design of effective peptide drug inhibitors. However, while most studies have concentrated on the determination of association rate constants, dissociation rates have received less attention. In this work we take a novel approach by relating the changes in dissociation rates upon mutation to the energetics and architecture of hotspots and hotregions, by performing alanine scans pre- and post-mutation. From these scans, we design a set of descriptors that capture the change in hotspot energy and distribution. The method is benchmarked on 713 kinetically characterized mutations from the SKEMPI database. Our investigations show that, with the use of hotspot descriptors, energies from single-point alanine mutations may be used for the estimation of off-rate mutations to any residue type and also multi-point mutations. A number of machine learning models are built from a combination of molecular and hotspot descriptors, with the best models achieving a Pearson's Correlation Coefficient of 0.79 with experimental off-rates and a Matthew's Correlation Coefficient of 0.6 in the detection of rare stabilizing mutations. Using specialized feature selection models we identify descriptors that are highly specific and, conversely, broadly important to predicting the effects of different classes of mutations, interface regions and complexes. Our results also indicate that the distribution of the critical stability regions across protein-protein interfaces is a function of complex size more strongly than interface area. In addition, mutations at the rim are critical for the stability of small complexes, but consistently harder to characterize. 
The relationship between hotregion size and the dissociation rate is also investigated and, using hotspot descriptors which model cooperative effects within hotregions, we show how the contribution of hotregions of different sizes changes under different cooperative effects. PMID:24039569
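The Matthews correlation coefficient reported in the abstract above is a common choice when one class is rare, as with stabilizing mutations. A minimal sketch of its computation from confusion-matrix counts follows; the function name and the counts in the example are illustrative and are not taken from the study.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Made-up example: 10 rare positives among 100 cases.
tp, fn, fp, tn = 6, 4, 4, 86
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

In this made-up example the accuracy is 0.92 while the coefficient is only about 0.56, which shows why accuracy alone can overstate performance on imbalanced data.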
Andreeva, Antonina
2016-06-15
The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
Mahalingam, Rajasekaran; Peng, Hung-Pin; Yang, An-Suei
2014-08-01
Protein-fatty acid interaction is vital for many cellular processes and understanding this interaction is important for functional annotation as well as drug discovery. In this work, we present a method for predicting the fatty acid (FA)-binding residues by using three-dimensional probability density distributions of interacting atoms of FAs on protein surfaces which are derived from the known protein-FA complex structures. A machine learning algorithm was established to learn the characteristic patterns of the probability density maps specific to the FA-binding sites. The predictor was trained with five-fold cross validation on a non-redundant training set and then evaluated with an independent test set as well as on a dataset of holo-apo pairs. The results showed good accuracy in predicting the FA-binding residues. Further, the predictor developed in this study is implemented as an online server which is freely accessible at the following website, http://ismblab.genomics.sinica.edu.tw/. Copyright © 2014 Elsevier B.V. All rights reserved.
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
Jogler, Christian; Waldmann, Jost; Huang, Xiaoluo; Jogler, Mareike; Glöckner, Frank Oliver; Mascher, Thorsten; Kolter, Roberto
2012-12-01
Members of the Planctomycetes clade share many unusual features for bacteria. Their cytoplasm contains membrane-bound compartments, they lack peptidoglycan and FtsZ, they divide by polar budding, and they are capable of endocytosis. Planctomycete genomes have remained enigmatic, generally being quite large (up to 9 Mb), and on average, 55% of their predicted proteins are of unknown function. Importantly, proteins related to the unusual traits of Planctomycetes remain largely unknown. Thus, we embarked on bioinformatic analyses of these genomes in an effort to predict proteins that are likely to be involved in compartmentalization, cell division, and signal transduction. We used three complementary strategies. First, we defined the Planctomycetes core genome and subtracted genes of well-studied model organisms. Second, we analyzed the gene content and synteny of morphogenesis and cell division genes and combined both methods using a "guilt-by-association" approach. Third, we identified signal transduction systems as well as sigma factors. These analyses provide a manageable list of candidate genes for future genetic studies and provide evidence for complex signaling in the Planctomycetes akin to that observed for bacteria with complex life-styles, such as Myxococcus xanthus.
An organelle-specific protein landscape identifies novel diseases and molecular mechanisms
Boldt, Karsten; van Reeuwijk, Jeroen; Lu, Qianhao; Koutroumpas, Konstantinos; Nguyen, Thanh-Minh T.; Texier, Yves; van Beersum, Sylvia E. C.; Horn, Nicola; Willer, Jason R.; Mans, Dorus A.; Dougherty, Gerard; Lamers, Ideke J. C.; Coene, Karlien L. M.; Arts, Heleen H.; Betts, Matthew J.; Beyer, Tina; Bolat, Emine; Gloeckner, Christian Johannes; Haidari, Khatera; Hetterschijt, Lisette; Iaconis, Daniela; Jenkins, Dagan; Klose, Franziska; Knapp, Barbara; Latour, Brooke; Letteboer, Stef J. F.; Marcelis, Carlo L.; Mitic, Dragana; Morleo, Manuela; Oud, Machteld M.; Riemersma, Moniek; Rix, Susan; Terhal, Paulien A.; Toedt, Grischa; van Dam, Teunis J. P.; de Vrieze, Erik; Wissinger, Yasmin; Wu, Ka Man; Apic, Gordana; Beales, Philip L.; Blacque, Oliver E.; Gibson, Toby J.; Huynen, Martijn A.; Katsanis, Nicholas; Kremer, Hannie; Omran, Heymut; van Wijk, Erwin; Wolfrum, Uwe; Kepes, François; Davis, Erica E.; Franco, Brunella; Giles, Rachel H.; Ueffing, Marius; Russell, Robert B.; Roepman, Ronald; Al-Turki, Saeed; Anderson, Carl; Antony, Dinu; Barroso, Inês; Bentham, Jamie; Bhattacharya, Shoumo; Carss, Keren; Chatterjee, Krishna; Cirak, Sebahattin; Cosgrove, Catherine; Danecek, Petr; Durbin, Richard; Fitzpatrick, David; Floyd, Jamie; Reghan Foley, A.; Franklin, Chris; Futema, Marta; Humphries, Steve E.; Hurles, Matt; Joyce, Chris; McCarthy, Shane; Mitchison, Hannah M.; Muddyman, Dawn; Muntoni, Francesco; O'Rahilly, Stephen; Onoufriadis, Alexandros; Payne, Felicity; Plagnol, Vincent; Raymond, Lucy; Savage, David B.; Scambler, Peter; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert; Serra, Eva; Stalker, Jim; van Kogelenberg, Margriet; Vijayarangakannan, Parthiban; Walter, Klaudia; Whittall, Ros; Williamson, Kathy
2016-01-01
Cellular organelles provide opportunities to relate biological mechanisms to disease. Here we use affinity proteomics, genetics and cell biology to interrogate cilia: poorly understood organelles, where defects cause genetic diseases. Two hundred and seventeen tagged human ciliary proteins create a final landscape of 1,319 proteins, 4,905 interactions and 52 complexes. Reverse tagging, repetition of purifications and statistical analyses, produce a high-resolution network that reveals organelle-specific interactions and complexes not apparent in larger studies, and links vesicle transport, the cytoskeleton, signalling and ubiquitination to ciliary signalling and proteostasis. We observe sub-complexes in exocyst and intraflagellar transport complexes, which we validate biochemically, and by probing structurally predicted, disruptive, genetic variants from ciliary disease patients. The landscape suggests other genetic diseases could be ciliary including 3M syndrome. We show that 3M genes are involved in ciliogenesis, and that patient fibroblasts lack cilia. Overall, this organelle-specific targeting strategy shows considerable promise for Systems Medicine. PMID:27173435
A new theoretical approach to analyze complex processes in cytoskeleton proteins.
Li, Xin; Kolomeisky, Anatoly B
2014-03-20
Cytoskeleton proteins are filament structures that support a large number of important biological processes. These dynamic biopolymers exist in nonequilibrium conditions stimulated by hydrolysis chemical reactions in their monomers. Current theoretical methods provide a comprehensive picture of biochemical and biophysical processes in cytoskeleton proteins. However, the description is only qualitative under biologically relevant conditions because utilized theoretical mean-field models neglect correlations. We develop a new theoretical method to describe dynamic processes in cytoskeleton proteins that takes into account spatial correlations in the chemical composition of these biopolymers. Our approach is based on analysis of probabilities of different clusters of subunits. It allows us to obtain exact analytical expressions for a variety of dynamic properties of cytoskeleton filaments. By comparing theoretical predictions with Monte Carlo computer simulations, it is shown that our method provides a fully quantitative description of complex dynamic phenomena in cytoskeleton proteins under all conditions.
Application of long-range order to predict unfolding rates of two-state proteins.
Harihar, B; Selvaraj, S
2011-03-01
Prediction of the experimental unfolding rates of two-state proteins, and models describing those rates, remain quite limited because of the complexity of the unfolding mechanism and the scarcity of experimental unfolding data compared with folding data. In this work, 25 two-state proteins characterized by Maxwell et al. (Protein Sci 2005;14:602–616) using a consensus set of experimental conditions were taken, and the parameter long-range order (LRO) derived from their three-dimensional structures was related to their experimental unfolding rates ln(k(u)). From the total data set of 30 proteins used by Maxwell et al. (Protein Sci 2005;14:602–616), five slow-unfolding proteins with very low unfolding rates were considered to be outliers and were not included in our data set. Except for the all-beta structural class, LRO of the all-alpha and mixed-class proteins showed a strong inverse correlation with experimental ln(k(u)) (r = -0.99 and -0.88, respectively). For all-beta proteins, LRO shows a correlation of -0.62 with experimental ln(k(u)). For predicting the unfolding rates, a simple statistical method has been used and linear regression equations were developed for individual structural classes of proteins using LRO, and the results obtained showed better agreement with experimental results. Copyright © 2010 Wiley-Liss, Inc.
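The abstract above does not define LRO or give the regression equations. The sketch below assumes the commonly cited definition, in which LRO is the number of residue pairs more than 12 positions apart in sequence whose C-alpha atoms lie within 8 Å, divided by the chain length. The thresholds, the function names and the toy data are assumptions made for illustration only.

```python
import math

def long_range_order(ca_coords, min_separation=12, cutoff=8.0):
    """Long-range order from C-alpha coordinates (in angstroms).

    Counts pairs (i, j) with |i - j| > min_separation and distance
    <= cutoff, then normalizes by the number of residues. Both
    thresholds are assumed defaults, not values from the abstract.
    """
    n = len(ca_coords)
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy chain: 13 residues in a straight line, plus one residue that
# folds back to within 5 angstroms of the first residue.
toy = [(i * 3.8, 0.0, 0.0) for i in range(13)] + [(0.0, 5.0, 0.0)]
lro = long_range_order(toy)

# Hypothetical (LRO, ln k_u) pairs for one structural class.
slope, intercept = fit_line([1.0, 2.0, 3.0], [5.0, 3.0, 1.0])
```

A separate line would be fitted for each structural class, mirroring the class-specific regression described in the abstract; the numbers used here are placeholders, not the published data.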
Liu, Jie; Su, Minyi; Liu, Zhihai; Li, Jie; Li, Yan; Wang, Renxiao
2017-07-18
In structure-based drug design, binding affinity prediction remains a challenging goal for current scoring functions. Development of target-biased scoring functions provides a new possibility for tackling this problem, but this approach is also associated with certain technical difficulties. We previously reported the Knowledge-Guided Scoring (KGS) method as an alternative approach (BMC Bioinformatics, 2010, 11, 193-208). The key idea is to compute the binding affinity of a given protein-ligand complex based on the known binding data of an appropriate reference complex, so the error in binding affinity prediction can be reduced effectively. In this study, we have developed an upgraded version, i.e. KGS2, by employing 3D protein-ligand interaction fingerprints in reference selection. KGS2 was evaluated in combination with four scoring functions (X-Score, ChemPLP, ASP, and GoldScore) on five drug targets (HIV-1 protease, carbonic anhydrase 2, beta-secretase 1, beta-trypsin, and checkpoint kinase 1). In the in situ scoring test, considerable improvements were observed in most cases after application of KGS2. Moreover, KGS2 performed better than KGS in all cases. In the more challenging molecular docking test, application of KGS2 also led to improved structure-activity relationship in some cases. KGS2 can be applied as a convenient "add-on" to current scoring functions without the need to re-engineer them, and its application is not limited to certain target proteins as customized scoring functions. As an interpolation method, its accuracy in principle can be improved further with the increasing knowledge of protein-ligand complex structures and binding affinity data. We expect that KGS2 will become a practical tool for enhancing the performance of current scoring functions in binding affinity prediction. The KGS2 software is available upon contacting the authors.
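The reference-based idea described above can be sketched as follows. This is an illustration under stated assumptions, not the KGS2 implementation: it assumes the prediction is the reference complex's measured affinity shifted by the difference between the scores of the query and the reference, and that the reference is chosen by Tanimoto similarity between binary interaction fingerprints. All names, fields and numbers are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def knowledge_guided_score(query_fp, query_score, references):
    """Predict affinity relative to the most similar reference complex.

    references: list of dicts with keys 'fp' (set of on-bits),
    'score' (scoring-function value) and 'exp_affinity' (measured
    affinity in the same units as the score). Assumed formula:
    prediction = exp_affinity_ref + (score_query - score_ref).
    """
    best = max(references, key=lambda r: tanimoto(query_fp, r["fp"]))
    return best["exp_affinity"] + (query_score - best["score"])

# Hypothetical reference set and query (values are illustrative only).
refs = [
    {"fp": {1, 2, 3}, "score": 6.0, "exp_affinity": 7.5},
    {"fp": {7, 8}, "score": 4.0, "exp_affinity": 3.0},
]
predicted = knowledge_guided_score({1, 2, 4}, 6.4, refs)
```

Under this assumption any systematic error the scoring function makes on the reference cancels in the difference, which is one way to read the abstract's claim that the prediction error is reduced.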
Liquid crystalline phase behavior of protein fibers in water: experiments versus theory.
Jung, Jin-Mi; Mezzenga, Raffaele
2010-01-05
We have developed a new method allowing the study of the thermodynamic phase behavior of mesoscopic colloidal systems consisting of amyloid protein fibers in water, obtained by heat denaturation and aggregation of beta-lactoglobulin, a dairy protein. The fibers have a cross section of about 5.2 nm and two groups of polydisperse contour lengths: (i) long fibers of 1-20 μm, showing semiflexible behavior, and (ii) short rods 100-200 nm long, obtained by cutting the long fibers via high-pressure homogenization. At pH 2 without salt, these fibers are highly charged and stable in water. We have studied the isotropic-nematic phase transition for both systems and compared our results with the theoretical values predicted by Onsager's theory. The experimentally measured isotropic-nematic phase transition was found to occur at 0.4% and at 3% for the long and short fibers, respectively. For both systems, this phase transition occurs at concentrations more than 1 order of magnitude lower than what is expected based on Onsager's theory. Moreover, at low enough pH, no intermediate biphasic region was observed between the isotropic phase and the nematic phase. The phase diagrams of both systems (pH vs concentration) showed similar, yet complex and rich, phase behavior. We discuss the possible physical fundamentals ruling the phase diagram as well as the discrepancy we observe for the isotropic-nematic phase transition between our experimental results and the predicted theoretical results. Our work highlights that systems formed by water-amyloid protein fibers are far too complex to be understood based solely on Onsager's theory. Experimental results are revisited in terms of Flory's theory (1956) for suspensions of rods, which allows accounting for rod-solvent hydrophobic interactions. This theoretical approach explains, on a semiquantitative basis, most of the discrepancies observed between the experimental results and Onsager's predictions. 
The sources of the complex colloidal behavior of protein fibers are analyzed and discussed at length.
1991-01-01
We recently described the identification of BOS1 (Newman, A., J. Shim, and S. Ferro-Novick. 1990. Mol. Cell. Biol. 10:3405-3414.). BOS1 is a gene that in multiple copy suppresses the growth and secretion defect of bet1 and sec22, two mutants that disrupt transport from the ER to the Golgi complex in yeast. The ability of BOS1 to specifically suppress mutants blocked at a particular stage of the secretory pathway suggested that this gene encodes a protein that functions in this process. The experiments presented in this study support this hypothesis. Specifically, the BOS1 gene was found to be essential for cellular growth. Furthermore, cells depleted of the Bos1 protein fail to transport pro-alpha-factor and carboxypeptidase Y (CPY) to the Golgi apparatus. This defect in export leads to the accumulation of an extensive network of ER and small vesicles. DNA sequence analysis predicts that Bos1 is a 27-kD protein containing a putative membrane-spanning domain. This prediction is supported by differential centrifugation experiments. Thus, Bos1 appears to be a membrane protein that functions in conjunction with Bet1 and Sec22 to facilitate the transport of proteins at a step subsequent to translocation into the ER but before entry into the Golgi apparatus. PMID:2007627
Accurate high-throughput structure mapping and prediction with transition metal ion FRET
Yu, Xiaozhen; Wu, Xiongwu; Bermejo, Guillermo A.; Brooks, Bernard R.; Taraska, Justin W.
2013-01-01
Mapping the landscape of a protein’s conformational space is essential to understanding its functions and regulation. The limitations of many structural methods have made this process challenging for most proteins. Here, we report that transition metal ion FRET (tmFRET) can be used in a rapid, highly parallel screen, to determine distances from multiple locations within a protein at extremely low concentrations. The distances generated through this screen for the protein Maltose Binding Protein (MBP) match distances from the crystal structure to within a few angstroms. Furthermore, energy transfer accurately detects structural changes during ligand binding. Finally, fluorescence-derived distances can be used to guide molecular simulations to find low energy states. Our results open the door to rapid, accurate mapping and prediction of protein structures at low concentrations, in large complex systems, and in living cells. PMID:23273426
Learning a peptide-protein binding affinity predictor with kernel ridge regression
2013-01-01
Background: The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine-based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. Results: We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low-complexity dynamic programming algorithm for the exact computation of the kernel and a linear-time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. 
Conclusion: On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve systems biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/. PMID:23497081
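The combination of a string kernel with kernel ridge regression described in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' kernel: a plain k-mer spectrum kernel is swapped in as a stand-in, and the peptide sequences, affinity values, regularization strength `lam` and all function names are hypothetical choices made for illustration only.

```python
import numpy as np
from collections import Counter

def spectrum_features(seq, k=2):
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=2):
    """Inner product of the k-mer count vectors of two sequences."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return float(sum(count * fb[kmer] for kmer, count in fa.items()))

def gram_matrix(xs, ys, k=2):
    """Kernel values between every sequence in xs and every sequence in ys."""
    return np.array([[spectrum_kernel(x, y, k) for y in ys] for x in xs])

def krr_fit(train_seqs, targets, lam=1e-3, k=2):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = gram_matrix(train_seqs, train_seqs, k)
    y = np.asarray(targets, dtype=float)
    return np.linalg.solve(K + lam * np.eye(len(train_seqs)), y)

def krr_predict(train_seqs, alpha, new_seqs, k=2):
    """Prediction is a kernel-weighted sum over the training sequences."""
    return gram_matrix(new_seqs, train_seqs, k) @ alpha

# Hypothetical toy data: four peptides with made-up affinity values.
peptides = ["ACDEFG", "ACDKFG", "LLKKAA", "GGSGGS"]
affinity = [6.1, 5.8, 4.2, 3.0]

alpha = krr_fit(peptides, affinity)
fitted = krr_predict(peptides, alpha, peptides)
```

With a small `lam` the model nearly interpolates the training values; a larger `lam` trades training fit for smoother predictions, and in practice it would be chosen by cross-validation.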
Armean, Irina M; Lilley, Kathryn S; Trotter, Matthew W B; Pilkington, Nicholas C V; Holden, Sean B
2018-06-01
Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. PPI annotations are built combinatorically using corresponding GO terms and InterPro annotation. We use an S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve a high performance area under the ROC curve of ≤0.97, outperforming go2ppi, a previously established prediction tool for protein-protein interactions (PPI) based on Gene Ontology (GO) annotations. Availability: https://github.com/ima23/maxent-ppi. Contact: sbh11@cl.cam.ac.uk. Supplementary data are available at Bioinformatics online.
Xiong, Dapeng; Zeng, Jianyang; Gong, Haipeng
2017-09-01
Residue-residue contacts are of great value for protein structure prediction, since contact information, especially from long-range residue pairs, can significantly reduce the complexity of conformational sampling for protein structure prediction in practice. Despite progress in the past decade on protein targets with abundant homologous sequences, accurate contact prediction for proteins with limited sequence information is still far from satisfactory. Methodologies for these hard targets still need further improvement. We present a computational program, DeepConPred, which includes a pipeline of two novel deep-learning-based methods (DeepCCon and DeepRCon) as well as a contact refinement step, to improve the prediction of long-range residue contacts from primary sequences. Compared with previous prediction approaches, our framework employs an effective scheme to identify optimal and important features for contact prediction, and was trained only with coevolutionary information derived from a limited number of homologous sequences to ensure robustness and usefulness for hard targets. Independent tests showed that 59.33%/49.97%, 64.39%/54.01% and 70.00%/59.81% of the top L/5, top L/10 and top 5 predictions were correct for CASP10/CASP11 proteins, respectively. In general, our algorithm ranked as one of the best methods for CASP targets. Availability: all source data and codes are available at http://166.111.152.91/Downloads.html. Contact: hgong@tsinghua.edu.cn or zengjy321@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
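The "top L/5" and "top L/10" figures quoted in this abstract are precision values over the highest-scoring predicted pairs, where L is the sequence length. A minimal sketch of that metric follows. The sequence-separation cutoff of 24 residues for "long-range" is a commonly used convention assumed here, not a value taken from the abstract, and the scores and native contacts are invented for illustration.

```python
def top_k_precision(scores, native_contacts, k, min_separation=24):
    """Fraction of the k highest-scoring long-range pairs that are true contacts.

    scores: dict mapping (i, j) residue index pairs (i < j) to a predicted score.
    native_contacts: set of (i, j) pairs in contact in the reference structure.
    min_separation: sequence separation that counts as long-range (assumed 24).
    """
    long_range = [(pair, s) for pair, s in scores.items()
                  if pair[1] - pair[0] >= min_separation]
    ranked = sorted(long_range, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in native_contacts)
    # Divides by the number of pairs actually ranked, which can be below k.
    return hits / len(ranked)

# Hypothetical predictions for a short protein; (3, 6) is short-range and ignored.
scores = {(1, 30): 0.9, (2, 40): 0.8, (5, 35): 0.7, (3, 6): 0.99, (10, 45): 0.2}
native = {(1, 30), (5, 35), (3, 6)}

precision_top2 = top_k_precision(scores, native, k=2)
precision_top3 = top_k_precision(scores, native, k=3)
```

For a protein of length L, "top L/5" corresponds to calling the function with `k = L // 5`.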
Coarse-grained versus atomistic simulations: realistic interaction free energies for real proteins.
May, Ali; Pool, René; van Dijk, Erik; Bijlard, Jochem; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton
2014-02-01
To assess whether two proteins will interact under physiological conditions, information on the interaction free energy is needed. Statistical learning techniques and docking methods for predicting protein-protein interactions cannot quantitatively estimate binding free energies. Full atomistic molecular simulation methods do have this potential, but are completely unfeasible for large-scale applications in terms of computational cost required. Here we investigate whether applying coarse-grained (CG) molecular dynamics simulations is a viable alternative for complexes of known structure. We calculate the free energy barrier with respect to the bound state based on molecular dynamics simulations using both a full atomistic and a CG force field for the TCR-pMHC complex and the MP1-p14 scaffolding complex. We find that the free energy barriers from the CG simulations are of similar accuracy as those from the full atomistic ones, while achieving a speedup of >500-fold. We also observe that extensive sampling is extremely important to obtain accurate free energy barriers, which is only within reach for the CG models. Finally, we show that the CG model preserves biological relevance of the interactions: (i) we observe a strong correlation between evolutionary likelihood of mutations and the impact on the free energy barrier with respect to the bound state; and (ii) we confirm the dominant role of the interface core in these interactions. Therefore, our results suggest that CG molecular simulations can realistically be used for the accurate prediction of protein-protein interaction strength. The python analysis framework and data files are available for download at http://www.ibi.vu.nl/downloads/bioinformatics-2013-btt675.tgz.
Crosara, Karla Tonelli Bicalho; Moffa, Eduardo Buozi; Xiao, Yizhi; Siqueira, Walter Luiz
2018-01-16
Protein-protein interaction is a common physiological mechanism for protection and actions of proteins in an organism. The identification and characterization of protein-protein interactions in different organisms is necessary to better understand their physiology and to determine their efficacy. In a previous in vitro study using mass spectrometry, we identified 43 proteins that interact with histatin 1. Six previously documented interactors were confirmed and 37 novel partners were identified. In this tutorial, we aimed to demonstrate the usefulness of the STRING database for studying protein-protein interactions. We used an in-silico approach along with the STRING database (http://string-db.org/) and successfully performed a fast simulation of a novel constructed histatin 1 protein-protein network, including both the previously known and the predicted interactors, along with our newly identified interactors. Our study highlights the advantages and importance of applying bioinformatics tools to merge in-silico tactics with experimental in vitro findings for rapid advancement of our knowledge about protein-protein interactions. Our findings also indicate that bioinformatics tools such as the STRING protein network database can help predict potential interactions between proteins and thus serve as a guide for future steps in our exploration of the Human Interactome. Our study highlights the usefulness of the STRING protein database for studying protein-protein interactions. The STRING database can collect and integrate data about known and predicted protein-protein associations from many organisms, including both direct (physical) and indirect (functional) interactions, in an easy-to-use interface. Copyright © 2017 Elsevier B.V. All rights reserved.
Ponts, Nadia; Yang, Jianfeng; Chung, Duk-Won Doug; Prudhomme, Jacques; Girke, Thomas; Horrocks, Paul; Le Roch, Karine G
2008-06-11
Reversible modification of proteins through the attachment of ubiquitin or ubiquitin-like modifiers is an essential post-translational regulatory mechanism in eukaryotes. The conjugation of ubiquitin or ubiquitin-like proteins has been demonstrated to play roles in growth, adaptation and homeostasis in all eukaryotes, with perturbation of ubiquitin-mediated systems associated with the pathogenesis of many human diseases, including cancer and neurodegenerative disorders. Here we describe the use of an HMM search of functional Pfam domains found in the key components of the ubiquitin-mediated pathway necessary to activate and reversibly modify target proteins in eight apicomplexan parasitic protozoa for which complete or late-stage genome projects exist. In parallel, the same search was conducted on five model organisms, single-celled and metazoans, to generate data to validate both the search parameters employed and aid paralog classification in Apicomplexa. For each of the 13 species investigated, a set of proteins predicted to be involved in the ubiquitylation pathway has been identified and demonstrates increasing component members of the ubiquitylation pathway correlating with organism and genome complexity. Sequence homology and domain architecture analyses facilitated prediction of apicomplexan-specific protein function, particularly those involved in regulating cell division during these parasite's complex life cycles. This study provides a comprehensive analysis of proteins predicted to be involved in the apicomplexan ubiquitin-mediated pathway. Given the importance of such pathway in a wide variety of cellular processes, our data is a key step in elucidating the biological networks that, in part, direct the pathogenicity of these parasites resulting in a massive impact on global health. Moreover, apicomplexan-specific adaptations of the ubiquitylation pathway may represent new therapeutic targets for much needed drugs against apicomplexan parasites.
Ze, Xiaolei; Ben David, Yonit; Laverde-Gomez, Jenny A.; Dassa, Bareket; Sheridan, Paul O.; Duncan, Sylvia H.; Louis, Petra; Henrissat, Bernard; Juge, Nathalie; Koropatkin, Nicole M.; Bayer, Edward A.
2015-01-01
ABSTRACT Ruminococcus bromii is a dominant member of the human gut microbiota that plays a key role in releasing energy from dietary starches that escape digestion by host enzymes via its exceptional activity against particulate “resistant” starches. Genomic analysis of R. bromii shows that it is highly specialized, with 15 of its 21 glycoside hydrolases belonging to one family (GH13). We found that amylase activity in R. bromii is expressed constitutively, with the activity seen during growth with fructose as an energy source being similar to that seen with starch as an energy source. Six GH13 amylases that carry signal peptides were detected by proteomic analysis in R. bromii cultures. Four of these enzymes are among 26 R. bromii proteins predicted to carry dockerin modules, with one, Amy4, also carrying a cohesin module. Since cohesin-dockerin interactions are known to mediate the formation of protein complexes in cellulolytic ruminococci, the binding interactions of four cohesins and 11 dockerins from R. bromii were investigated after overexpressing them as recombinant fusion proteins. Dockerins possessed by the enzymes Amy4 and Amy9 are predicted to bind a cohesin present in protein scaffoldin 2 (Sca2), which resembles the ScaE cell wall-anchoring protein of a cellulolytic relative, R. flavefaciens. Further complexes are predicted between the dockerin-carrying amylases Amy4, Amy9, Amy10, and Amy12 and two other cohesin-carrying proteins, while Amy4 has the ability to autoaggregate, as its dockerin can recognize its own cohesin. This organization of starch-degrading enzymes is unprecedented and provides the first example of cohesin-dockerin interactions being involved in an amylolytic system, which we refer to as an “amylosome.” PMID:26419877
NASA Astrophysics Data System (ADS)
Bordner, Andrew J.; Zorman, Barry; Abagyan, Ruben
2011-10-01
Membrane proteins comprise a significant fraction of the proteomes of sequenced organisms and are the targets of approximately half of marketed drugs. However, in spite of their prevalence and biomedical importance, relatively few experimental structures are available due to technical challenges. Computational simulations can potentially address this deficit by providing structural models of membrane proteins. Solvation within the spatially heterogeneous membrane/solvent environment provides a major component of the energetics driving protein folding and association within the membrane. We have developed an implicit solvation model for membranes that is both computationally efficient and accurate enough to enable molecular mechanics predictions for the folding and association of peptides within the membrane. We derived the new atomic solvation model parameters using an unbiased fitting procedure to experimental data and have applied it to diverse problems in order to test its accuracy and to gain insight into membrane protein folding. First, we predicted the positions and orientations of peptides and complexes within the lipid bilayer and compared the simulation results with solid-state NMR structures. Additionally, we performed folding simulations for a series of host-guest peptides with varying propensities to form alpha helices in a hydrophobic environment and compared the structures with experimental measurements. We were also able to successfully predict the structures of amphipathic peptides as well as the structures for dimeric complexes of short hexapeptides that have experimentally characterized propensities to form beta sheets within the membrane. Finally, we compared calculated relative transfer energies with data from experiments measuring the effects of mutations on the free energies of translocon-mediated insertion of proteins into lipid bilayers and of combined folding and membrane insertion of a beta barrel protein.
Prediction of cassava protein interactome based on interolog method.
Thanasomboon, Ratana; Kalapanulak, Saowalak; Netrphan, Supatcharee; Saithong, Treenut
2017-12-08
Cassava is a starchy root crop whose role in food security is becoming increasingly significant. Given its versatile industrial uses as well, demand for cassava starch is continuously growing. However, in-depth study of cellular regulation, especially the interactions between proteins, is lacking. To reduce the knowledge gap in protein-protein interaction (PPI), a genome-scale PPI network of cassava was constructed using an interolog-based method (MePPI-In, available at http://bml.sbi.kmutt.ac.th/ppi). The network was constructed from the information of seven template plants. MePPI-In included 90,173 interactions among 7,209 proteins. At least 39 percent of the total predictions were supported by gene/protein expression data, while further co-expression analysis yielded 16 highly promising PPIs. In addition, domain-domain interaction information was employed to increase the reliability of the network and guide the search for more groups of promising PPIs. Moreover, the topology and functional content of MePPI-In were similar to those of the networks of Arabidopsis and rice. The potential contribution of MePPI-In to various applications, such as protein-complex formation and prediction of protein function, is discussed and exemplified. The insights provided by MePPI-In would hopefully enable precise trait improvement in cassava.
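The interolog idea used in this abstract is that an interaction known in a template species is transferred to the target species when both partners have orthologs there. The sketch below shows only that transfer step, under simplifying assumptions: the protein identifiers and the ortholog table are hypothetical, and no confidence scoring or filtering (such as the expression or domain-domain support mentioned above) is included.

```python
from itertools import product

def predict_interologs(template_ppis, orthologs):
    """Transfer template interactions to the target species via orthology.

    template_ppis: iterable of (protein_a, protein_b) pairs in a template species.
    orthologs: dict mapping a template protein to a set of target proteins.
    Returns a set of unordered target pairs, each stored as a frozenset.
    """
    predicted = set()
    for a, b in template_ppis:
        # Every ortholog of a is paired with every ortholog of b.
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            predicted.add(frozenset((ta, tb)))
    return predicted

# Hypothetical template interactions and ortholog assignments.
template = [("AT1", "AT2"), ("AT2", "AT3")]
orthologs = {"AT1": {"Me1"}, "AT2": {"Me2a", "Me2b"}}  # AT3 has no ortholog

predicted = predict_interologs(template, orthologs)
```

The interaction AT2-AT3 produces no prediction because one partner lacks an ortholog, which is why coverage of such a network depends on how many template species are used.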
VarMod: modelling the functional effects of non-synonymous variants.
Pappalardo, Morena; Wass, Mark N
2014-07-01
Unravelling the genotype-phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs present in proteins the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants and in turn linking this functional change to phenotype. Here we present VarMod (Variant Modeller) a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod develops recent observations that functional nsSNVs are enriched at protein-protein interfaces and protein-ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs VarMod performance is comparable to an existing state of the art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Predicting permanent and transient protein-protein interfaces.
La, David; Kong, Misun; Hoffman, William; Choi, Youn Im; Kihara, Daisuke
2013-05-01
Protein-protein interactions (PPIs) are involved in diverse functions in a cell. To optimize functional roles of interactions, proteins interact with a spectrum of binding affinities. Interactions are conventionally classified into permanent and transient, where the former denotes tight binding between proteins that results in strong complexes, whereas the latter consists of relatively weak interactions that can dissociate after binding to regulate functional activity at specific time points. Knowing the type of interaction has significant implications for understanding the nature and function of PPIs. In this study, we constructed amino acid substitution models that capture mutation patterns at permanent and transient types of protein interfaces, which were found to be different with statistical significance. Using the substitution models, we developed a novel computational method that predicts permanent and transient protein binding interfaces (PBIs) on protein surfaces. Without knowledge of the interacting partner, the method uses a single query protein structure and a multiple sequence alignment of the sequence family. Using a large dataset of permanent and transient proteins, we show that our method, BindML+, performs very well in protein interface classification. A very high area under the curve (AUC) value of 0.957 was observed when predicted protein binding sites were classified. Remarkably, near-perfect accuracy was achieved with an AUC of 0.991 when actual binding sites were classified. The developed method will also be useful for protein design of permanent and transient PBIs. Copyright © 2013 Wiley Periodicals, Inc.
Li, Yaohang; Liu, Hui; Rata, Ionel; Jakobsson, Eric
2013-02-25
The rapidly increasing number of protein crystal structures available in the Protein Data Bank (PDB) has naturally made statistical analyses feasible in studying complex high-order inter-residue correlations. In this paper, we report a context-based secondary structure potential (CSSP) for assessing the quality of predicted protein secondary structures generated by various prediction servers. CSSP is a sequence-position-specific knowledge-based potential generated based on the potentials of mean force approach, where high-order inter-residue interactions are taken into consideration. The CSSP potential is effective in identifying secondary structure predictions with good quality. In 56% of the targets in the CB513 benchmark, the optimal CSSP potential is able to recognize the native secondary structure or a prediction with Q3 accuracy higher than 90% as best scored in the predicted secondary structures generated by 10 popularly used secondary structure prediction servers. In more than 80% of the CB513 targets, the predicted secondary structures with the lowest CSSP potential values yield higher than 80% Q3 accuracy. Similar performance of CSSP is found on the CASP9 targets as well. Moreover, our computational results also show that the CSSP potential using triplets outperforms the CSSP potential using doublets and is currently better than the CSSP potential using quartets.
Li, Shu; Williams, Justin S; Sun, Penglin; Kao, Teh-Hui
2016-09-01
The collaborative non-self-recognition model for S-RNase-based self-incompatibility predicts that multiple S-locus F-box proteins (SLFs) produced by pollen of a given S-haplotype collectively mediate ubiquitination and degradation of all non-self S-RNases, but not self S-RNases, in the pollen tube, thereby resulting in cross-compatible pollination but self-incompatible pollination. We had previously used pollen extracts containing GFP-fused S2-SLF1 (SLF1 with an S2-haplotype) of Petunia inflata for co-immunoprecipitation (Co-IP) and mass spectrometry (MS), and identified PiCUL1-P (a pollen-specific Cullin1), PiSSK1 (a pollen-specific Skp1-like protein) and PiRBX1 (a conventional Rbx1) as components of the SCF(S2-SLF1) complex. Using pollen extracts containing PiSSK1:FLAG:GFP for Co-IP/MS, we identified two additional SLFs (SLF4 and SLF13) that were assembled into SCF(SLF) complexes. As 17 SLF genes (SLF1 to SLF17) have been identified in S2 and S3 pollen, here we examined whether all 17 SLFs are assembled into similar complexes and, if so, whether these complexes are unique to SLFs. We modified the previous Co-IP/MS procedure, including the addition of style extracts from four different S-genotypes to pollen extracts containing PiSSK1:FLAG:GFP, to perform four separate experiments. The results taken together show that all 17 SLFs and an SLF-like protein, SLFLike1 (encoded by an S-locus-linked gene), co-immunoprecipitated with PiSSK1:FLAG:GFP. Moreover, of the 179 other F-box proteins predicted by S2 and S3 pollen transcriptomes, only a pair with 94.9% identity and another pair with 99.7% identity co-immunoprecipitated with PiSSK1:FLAG:GFP. These results suggest that SCF(SLF) complexes have evolved specifically to function in self-incompatibility. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Mu, Lin
2018-01-01
This work introduces a number of algebraic topology approaches, including multi-component persistent homology, multi-level persistent homology, and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. In contrast to the conventional persistent homology, multi-component persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for protein-ligand binding analysis and virtual screening of small molecules. Extensive numerical experiments involving 4,414 protein-ligand complexes from the PDBBind database and 128,374 ligand-target and decoy-target pairs in the DUD database are performed to test respectively the scoring power and the discriminatory power of the proposed topological learning strategies. It is demonstrated that the present topological learning outperforms other existing methods in protein-ligand binding affinity prediction and ligand-decoy discrimination. PMID:29309403
Gurung, Ratna B.; Purdie, Auriol C.; Begg, Douglas J.
2012-01-01
Johne's disease in ruminants is caused by Mycobacterium avium subsp. paratuberculosis. Diagnosis of M. avium subsp. paratuberculosis infection is difficult, especially in the early stages. To date, ideal antigen candidates are not available for efficient immunization or immunodiagnosis. This study reports the in silico selection and subsequent analysis of epitopes of M. avium subsp. paratuberculosis proteins that were found to be upregulated under stress conditions as a means to identify immunogenic candidate proteins. Previous studies have reported differential regulation of proteins when M. avium subsp. paratuberculosis is exposed to stressors which induce a response similar to dormancy. Dormancy may be involved in evading host defense mechanisms, and the host may also mount an immune response against these proteins. Twenty-five M. avium subsp. paratuberculosis proteins that were previously identified as being upregulated under in vitro stress conditions were analyzed for B and T cell epitopes by use of the prediction tools at the Immune Epitope Database and Analysis Resource. Major histocompatibility complex class I T cell epitopes were predicted using an artificial neural network method, and class II T cell epitopes were predicted using the consensus method. Conformational B cell epitopes were predicted from the relevant three-dimensional structure template for each protein. Based on the greatest number of predicted epitopes, eight proteins (MAP2698c [encoded by desA2], MAP2312c [encoded by fadE19], MAP3651c [encoded by fadE3_2], MAP2872c [encoded by fabG5_2], MAP3523c [encoded by oxcA], MAP0187c [encoded by sodA], and the hypothetical proteins MAP3567 and MAP1168c) were identified as potential candidates for study of antibody- and cell-mediated immune responses within infected hosts. PMID:22496492
The solution structure of the pentatricopeptide repeat protein PPR10 upon binding atpH RNA
Gully, Benjamin S.; Cowieson, Nathan; Stanley, Will A.; Shearston, Kate; Small, Ian D.; Barkan, Alice; Bond, Charles S.
2015-01-01
The pentatricopeptide repeat (PPR) protein family is a large family of RNA-binding proteins that is characterized by tandem arrays of a degenerate 35-amino-acid motif which form an α-solenoid structure. PPR proteins influence the editing, splicing, translation and stability of specific RNAs in mitochondria and chloroplasts. Zea mays PPR10 is amongst the best studied PPR proteins, where sequence-specific binding to two RNA transcripts, atpH and psaJ, has been demonstrated to follow a recognition code where the identity of two amino acids per repeat determines the base-specificity. A recently solved ZmPPR10:psaJ complex crystal structure suggested a homodimeric complex with considerably fewer sequence-specific protein–RNA contacts than inferred previously. Here we describe the solution structure of the ZmPPR10:atpH complex using size-exclusion chromatography-coupled synchrotron small-angle X-ray scattering (SEC-SY-SAXS). Our results support prior evidence that PPR10 binds RNA as a monomer, and that it does so in a manner that is commensurate with a canonical and predictable RNA-binding mode across much of the RNA–protein interface. PMID:25609698
Preliminary results of human PrPC protein studied by spectroscopic techniques
NASA Astrophysics Data System (ADS)
Nowakowski, Michał; Czapla-Masztafiak, Joanna; Kozak, Maciej; Zhukov, Igor; Zhukova, Lilia; Szlachetko, Jakub; Kwiatek, Wojciech M.
2017-11-01
Neurodegenerative diseases are a class of complex and prominent pathologies of the human nervous system. The human prion protease-resistant protein (PrP) is a protein that regulates copper metabolism in mammalian cells through binding of Cu(II) ions to specific fragments. Misfolding of this protein is associated with the development of prion diseases. Therefore, it is crucial to obtain structural information about the coordination of Cu(II) by the PrP protein. Herein, we report X-ray absorption spectroscopy (XAS) measurements, carried out at the SuperXAS beamline (SLS, PSI Villigen), on PrPC-Cu(II) complexes. The results were compared with theoretical predictions made with FEFF 9.6 software. To complement the XAS data, atomic force microscopy (AFM) measurements were conducted to obtain low-resolution structural information about the prepared sample, which allowed a protocol to be developed for fixing PrPC molecules on a solid substrate for further experiments. The folded C-terminal domain of the PrPC protein was found to be around 5 nm in diameter. The results show that both XAS and AFM are useful tools for detailed examination of complexes of human PrPC with Cu(II) or with other divalent metal ions.
A combinatorial approach to protein docking with flexible side chains.
Althaus, Ernst; Kohlbacher, Oliver; Lenhof, Hans-Peter; Müller, Peter
2002-01-01
Rigid-body docking approaches are not sufficient to predict the structure of a protein complex from the unbound (native) structures of the two proteins. Accounting for side chain flexibility is an important step towards fully flexible protein docking. This work describes an approach that allows conformational flexibility for the side chains while keeping the protein backbone rigid. Starting from candidates created by a rigid-docking algorithm, we demangle the side chains of the docking site, thus creating reasonable approximations of the true complex structure. These structures are ranked with respect to the binding free energy. We present two new techniques for side chain demangling. Both approaches are based on a discrete representation of the side chain conformational space by the use of a rotamer library. This leads to a combinatorial optimization problem. For the solution of this problem, we propose a fast heuristic approach and an exact, albeit slower, method that uses branch-and-cut techniques. As a test set, we use the unbound structures of three proteases and the corresponding protein inhibitors. For each of the examples, the highest-ranking conformation produced was a good approximation of the true complex structure.
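Choosing one rotamer per side chain so that the total energy is minimal is the combinatorial optimization problem this abstract refers to. The sketch below contrasts an exact answer with a fast heuristic on a toy instance. It is not the authors' method: plain enumeration stands in for their branch-and-cut solver, the greedy pass is a generic heuristic, and the energy tables are invented numbers.

```python
from itertools import product

def total_energy(choice, self_energy, pair_energy):
    """Sum of single-rotamer terms and pairwise terms for one assignment."""
    energy = sum(self_energy[i][r] for i, r in enumerate(choice))
    for i in range(len(choice)):
        for j in range(i + 1, len(choice)):
            energy += pair_energy[(i, j)][choice[i]][choice[j]]
    return energy

def exhaustive_search(self_energy, pair_energy):
    """Exact minimum by enumerating every combination (tiny inputs only)."""
    options = [range(len(e)) for e in self_energy]
    return min(product(*options),
               key=lambda c: total_energy(c, self_energy, pair_energy))

def greedy_search(self_energy, pair_energy):
    """Heuristic: fix residues in order, each taking the rotamer that is
    cheapest given the residues already fixed."""
    choice = []
    for i, energies in enumerate(self_energy):
        def cost(r):
            return energies[r] + sum(
                pair_energy[(j, i)][choice[j]][r] for j in range(i))
        choice.append(min(range(len(energies)), key=cost))
    return tuple(choice)

# Hypothetical instance: two residues with two rotamers each.
self_energy = [[0.0, 0.1], [0.0, 0.0]]
pair_energy = {(0, 1): [[5.0, 5.0], [0.0, 0.0]]}

exact = exhaustive_search(self_energy, pair_energy)
greedy = greedy_search(self_energy, pair_energy)
```

Here the greedy pass commits to the locally cheapest rotamer of the first residue and ends with energy 5.0, while enumeration finds the assignment with energy 0.1. That gap is the reason an exact method is worth its extra running time on hard instances.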
NASA Astrophysics Data System (ADS)
Palla, Gergely; Derenyi, Imre; Farkas, Illes J.; Vicsek, Tamas
2006-03-01
Most tasks in a cell are performed not by individual proteins, but by functional groups of proteins (either physically interacting with each other or associated in other ways). In gene (protein) association networks these groups show up as sets of densely connected nodes. In the yeast, Saccharomyces cerevisiae, known physically interacting groups of proteins (called protein complexes) strongly overlap: the total number of proteins contained by these complexes by far underestimates the sum of their sizes (2750 vs. 8932). Thus, most functional groups of proteins, both physically interacting and other, are likely to share many of their members with other groups. However, current algorithms searching for dense groups of nodes in networks usually exclude overlaps. With the aim to discover both novel functions of individual proteins and novel protein functional groups we combine in protein association networks (i) a search for overlapping dense subgraphs based on the Clique Percolation Method (CPM) (Palla, G., et.al. Nature 435, 814-818 (2005), http://angel.elte.hu/clustering), which explicitly allows for overlaps among the groups, and (ii) a verification and characterization of the identified groups of nodes (proteins) with the help of standard annotation databases listing known functions.
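The idea behind the Clique Percolation Method can be illustrated with a small brute-force sketch: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a chain of adjacent k-cliques. The graph and node names below are invented, and the enumeration is only practical for toy graphs; it is not the authors' implementation.

```python
from itertools import combinations

def k_clique_communities(edges, k):
    """Return overlapping communities as unions of adjacent k-cliques."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate all k-cliques by brute force (toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    # Union-find over cliques that share exactly k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

# Hypothetical network: triangles ABC and BCD share an edge, DEF shares only node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
communities = k_clique_communities(edges, 3)
sum_of_sizes = sum(len(c) for c in communities)
unique_nodes = len(set().union(*communities))
```

Here node D belongs to both communities, so the sum of community sizes (7) exceeds the number of distinct nodes (6), the same kind of gap the abstract reports for yeast complexes (8932 vs. 2750).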
RaptorX-Property: a web server for protein structure property prediction.
Wang, Sheng; Li, Wei; Liu, Shiwang; Xu, Jinbo
2016-07-08
RaptorX Property (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/) is a web server predicting structure property of a protein sequence without using any templates. It outperforms other servers, especially for proteins without close homologs in PDB or with very sparse sequence profile (i.e. carries little evolutionary information). This server employs a powerful in-house deep learning model DeepCNF (Deep Convolutional Neural Fields) to predict secondary structure (SS), solvent accessibility (ACC) and disorder regions (DISO). DeepCNF not only models complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent property labels. Our experimental results show that, tested on CASP10, CASP11 and the other benchmarks, this server can obtain ∼84% Q3 accuracy for 3-state SS, ∼72% Q8 accuracy for 8-state SS, ∼66% Q3 accuracy for 3-state solvent accessibility, and ∼0.89 area under the ROC curve (AUC) for disorder prediction. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
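For readers unfamiliar with the Q3 and Q8 measures quoted above, they are per-residue accuracies: the fraction of positions whose predicted state equals the observed state. A minimal sketch, with invented example strings and a hypothetical function name:

```python
def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical 3-state strings (H = helix, E = strand, C = coil).
predicted_ss = "HHHECCCEEC"
observed_ss = "HHHHCCCEEE"
q3 = q_accuracy(predicted_ss, observed_ss)
```

The same function gives Q8 when the strings use an 8-state alphabet.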
Dursun, Erdinç; Gezen-Ak, Duygu
2017-01-01
Our recent study indicated that vitamin D and its receptors are important parts of the amyloid processing pathway in neurons. Yet the role of vitamin D receptor (VDR) in amyloid pathogenesis is complex, and not all of the regulation of amyloid beta production can be explained solely by the transcriptional regulatory properties of VDR. Given this, we hypothesized that VDR might exist on the neuronal plasma membrane in close proximity to amyloid precursor protein (APP) and secretase complexes. The present study primarily focused on the localization of VDR in neurons and its interaction with amyloid pathology-related proteins. The localization of VDR on neuronal membranes and its co-localization with target proteins were investigated with cell surface staining followed by immunofluorescence labelling. FpClass was used for protein-protein interaction prediction. Our results demonstrated the localization of VDR on the neuronal plasma membrane, the co-localization of VDR with APP, ADAM10 or Nicastrin, and limited co-localization of VDR with PS1. E-cadherin interaction with APP or the γ-secretase complex may involve NOTCH1, NUMB, or FHL2, according to FpClass. This suggested complex might also include VDR, which greatly contributes to Ca2+ homeostasis with its ligand vitamin D. Consequently, we suggest that VDR might also be a member of this complex through its own non-genomic action and that it can regulate the APP processing pathway in neurons in this way.
NASA Astrophysics Data System (ADS)
Kraft, Lewis J.; Kenworthy, Anne K.
2012-01-01
The protein microtubule-associated protein 1, light chain 3 (LC3) functions in autophagosome formation and plays a central role in the autophagy pathway. Previously, we found LC3 diffuses more slowly in cells than is expected for a freely diffusing monomer, suggesting it may constitutively associate with a macromolecular complex containing other protein components of the pathway. In the current study, we used Förster resonance energy transfer (FRET) microscopy and fluorescence recovery after photobleaching (FRAP) to investigate the interactions of LC3 with Atg4BC74A, a catalytically inactive mutant of the cysteine protease involved in lipidation and de-lipidation of LC3, as a model system to probe protein complex formation in the autophagy pathway. We show Atg4BC74A is in FRET proximity with LC3 in both the cytoplasm and nucleus of living cells, consistent with previous biochemical evidence that suggests these proteins directly interact. In addition, overexpressed Atg4BC74A diffuses significantly more slowly than predicted based on its molecular weight, and its translational diffusion coefficient is significantly slowed upon coexpression with LC3 to match that of LC3 itself. Taken together, these results suggest Atg4BC74A and LC3 are contained within the same multiprotein complex and that this complex exists in both the cytoplasm and nucleoplasm of living cells.
Nonlinear scoring functions for similarity-based ligand docking and binding affinity prediction.
Brylinski, Michal
2013-11-25
A common strategy for virtual screening considers a systematic docking of a large library of organic compounds into the target sites in protein receptors with promising leads selected based on favorable intermolecular interactions. Despite a continuous progress in the modeling of protein-ligand interactions for pharmaceutical design, important challenges still remain, thus the development of novel techniques is required. In this communication, we describe eSimDock, a new approach to ligand docking and binding affinity prediction. eSimDock employs nonlinear machine learning-based scoring functions to improve the accuracy of ligand ranking and similarity-based binding pose prediction, and to increase the tolerance to structural imperfections in the target structures. In large-scale benchmarking using the Astex/CCDC data set, we show that 53.9% (67.9%) of the predicted ligand poses have RMSD of <2 Å (<3 Å). Moreover, using binding sites predicted by recently developed eFindSite, eSimDock models ligand binding poses with an RMSD of 4 Å for 50.0-39.7% of the complexes at the protein homology level limited to 80-40%. Simulations against non-native receptor structures, whose mean backbone rearrangements vary from 0.5 to 5.0 Å Cα-RMSD, show that the ratio of docking accuracy and the estimated upper bound is at a constant level of ∼0.65. Pearson correlation coefficient between experimental and predicted by eSimDock Ki values for a large data set of the crystal structures of protein-ligand complexes from BindingDB is 0.58, which decreases only to 0.46 when target structures distorted to 3.0 Å Cα-RMSD are used. Finally, two case studies demonstrate that eSimDock can be customized to specific applications as well. These encouraging results show that the performance of eSimDock is largely unaffected by the deformations of ligand binding regions, thus it represents a practical strategy for across-proteome virtual screening using protein models. 
eSimDock is freely available to the academic community as a Web server at http://www.brylinski.org/esimdock .
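The pose-accuracy figures in this abstract are fractions of predicted ligand poses whose RMSD to the experimental pose falls below a cutoff. A minimal sketch of both quantities follows; the coordinates and RMSD values are invented, and the RMSD is computed without superposition, on the assumption that both poses are already in the same receptor frame.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def success_rate(rmsd_values, cutoff):
    """Fraction of poses with RMSD strictly below the cutoff (in angstroms)."""
    return sum(r < cutoff for r in rmsd_values) / len(rmsd_values)

# Hypothetical two-atom ligand shifted by 1 angstrom along z.
pose_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])

# Hypothetical benchmark of four docked poses.
benchmark = [0.8, 1.9, 2.5, 3.4]
rate_2A = success_rate(benchmark, 2.0)
rate_3A = success_rate(benchmark, 3.0)
```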
van Lis, Robert; Atteia, Ariane; Mendoza-Hernández, Guillermo; González-Halphen, Diego
2003-01-01
Pure mitochondria of the photosynthetic alga Chlamydomonas reinhardtii were analyzed using blue native-polyacrylamide gel electrophoresis (BN-PAGE). The major oxidative phosphorylation complexes were resolved: F1F0-ATP synthase, NADH-ubiquinone oxidoreductase, ubiquinol-cytochrome c reductase, and cytochrome c oxidase. The oligomeric states of these complexes were determined. The F1F0-ATP synthase runs exclusively as a dimer, in contrast to the C. reinhardtii chloroplast enzyme, which is present as a monomer and subcomplexes. The sequence of a 60-kD protein, associated with the mitochondrial ATP synthase and with no known counterpart in any other organism, is reported. This protein may be related to the strong dimeric character of the algal F1F0-ATP synthase. The oxidative phosphorylation complexes resolved by BN-PAGE were separated into their subunits by second dimension sodium dodecyl sulfate-PAGE. A number of polypeptides were identified mainly on the basis of their N-terminal sequence. Core I and II subunits of complex III were characterized, and their proteolytic activities were predicted. Also, the heterodimeric nature of COXIIA and COXIIB subunits in cytochrome c oxidase was demonstrated. Other mitochondrial proteins like the chaperone HSP60, the alternative oxidase, the aconitase, and the ADP/ATP carrier were identified. BN-PAGE was also used to approach the analysis of the major chloroplast protein complexes of C. reinhardtii. PMID:12746537
Molecular analysis of the von Hippel-Lindau disease gene.
Chernoff, A; Kasparcova, V; Linehan, W M; Stolle, C A
2001-01-01
Von Hippel-Lindau (VHL) disease is an autosomal dominant disorder that predisposes the affected individual to develop characteristic tumors. These include CNS hemangioblastoma, retinal angiomas, endolymphatic sac tumors, pancreatic cysts and tumors, epididymal cystadenomas, pheochromocytomas, renal cysts, and clear-cell renal carcinoma. The VHL gene was localized to 3p25 and then isolated by Latif et al. (1). The gene contains three exons with an open reading frame of 852 nucleotides, which encode a predicted protein of 284 amino acids. The VHL protein is believed to have several functions. It is involved in transcription regulation through its inhibition of elongation by binding to the B and C subunits of elongin. Mutations of VHL allow the B and C subunits to bind with the A subunit. This complex then overcomes "pausing" of RNA polymerase during mRNA transcription (2,3). Several studies suggest that the VHL protein is also involved in regulation of hypoxia-inducible transcripts, particularly vascular endothelial growth factor (VEGF), by altering mRNA stability (4,5). Therefore, VHL gene mutations permit the overexpression of VEGF under normoxic conditions, which leads to the angiogenesis believed to be required for tumor growth. The VHL-elongin BC complex (VBC) also binds two other proteins, CUL2 and Rbx1, in a complex that has structural similarity to other E3 ubiquitin ligase complexes (6). Such complexes mediate the degradation of cell-cycle regulatory proteins.
Popelka, Hana; Uversky, Vladimir N; Klionsky, Daniel J
2014-06-01
The mechanism of autophagy relies on complex cell signaling and regulatory processes. Each cell contains many proteins that lack a rigid 3-dimensional structure under physiological conditions. These dynamic proteins, called intrinsically disordered proteins (IDPs) and protein regions (IDPRs), are predominantly involved in cell signaling and regulation. Yet, very little is known about their presence among proteins of the core autophagy machinery. In this work, we characterized the autophagy protein Atg3 from yeast and human along with 2 variants to show that Atg3 is an IDPRs-containing protein and that disorder/order predicted for these proteins from their amino acid sequence corresponds to their experimental characteristics. Based on this consensus, we applied the same prediction methods to all known Atg proteins from Saccharomyces cerevisiae. The data presented here provide an insight into the structural dynamics of each Atg protein. They also show that intrinsic disorder at various levels has to be taken into consideration for about half of the Atg proteins. This work should become a useful tool that will facilitate and encourage exploration of protein intrinsic disorder in autophagy.
VarMod: modelling the functional effects of non-synonymous variants
Pappalardo, Morena; Wass, Mark N.
2014-01-01
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) present in proteins, the difficulty lies first in identifying, among the many non-functional variants, those nsSNVs that result in a functional change in the protein, and then in linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod builds on recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod performance is comparable to an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. PMID:24906884
Computational Methods to Predict Protein Interaction Partners
NASA Astrophysics Data System (ADS)
Valencia, Alfonso; Pazos, Florencio
In the new paradigm for studying biological phenomena represented by Systems Biology, cellular components are not considered in isolation but as forming complex networks of relationships. Protein interaction networks are among the first objects studied from this new point of view. Deciphering the interactome (the whole network of interactions for a given proteome) has been shown to be a very complex task. Computational techniques for detecting protein interactions have become standard tools for dealing with this problem, helping and complementing their experimental counterparts. Most of these techniques use genomic or sequence features intuitively related with protein interactions and are based on "first principles" in the sense that they do not involve training with examples. There are also other computational techniques that use other sources of information (i.e. structural information or even experimental data) or are based on training with examples.
Hawkins, Clare L; Pattison, David I; Davies, Michael J
2002-01-01
Stimulated phagocyte cells produce the oxidant HOCl, via the release of the enzyme myeloperoxidase and hydrogen peroxide. HOCl is important in bacterial cell killing, but excessive or misplaced generation can damage the host tissue and may lead to the development of certain diseases such as cancer. The role of HOCl in the oxidation of isolated proteins, DNA and their components has been investigated extensively, but little work has been performed on the protein-DNA (nucleosome) complexes present in eukaryotic cell nuclei. Neither the selectivity of damage in such complexes nor the possibility of transfer of damage from the protein to DNA or vice versa, has been studied. In the present study, kinetic modelling has been employed to predict that reaction occurs predominantly with the protein and not with the DNA in the nucleosome, using molar HOCl excesses of up to 200-fold. With 50-200-fold excesses, 50-80% of the HOCl is predicted to react with histone lysine and histidine residues to yield chloramines. The yield and stability of such chloramines predicted by these modelling studies agrees well with experimental data. Decomposition of these species gives protein-derived, nitrogen-centred radicals, probably on the lysine side chains, as characterized by the EPR and spin-trapping experiments. It is shown that isolated lysine, histidine, peptide and protein chloramines can react with plasmid DNA to cause strand breaks. The protection against such damage afforded by the radical scavengers Trolox (a water-soluble alpha-tocopherol derivative) and 5,5-dimethyl-1-pyrroline-N-oxide suggests a radical-mediated process. The EPR experiments and product analyses have also provided evidence for the rapid addition of protein radicals, formed on chloramine decomposition, to pyrimidine nucleosides to give nucleobase radicals. Further evidence for the formation of such covalent cross-links has been obtained from experiments performed using (3)H-lysine and (14)C-histidine chloramines. 
These results are consistent with the predictions of the kinetic model and suggest that histones are major targets for HOCl in the nucleosome. Furthermore, the resulting protein chloramines and the radicals derived from them may act as contributing agents in HOCl-mediated DNA oxidation. PMID:12010123
Minimalist ensemble algorithms for genome-wide protein localization prediction.
Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun
2012-07-03
Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
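The AUC scores quoted above, and the idea of keeping only the predictors that actually help, can be illustrated with a small sketch. It is a simplified stand-in, not the published method: it averages predictor scores and adds predictors greedily while the AUC improves, whereas the paper combines a feature-selection filter with a logistic regression classifier. All predictor names and scores are invented.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def greedy_ensemble(predictors, labels):
    """Forward selection: add the predictor that most improves the AUC of the mean score."""
    chosen, best = [], 0.0
    remaining = set(predictors)

    def mean_auc(name):
        members = chosen + [name]
        averaged = [
            sum(predictors[m][i] for m in members) / len(members)
            for i in range(len(labels))
        ]
        return auc(averaged, labels)

    while remaining:
        candidate = max(sorted(remaining), key=mean_auc)
        score = mean_auc(candidate)
        if score <= best:  # stop as soon as no predictor improves the ensemble
            break
        chosen.append(candidate)
        best = score
        remaining.remove(candidate)
    return chosen, best

# Hypothetical scores from three individual predictors on four proteins.
labels = [1, 1, 0, 0]
predictors = {
    "a": [0.9, 0.2, 0.3, 0.1],
    "b": [0.2, 0.9, 0.1, 0.3],
    "c": [0.5, 0.5, 0.5, 0.5],
}
single_auc = auc(predictors["a"], labels)
selected, ensemble_auc = greedy_ensemble(predictors, labels)
```

In this toy case two complementary predictors are selected and the uninformative third one is left out, which mirrors the abstract's point that a small, well-chosen ensemble can match a larger one.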
Minimalist ensemble algorithms for genome-wide protein localization prediction
2012-01-01
Background: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results: This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.
Conclusions: We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
deepNF: Deep network fusion for protein function prediction.
Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard
2018-06-01
The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. Contact: vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.
Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.
Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal
2015-01-01
Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with the use of a custom-designed fuzzy membership function for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the improvement of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids is used together with their physico-chemical characteristics, and the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in the 10-fold cross validation experiment is 77.07, 78.39, and 74.91 percent for the aforementioned organisms, respectively. Performance on independent test sets is 72.09, 73.24 and 82.74 percent, respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.
Exploring the potential of 3D Zernike descriptors and SVM for protein-protein interface prediction.
Daberdaku, Sebastian; Ferrari, Carlo
2018-02-06
The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task. In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI). The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. 
The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
Zhou, Jiyun; Wang, Hongpeng; Zhao, Zhishan; Xu, Ruifeng; Lu, Qin
2018-05-08
Protein secondary structure is the three-dimensional form of local segments of proteins, and its prediction is an important problem in protein tertiary structure prediction. Developing computational approaches for protein secondary structure prediction is becoming increasingly urgent. We present a novel deep learning based model, referred to as CNNH_PSS, which uses a multi-scale CNN with highway connections. In CNNH_PSS, any two neighboring convolutional layers are linked by a highway that delivers information from the current layer to the output of the next one in order to preserve local contexts. Because lower layers extract local context while higher layers extract long-range interdependencies, the highways between neighboring layers allow CNNH_PSS to extract both. We evaluate CNNH_PSS on two commonly used datasets: CB6133 and CB513. CNNH_PSS outperforms the multi-scale CNN without highway by at least 0.010 Q8 accuracy and also performs better than CNF, DeepCNF and SSpro8, which cannot extract long-range interdependencies, by at least 0.020 Q8 accuracy, demonstrating that both local contexts and long-range interdependencies are indeed useful for prediction. Furthermore, CNNH_PSS also performs better than GSM and DCRNN, which need an extra complex model to extract long-range interdependencies. This demonstrates that CNNH_PSS not only uses fewer computing resources but also achieves better prediction performance. CNNH_PSS is able to extract both local contexts and long-range interdependencies by combining a multi-scale CNN with a highway network. The evaluations on common datasets and comparisons with state-of-the-art methods indicate that CNNH_PSS is a useful and efficient tool for protein secondary structure prediction.
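The highway idea mentioned in this abstract can be sketched with the generic highway formulation, in which a learned gate t mixes a layer's transformed output h with its untransformed input x as t*h + (1 - t)*x. The exact gating used in CNNH_PSS is not given in the abstract, so the element-wise gate, the weights and the toy transform below are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + math.exp(-z))

def highway(x, transform, gate_w, gate_b):
    """Element-wise highway mix: t * transform(x) + (1 - t) * x."""
    h = transform(x)
    out = []
    for xi, hi, w, b in zip(x, h, gate_w, gate_b):
        t = sigmoid(w * xi + b)  # gate value in (0, 1)
        out.append(t * hi + (1.0 - t) * xi)
    return out

def toy_transform(x):
    """Hypothetical layer: ReLU of twice the input."""
    return [max(0.0, 2.0 * xi) for xi in x]

x = [1.0, -2.0]
# Strongly negative gate bias: the input is carried through almost unchanged.
carried = highway(x, toy_transform, gate_w=[0.0, 0.0], gate_b=[-50.0, -50.0])
# Strongly positive gate bias: the transformed output dominates.
transformed = highway(x, toy_transform, gate_w=[0.0, 0.0], gate_b=[50.0, 50.0])
```

With the gate closed the lower layer's local context passes through untouched; with it open the layer behaves like an ordinary transform, so intermediate gate values let a network keep both kinds of information.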
Versatility and Invariance in the Evolution of Homologous Heteromeric Interfaces
Andreani, Jessica; Faure, Guilhem; Guerois, Raphaël
2012-01-01
Evolutionary pressures act on protein complex interfaces so that they preserve their complementarity. Nonetheless, the elementary interactions which compose the interface are highly versatile throughout evolution. Understanding and characterizing interface plasticity across evolution is a fundamental issue which could provide new insights into protein-protein interaction prediction. Using a database of 1,024 couples of close and remote heteromeric structural interologs, we studied protein-protein interactions from a structural and evolutionary point of view. We systematically and quantitatively analyzed the conservation of different types of interface contacts. Our study highlights astonishing plasticity regarding polar contacts at complex interfaces. It also reveals that up to a quarter of the residues switch out of the interface when comparing two homologous complexes. Despite such versatility, we identify two important interface descriptors which correlate with an increased conservation in the evolution of interfaces: apolar patches and contacts surrounding anchor residues. These observations hold true even when restricting the dataset to transiently formed complexes. We show that a combination of six features related either to sequence or to geometric properties of interfaces can be used to rank positions likely to share similar contacts between two interologs. Altogether, our analysis provides important tracks for extracting meaningful information from multiple sequence alignments of conserved binding partners and for discriminating near-native interfaces using evolutionary information. PMID:22952442
Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-01-01
Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
Real-time ligand binding pocket database search using local surface descriptors.
Chikhi, Rayan; Sael, Lee; Kihara, Daisuke
2010-07-01
Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
Zhang, Yi; Nikolovski, Nino; Sorieul, Mathias; Vellosillo, Tamara; McFarlane, Heather E.; Dupree, Ray; Kesten, Christopher; Schneider, René; Driemeier, Carlos; Lathe, Rahul; Lampugnani, Edwin; Yu, Xiaolan; Ivakov, Alexander; Doblin, Monika S.; Mortimer, Jenny C.; Brown, Steven P.; Persson, Staffan; Dupree, Paul
2016-01-01
As the most abundant biopolymer on Earth, cellulose is a key structural component of the plant cell wall. Cellulose is produced at the plasma membrane by cellulose synthase (CesA) complexes (CSCs), which are assembled in the endomembrane system and trafficked to the plasma membrane. While several proteins that affect CesA activity have been identified, components that regulate CSC assembly and trafficking remain unknown. Here we show that STELLO1 and 2 are Golgi-localized proteins that can interact with CesAs and control cellulose quantity. In the absence of STELLO function, the spatial distribution within the Golgi, secretion and activity of the CSCs are impaired indicating a central role of the STELLO proteins in CSC assembly. Point mutations in the predicted catalytic domains of the STELLO proteins indicate that they are glycosyltransferases facing the Golgi lumen. Hence, we have uncovered proteins that regulate CSC assembly in the plant Golgi apparatus. PMID:27277162
The Disordered C-Terminus of Yeast Hsf1 Contains a Cryptic Low-Complexity Amyloidogenic Region.
Pujols, Jordi; Santos, Jaime; Pallarès, Irantzu; Ventura, Salvador
2018-05-06
Response mechanisms to external stress rely on networks of proteins able to activate specific signaling pathways to ensure the maintenance of cell proteostasis. Many of the proteins mediating this kind of response contain intrinsically disordered regions, which lack a defined structure, but still are able to interact with a wide range of clients that modulate the protein function. Some of these interactions are mediated by specific short sequences embedded in the longer disordered regions. Because the physicochemical properties that promote functional and abnormal interactions are similar, it has been shown that, in globular proteins, aggregation-prone and binding regions tend to overlap. It could be that the same principle applies for disordered protein regions. In this context, we show here that a predicted low-complexity interacting region in the disordered C-terminus of the stress response master regulator heat shock factor 1 (Hsf1) protein corresponds to a cryptic amyloid region able to self-assemble into fibrillary structures resembling those found in neurodegenerative disorders.
2009-07-01
that pathogenic TSC1 amino acid changes are clustered to a conserved ~300 amino acid region close to the N-terminal of the protein. These substitutions ...Genet. (2009) 18 2378-2387. 15. Ng PC and Henikoff S. Predicting the effects of amino acid substitutions on protein function. Annu. Rev...amino acid substitutions in the N-terminal region of TSC1 that result in reduced steady state levels of the protein and lead to increased mTOR
Predicting network modules of cell cycle regulators using relative protein abundance statistics.
Oguz, Cihan; Watson, Layne T; Baumann, William T; Tyson, John J
2017-02-28
Parameter estimation in systems biology is typically done by enforcing experimental observations through an objective function as the parameter space of a model is explored by numerical simulations. Past studies have shown that one usually finds a set of "feasible" parameter vectors that fit the available experimental data equally well, and that these alternative vectors can make different predictions under novel experimental conditions. In this study, we characterize the feasible region of a complex model of the budding yeast cell cycle under a large set of discrete experimental constraints in order to test whether the statistical features of relative protein abundance predictions are influenced by the topology of the cell cycle regulatory network. Using differential evolution, we generate an ensemble of feasible parameter vectors that reproduce the phenotypes (viable or inviable) of wild-type yeast cells and 110 mutant strains. We use this ensemble to predict the phenotypes of 129 mutant strains for which experimental data is not available. We identify 86 novel mutants that are predicted to be viable and then rank the cell cycle proteins in terms of their contributions to cumulative variability of relative protein abundance predictions. Proteins involved in "regulation of cell size" and "regulation of G1/S transition" contribute most to predictive variability, whereas proteins involved in "positive regulation of transcription involved in exit from mitosis," "mitotic spindle assembly checkpoint" and "negative regulation of cyclin-dependent protein kinase by cyclin degradation" contribute the least. These results suggest that the statistics of these predictions may be generating patterns specific to individual network modules (START, S/G2/M, and EXIT). To test this hypothesis, we develop random forest models for predicting the network modules of cell cycle regulators using relative abundance statistics as model inputs. 
Predictive performance is assessed by the areas under receiver operating characteristics curves (AUC). Our models generate an AUC range of 0.83-0.87 as opposed to randomized models with AUC values around 0.50. By using differential evolution and random forest modeling, we show that the model prediction statistics generate distinct network module-specific patterns within the cell cycle network.
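As a small illustration of the AUC metric used in the abstract above, the sketch below computes the area under the ROC curve directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and the toy data are hypothetical; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from its pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))
```

A model that scores examples at random is expected to land near 0.5 under this definition, which is the baseline the abstract compares against.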
Proteomics to study DNA-bound and chromatin-associated gene regulatory complexes
Wierer, Michael; Mann, Matthias
2016-01-01
High-resolution mass spectrometry (MS)-based proteomics is a powerful method for the identification of soluble protein complexes and large-scale affinity purification screens can decode entire protein interaction networks. In contrast, protein complexes residing on chromatin have been much more challenging, because they are difficult to purify and often of very low abundance. However, this is changing due to recent methodological and technological advances in proteomics. Proteins interacting with chromatin marks can directly be identified by pulldowns with synthesized histone tails containing posttranslational modifications (PTMs). Similarly, pulldowns with DNA baits harbouring single nucleotide polymorphisms or DNA modifications reveal the impact of those DNA alterations on the recruitment of transcription factors. Accurate quantitation – either isotope-based or label free – unambiguously pinpoints proteins that are significantly enriched over control pulldowns. In addition, protocols that combine classical chromatin immunoprecipitation (ChIP) methods with mass spectrometry (ChIP-MS) target gene regulatory complexes in their in-vivo context. Similar to classical ChIP, cells are crosslinked with formaldehyde and chromatin sheared by sonication or nuclease digested. ChIP-MS baits can be proteins in tagged or endogenous form, histone PTMs, or lncRNAs. Locus-specific ChIP-MS methods would allow direct purification of a single genomic locus and the proteins associated with it. There, loci can be targeted either by artificial DNA-binding sites and corresponding binding proteins or via proteins with sequence specificity such as TAL or nuclease deficient Cas9 in combination with a specific guide RNA. We predict that advances in MS technology will soon make such approaches generally applicable tools in epigenetics. PMID:27402878
Swanson, Jon; Audie, Joseph
2018-01-01
A fundamental and unsolved problem in biophysical chemistry is the development of a computationally simple, physically intuitive, and generally applicable method for accurately predicting and physically explaining protein-protein binding affinities from protein-protein interaction (PPI) complex coordinates. Here, we propose that the simplification of a previously described six-term PPI scoring function to a four-term function results in a simple expression of all physically and statistically meaningful terms that can be used to accurately predict and explain binding affinities for a well-defined subset of PPIs that are characterized by (1) crystallographic coordinates, (2) rigid-body association, (3) normal interface size, hydrophobicity and hydrophilicity, and (4) high-quality experimental binding affinity measurements. We further propose that the four-term scoring function could be regarded as a core expression for future development into a more general PPI scoring function. Our work has clear implications for PPI modeling and structure-based drug design.
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks
Juan, David; Pazos, Florencio; Valencia, Alfonso
2008-01-01
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology. PMID:18199838
High-confidence prediction of global interactomes based on genome-wide coevolutionary networks.
Juan, David; Pazos, Florencio; Valencia, Alfonso
2008-01-22
Interacting or functionally related protein families tend to have similar phylogenetic trees. Based on this observation, techniques have been developed to predict interaction partners. The observed degree of similarity between the phylogenetic trees of two proteins is the result of many different factors besides the actual interaction or functional relationship between them. Such factors influence the performance of interaction predictions. One aspect that can influence this similarity is related to the fact that a given protein interacts with many others, and hence it must adapt to all of them. Accordingly, the interaction or coadaptation signal within its tree is a composite of the influence of all of the interactors. Here, we introduce a new estimator of coevolution to overcome this and other problems. Instead of relying on the individual value of tree similarity between two proteins, we use the whole network of similarities between all of the pairs of proteins within a genome to reassess the similarity of that pair, thereby taking into account its coevolutionary context. We show that this approach offers a substantial improvement in interaction prediction performance, providing a degree of accuracy/coverage comparable with, or in some cases better than, that of experimental techniques. Moreover, important information on the structure, function, and evolution of macromolecular complexes can be inferred with this methodology.
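The idea of reassessing a pair's similarity through its coevolutionary context can be sketched as follows. This is one plausible reading of the abstract above, not the authors' published estimator: each protein is described by its profile of tree similarities to every other protein, and a pair is rescored by the Pearson correlation of the two profiles. The function names and the similarity matrix layout are assumptions made for this example.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0.0 or sd_v == 0.0:
        return 0.0
    return cov / (sd_u * sd_v)

def context_similarity(sim, i, j):
    """Rescore pair (i, j) using the whole network of pairwise similarities.

    sim is a symmetric matrix (list of lists) of tree similarities.
    The profiles exclude entries i and j themselves, so the score reflects
    how similarly the two proteins relate to all the other proteins.
    """
    others = [k for k in range(len(sim)) if k not in (i, j)]
    profile_i = [sim[i][k] for k in others]
    profile_j = [sim[j][k] for k in others]
    return pearson(profile_i, profile_j)
```

Two proteins whose trees resemble the same set of partners get a high context score even if their direct tree similarity is modest, which is the kind of signal a single pairwise value cannot capture.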
Multiple TPR motifs characterize the Fanconi anemia FANCG protein.
Blom, Eric; van de Vrugt, Henri J; de Vries, Yne; de Winter, Johan P; Arwert, Fré; Joenje, Hans
2004-01-05
The genome protection pathway that is defective in patients with Fanconi anemia (FA) is controlled by at least eight genes, including BRCA2. A key step in the pathway involves the monoubiquitylation of FANCD2, which critically depends on a multi-subunit nuclear 'core complex' of at least six FANC proteins (FANCA, -C, -E, -F, -G, and -L). Except for FANCL, which has WD40 repeats and a RING finger domain, no significant domain structure has so far been recognized in any of the core complex proteins. By using a homology search strategy comparing the human FANCG protein sequence with its ortholog sequences in Oryzias latipes (Japanese rice fish) and Danio rerio (zebrafish) we identified at least seven tetratricopeptide repeat motifs (TPRs) covering a major part of this protein. TPRs are degenerate 34-amino acid repeat motifs which function as scaffolds mediating protein-protein interactions, often found in multiprotein complexes. In four out of five TPR motifs tested (TPR1, -2, -5, and -6), targeted missense mutagenesis disrupting the motifs at the critical position 8 of each TPR caused complete or partial loss of FANCG function. Loss of function was evident from failure of the mutant proteins to complement the cellular FA phenotype in FA-G lymphoblasts, which was correlated with loss of binding to FANCA. Although the TPR4 mutant fully complemented the cells, it showed a reduced interaction with FANCA, suggesting that this TPR may also be of functional importance. The recognition of FANCG as a typical TPR protein predicts this protein to play a key role in the assembly and/or stabilization of the nuclear FA protein core complex.
Hot-spot analysis for drug discovery targeting protein-protein interactions.
Rosell, Mireia; Fernández-Recio, Juan
2018-04-01
Protein-protein interactions are important for biological processes and pathological situations, and are attractive targets for drug discovery. However, rational drug design targeting protein-protein interactions is still highly challenging. Hot-spot residues are seen as the best option to target such interactions, but their identification requires detailed structural and energetic characterization, which is only available for a tiny fraction of protein interactions. Areas covered: In this review, the authors cover a variety of computational methods that have been reported for the energetic analysis of protein-protein interfaces in search of hot-spots, and the structural modeling of protein-protein complexes by docking. This can help to rationalize the discovery of small-molecule inhibitors of protein-protein interfaces of therapeutic interest. Computational analysis and docking can help to locate the interface, molecular dynamics can be used to find suitable cavities, and hot-spot predictions can focus the search for inhibitors of protein-protein interactions. Expert opinion: A major difficulty for applying rational drug design methods to protein-protein interactions is that in the majority of cases the complex structure is not available. Fortunately, computational docking can complement experimental data. An interesting aspect to explore in the future is the integration of these strategies for targeting PPIs with large-scale mutational analysis.
Ballester, Pedro J; Mitchell, John B O
2010-05-01
Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data, and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions. We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score. Contact: pedro.ballester@ebi.ac.uk; jbom@st-andrews.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Hitzenberger, Manuel; Schuster, Daniela; Hofer, Thomas S.
2017-10-01
Erroneous activation of the Hedgehog pathway has been linked to a large number of cancerous diseases, and therefore many studies aiming at its inhibition have been carried out. One leverage point for novel therapeutic strategies targeting the proteins involved is the prevention of complex formation between the extracellular signaling protein Sonic Hedgehog and the transmembrane protein Patched 1. In 2009, robotnikinin, a small molecule capable of binding to and inhibiting the activity of Sonic Hedgehog, was identified; however, in the absence of X-ray structures of the Sonic Hedgehog-robotnikinin complex, the binding mode of this inhibitor remains unknown. In order to aid the identification of novel Sonic Hedgehog inhibitors, the presented investigation elucidates the binding mode of robotnikinin by performing an extensive docking study, including subsequent molecular mechanical as well as quantum mechanical/molecular mechanical molecular dynamics simulations. The attained configurations enabled the identification of a number of key protein-ligand interactions, aiding complex formation and providing stabilizing contributions to the binding of the ligand. The predicted structure of the Sonic Hedgehog-robotnikinin complex is provided via a PDB file as supplementary material and can be used for further reference.
Brasil, Christiane Regina Soares; Delbem, Alexandre Claudio Botazzo; da Silva, Fernando Luís Barroso
2013-07-30
This article focuses on the development of an approach for ab initio protein structure prediction (PSP) without using any earlier knowledge from similar protein structures, such as fragment-based statistics or inference of secondary structures. Such an approach is called purely ab initio prediction. The article shows that well-designed multiobjective evolutionary algorithms can predict relevant protein structures in a purely ab initio way. One challenge for purely ab initio PSP is the prediction of structures with β-sheets. To work with such proteins, this research has also developed procedures to efficiently estimate hydrogen bond and solvation contribution energies. Considering van der Waals, electrostatic, hydrogen bond, and solvation contribution energies, the PSP is a problem with four energetic terms to be minimized. Each interaction energy term can be considered an objective of an optimization method. Combinatorial problems with four objectives have been considered too complex for the available multiobjective optimization (MOO) methods. The proposed approach, called "Multiobjective evolutionary algorithms with many tables" (MEAMT), can efficiently deal with four objectives through the combination thereof, performing a more adequate sampling of the objective space. Therefore, this method can better map the promising regions in this space, predicting structures in a purely ab initio way. In other words, MEAMT is an efficient optimization method for MOO, which explores simultaneously the search space as well as the objective space. MEAMT can predict structures with one or two domains with RMSDs comparable to values obtained by recently developed ab initio methods (GAPFCG, I-PAES, and Quark) that use different levels of earlier knowledge. Copyright © 2013 Wiley Periodicals, Inc.
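To make the four-objective setting in the abstract above concrete, the sketch below shows Pareto dominance, the generic comparison rule in multiobjective minimization. It is not the MEAMT algorithm, whose table-based combination of objectives is not specified in the abstract. Each candidate structure is assumed to be summarized by a tuple of four energies (van der Waals, electrostatic, hydrogen bond, solvation); the function names and the numbers are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    All objectives are minimized, as with energy terms.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the candidates that no other candidate dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]

# Hypothetical energy tuples: (vdw, electrostatic, hydrogen_bond, solvation)
candidates = [
    (1.0, 2.0, 3.0, 4.0),
    (2.0, 3.0, 4.0, 5.0),   # worse than the first in every term
    (0.0, 5.0, 3.0, 4.0),   # trades vdw against electrostatics
    (1.0, 2.0, 3.0, 4.5),   # worse than the first in solvation only
]
front = pareto_front(candidates)
```

With four objectives, a large share of random candidates tends to be mutually non-dominated, which is one reason many-objective problems are considered hard for dominance-based methods.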
Zhang, Z; Liu, Y; Song, T; Xue, Z; Shen, X; Liang, F; Zhao, Y; Li, Z; Sheng, H
2013-01-01
Background: Bcl-2-like members have been found to be inherently overexpressed in many types of haematologic malignancies. The small-molecule S1 is a BH3 mimetic and a triple inhibitor of Bcl-2, Mcl-1 and Bcl-XL. Methods: The lethal dose 50 (LD50) values of S1 in five leukaemic cell lines and 41 newly diagnosed leukaemia samples were tested. The levels of Bcl-2 family members and phosphorylated Bcl-2 were semiquantitatively measured by western blotting. The interactions between Bcl-2 family members were tested by co-immunoprecipitation. The correlation between the LD50 and expression levels of Bcl-2 family members, alone or in combination, was analysed. Results: S1 exhibited variable sensitivity with LD50 values ranging >2 logs in both established and primary leukaemic cells. The ratio of pBcl-2/(Bcl-2+Mcl-1) could predict the S1 response. Furthermore, we demonstrated that pBcl-2 antagonised S1 by sequestering the Bak and Bim proteins that were released from Mcl-1, and pBcl-2/Bak, pBcl-2/Bax and pBcl-2/Bim complexes cannot be disrupted by S1. Conclusion: A predictive index was obtained for the novel BH3 mimetic S1. The shift of proapoptotic proteins from being complexed with Mcl-1 to being complexed with pBcl-2 was revealed for the first time, which is the mechanism underlying the index value described herein. PMID:23558901
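The predictive index named in the abstract above, pBcl-2/(Bcl-2+Mcl-1), can be written as a one-line calculation. The sketch assumes the three inputs are semiquantitative protein levels on a common scale (for example, normalized band intensities); the function name and the example values are hypothetical, and the abstract gives no cut-off for interpreting the ratio.

```python
def s1_predictive_index(p_bcl2, bcl2, mcl1):
    """Ratio pBcl-2 / (Bcl-2 + Mcl-1) from three non-negative protein levels."""
    if min(p_bcl2, bcl2, mcl1) < 0:
        raise ValueError("protein levels must be non-negative")
    denominator = bcl2 + mcl1
    if denominator == 0:
        raise ValueError("Bcl-2 + Mcl-1 must be greater than zero")
    return p_bcl2 / denominator

# Hypothetical normalized levels for one sample
index = s1_predictive_index(p_bcl2=0.6, bcl2=0.8, mcl1=0.4)
```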
Binding Affinity prediction with Property Encoded Shape Distribution signatures
Das, Sourav; Krein, Michael P.
2010-01-01
We report the use of the molecular signatures known as “Property-Encoded Shape Distributions” (PESD) together with standard Support Vector Machine (SVM) techniques to produce validated models that can predict the binding affinity of a large number of protein-ligand complexes. This “PESD-SVM” method uses PESD signatures that encode molecular shapes and property distributions on protein and ligand surfaces as features to build SVM models that require no subjective feature selection. A simple protocol was employed for tuning the SVM models during their development, and the results were compared to SFCscore, a regression-based method that was previously shown to perform better than 14 other scoring functions. Although the PESD-SVM method is based on only two surface property maps, the overall results were comparable. For most complexes with a dominant enthalpic contribution to binding (ΔH/-TΔS > 3), a good correlation between true and predicted affinities was observed. Entropy and solvent were not considered in the present approach and further improvement in accuracy would require accounting for these components rigorously. PMID:20095526
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships, as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so-called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, an HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Chen, Ruoying; Zhang, Zhiwang; Wu, Di; Zhang, Peng; Zhang, Xinyang; Wang, Yong; Shi, Yong
2011-01-21
Protein-protein interactions are fundamentally important in many biological processes, and there is a pressing need to understand the principles of protein-protein interactions. Mutagenesis studies have found that only a small fraction of surface residues, known as hot spots, are responsible for the physical binding in protein complexes. However, revealing hot spots by mutagenesis experiments is usually time-consuming and expensive. In order to complement the experimental efforts, we propose a new computational approach in this paper to predict hot spots. Our method, Rough Set-based Multiple Criteria Linear Programming (RS-MCLP), integrates rough sets theory and multiple criteria linear programming to choose dominant features and computationally predict hot spots. Our approach is benchmarked on a dataset of 904 alanine-mutated residues and the results show that our RS-MCLP method performs better than other methods, e.g., MCLP, Decision Tree, Bayes Net, and the existing HotSprint database. In addition, we reveal several biological insights based on our analysis. We find that four features (the change of accessible surface area, percentage of the change of accessible surface area, size of a residue, and atomic contacts) are critical in predicting hot spots. Furthermore, by analyzing the distribution of amino acids, we find that three residues (Tyr, Trp, and Phe) are abundant in hot spots. Copyright © 2010 Elsevier Ltd. All rights reserved.
Habibi, Narjeskhatoon; Norouzi, Alireza; Mohd Hashim, Siti Z; Shamsir, Mohd Shahir; Samian, Razip
2015-11-01
Recombinant protein overexpression, an important biotechnological process, is governed by complex biological rules that are mostly unknown, and it needs an intelligent algorithm to avoid resource-intensive, lab-based trial-and-error experiments for determining the expression level of the recombinant protein. The purpose of this study is to propose, for the first time in the literature, a predictive model that estimates the level of recombinant protein overexpression using a machine learning approach based on the sequence, expression vector, and expression host. The expression host was confined to Escherichia coli, which is the most popular bacterial host for overexpressing recombinant proteins. To make the problem tractable, the overexpression level was categorized as low, medium and high. A set of features that were likely to affect the overexpression level was generated based on known facts (e.g. gene length) and knowledge gathered from related literature. Then, a representative subset of the features generated in the previous step was determined using feature selection techniques. Finally, a predictive model was developed using a random forest classifier, which was able to adequately classify the small, imbalanced, multi-class dataset constructed. The results showed that the predictive model provided a promising accuracy of 80% on average in estimating the overexpression level of a recombinant protein. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mapping of ligand-binding cavities in proteins.
Andersson, C David; Chen, Brian Y; Linusson, Anna
2010-05-01
The complex interactions between proteins and small organic molecules (ligands) are intensively studied because they play key roles in biological processes and drug activities. Here, we present a novel approach to characterize and map the ligand-binding cavities of proteins without direct geometric comparison of structures, based on Principal Component Analysis of cavity properties (related mainly to size, polarity, and charge). This approach can provide valuable information on the similarities, and dissimilarities, of binding cavities due to mutations, between-species differences and flexibility upon ligand-binding. The presented results show that information on ligand-binding cavity variations can complement information on protein similarity obtained from sequence comparisons. The predictive aspect of the method is exemplified by successful predictions of serine proteases that were not included in the model construction. The presented strategy to compare ligand-binding cavities of related and unrelated proteins has many potential applications within protein and medicinal chemistry, for example in the characterization and mapping of "orphan structures", selection of protein structures for docking studies in structure-based design, and identification of proteins for selectivity screens in drug design programs. 2009 Wiley-Liss, Inc.
Mapping of Ligand-Binding Cavities in Proteins
Andersson, C. David; Chen, Brian Y.; Linusson, Anna
2010-01-01
The complex interactions between proteins and small organic molecules (ligands) are intensively studied because they play key roles in biological processes and drug activities. Here, we present a novel approach to characterise and map the ligand-binding cavities of proteins without direct geometric comparison of structures, based on Principal Component Analysis of cavity properties (related mainly to size, polarity and charge). This approach can provide valuable information on the similarities, and dissimilarities, of binding cavities due to mutations, between-species differences and flexibility upon ligand-binding. The presented results show that information on ligand-binding cavity variations can complement information on protein similarity obtained from sequence comparisons. The predictive aspect of the method is exemplified by successful predictions of serine proteases that were not included in the model construction. The presented strategy to compare ligand-binding cavities of related and unrelated proteins has many potential applications within protein and medicinal chemistry, for example in the characterisation and mapping of “orphan structures”, selection of protein structures for docking studies in structure-based design and identification of proteins for selectivity screens in drug design programs. PMID:20034113
Chen, Derek E; Willick, Darryl L; Ruckel, Joseph B; Floriano, Wely B
2015-01-01
Directed evolution is a technique that enables the identification of mutants of a particular protein that carry a desired property by successive rounds of random mutagenesis, screening, and selection. This technique has many applications, including the development of G protein-coupled receptor-based biosensors and designer drugs for personalized medicine. Although effective, directed evolution is not without challenges and can greatly benefit from the development of computational techniques to predict the functional outcome of single-point amino acid substitutions. In this article, we describe a molecular dynamics-based approach to predict the effects of single amino acid substitutions on agonist binding (salicin) to a human bitter taste receptor (hT2R16). An experimentally determined functional map of single-point amino acid substitutions was used to validate the whole-protein molecular dynamics-based predictive functions. Molecular docking was used to construct a wild-type agonist-receptor complex, providing a starting structure for single-point substitution simulations. The effects of each single amino acid substitution in the functional response of the receptor to its agonist were estimated using three binding energy schemes with increasing inclusion of solvation effects. We show that molecular docking combined with molecular mechanics simulations of single-point mutants of the agonist-receptor complex accurately predicts the functional outcome of single amino acid substitutions in a human bitter taste receptor.
NASA Astrophysics Data System (ADS)
Laszlo, Kenneth J.; Bush, Matthew F.
2015-12-01
Mass spectra of native-like protein complexes often exhibit narrow charge-state distributions, broad peaks, and contributions from multiple, coexisting species. These factors can make it challenging to interpret those spectra, particularly for mixtures with significant heterogeneity. Here we demonstrate the use of ion/ion proton transfer reactions to reduce the charge states of m/z-selected, native-like ions of proteins and protein complexes, a technique that we refer to as cation to anion proton transfer reactions (CAPTR). We then demonstrate that CAPTR can increase the accuracy of charge state assignments and the resolution of interfering species in native mass spectrometry. The CAPTR product ion spectra for pyruvate kinase exhibit ~30 peaks and enable unambiguous determination of the charge state of each peak, whereas the corresponding precursor spectra exhibit ~6 peaks and the assigned charge states have an uncertainty of ±3%. 15+ bovine serum albumin and 21+ yeast enolase dimer both appear near m/z 4450 and are completely unresolved in a mixture. After a single CAPTR event, the resulting product ions are baseline resolved. The separation of the product ions increases dramatically after each subsequent CAPTR event; 12 events resulted in a 3000-fold improvement in separation relative to the precursor ions. Finally, we introduce a framework for interpreting and predicting the figures of merit for CAPTR experiments. More generally, these results suggest that CAPTR strongly complements other mass spectrometry tools for analyzing proteins and protein complexes, particularly those in mixtures.
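The reason charge reduction separates overlapping species can be seen from the m/z relation alone. The sketch below is illustrative only: it assumes each proton-transfer event removes exactly one proton, and the two neutral masses are approximate textbook values chosen for the example, not figures from the abstract.

```python
PROTON_MASS = 1.007276  # Da

def mz(mass_da, charge):
    """m/z of an ion with neutral mass `mass_da` carrying `charge` excess protons."""
    return (mass_da + charge * PROTON_MASS) / charge

def captr_series(mass_da, start_charge, events):
    """m/z of the precursor followed by the product of each successive
    proton-transfer event (one proton removed per event, by assumption)."""
    return [mz(mass_da, start_charge - k) for k in range(events + 1)]

# Assumed approximate neutral masses (hypothetical inputs for illustration).
BSA_MASS = 66_430.0            # bovine serum albumin, Da
ENOLASE_DIMER_MASS = 93_300.0  # yeast enolase dimer, Da

bsa = captr_series(BSA_MASS, 15, 3)
eno = captr_series(ENOLASE_DIMER_MASS, 21, 3)
gaps = [abs(a - b) for a, b in zip(bsa, eno)]

for k, (a, b, gap) in enumerate(zip(bsa, eno, gaps)):
    print(f"{k} events: BSA m/z {a:.0f}, enolase dimer m/z {b:.0f}, gap {gap:.0f}")
```

With these assumed masses the two precursors fall close together, and the gap between them widens with every event because the species start from different charge states.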
Wilson, Heather L.; Ou, Mark S.; Aldrich, Henry C.; Maupin-Furlow, Julie
2000-01-01
The 20S proteasome is a self-compartmentalized protease which degrades unfolded polypeptides and has been purified from eucaryotes, gram-positive actinomycetes, and archaea. Energy-dependent complexes, such as the 19S cap of the eucaryal 26S proteasome, are assumed to be responsible for the recognition and/or unfolding of substrate proteins which are then translocated into the central chamber of the 20S proteasome and hydrolyzed to polypeptide products of 3 to 30 residues. All archaeal genomes which have been sequenced are predicted to encode proteins with up to ∼50% identity to the six ATPase subunits of the 19S cap. In this study, one of these archaeal homologs which has been named PAN for proteasome-activating nucleotidase was characterized from the hyperthermophile Methanococcus jannaschii. In addition, the M. jannaschii 20S proteasome was purified as a 700-kDa complex by in vitro assembly of the α and β subunits and has an unusually high rate of peptide and unfolded-polypeptide hydrolysis at 100°C. The 550-kDa PAN complex was required for CTP- or ATP-dependent degradation of β-casein by archaeal 20S proteasomes. A 500-kDa complex of PAN(Δ1–73), which has a deletion of residues 1 to 73 of the deduced protein and disrupts the predicted N-terminal coiled-coil, also facilitated this energy-dependent proteolysis. However, this deletion increased the types of nucleotides hydrolyzed to include not only ATP and CTP but also ITP, GTP, TTP, and UTP. The temperature optimum for nucleotide (ATP) hydrolysis was reduced from 80°C for the full-length protein to 65°C for PAN(Δ1–73). Both PAN protein complexes were stable in the absence of ATP and were inhibited by N-ethylmaleimide and p-chloromercuriphenyl-sulfonic acid. 
Kinetic analysis reveals that the PAN protein has a relatively high Vmax for ATP and CTP hydrolysis of 3.5 and 5.8 μmol of Pi per min per mg of protein as well as a relatively low affinity for CTP and ATP with Km values of 307 and 497 μM compared to other proteins of the AAA family. Based on electron micrographs, PAN and PAN(Δ1–73) apparently associate with the ends of the 20S proteasome cylinder. These results suggest that the M. jannaschii as well as related archaeal 20S proteasomes require a nucleotidase complex such as PAN to mediate the energy-dependent hydrolysis of folded-substrate proteins and that the N-terminal 73 amino acid residues of PAN are not absolutely required for this reaction. PMID:10692374
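To make the reported kinetic constants concrete, the following sketch evaluates hydrolysis rates at a few substrate concentrations. It assumes simple Michaelis-Menten kinetics, which the abstract implies by quoting Vmax and Km but does not state explicitly; the substrate concentrations are arbitrary illustrative choices.

```python
def mm_rate(substrate_um, vmax, km_um):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_um / (km_um + substrate_um)

# (Vmax in umol Pi per min per mg protein, Km in uM), as quoted in the abstract.
PAN_KINETICS = {
    "ATP": (3.5, 497.0),
    "CTP": (5.8, 307.0),
}

for nucleotide, (vmax, km) in PAN_KINETICS.items():
    for s in (100.0, km, 2000.0):  # illustrative concentrations in uM
        print(f"{nucleotide} at {s:.0f} uM: v = {mm_rate(s, vmax, km):.2f}")
```

At a substrate concentration equal to Km the rate is half of Vmax, which is a quick sanity check on any fitted pair of constants.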
Drug search for leishmaniasis: a virtual screening approach by grid computing.
Ochoa, Rodrigo; Watowich, Stanley J; Flórez, Andrés; Mesa, Carol V; Robledo, Sara M; Muskus, Carlos
2016-07-01
The trypanosomatid protozoa Leishmania is endemic in ~100 countries, with infections causing ~2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen ~2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.
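The ranking step described above, picking compounds that bind well on average across all conformational snapshots, can be sketched as follows. This is a minimal illustration rather than the project's pipeline: the compound names and scores are invented, and it assumes that lower (more negative) docking scores mean stronger predicted binding.

```python
from statistics import mean

def rank_by_mean_score(scores):
    """Rank compounds by their mean docking score over all conformations.

    scores: {compound_id: [score for each MD snapshot]}
    Returns a list of (compound_id, mean_score), best (lowest) first.
    """
    averaged = {compound: mean(values) for compound, values in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1])

# Hypothetical scores for three compounds docked to three snapshots.
toy_scores = {
    "cmpd_A": [-9.1, -8.7, -9.4],   # consistently good
    "cmpd_B": [-10.5, -6.0, -6.2],  # excellent on one snapshot only
    "cmpd_C": [-7.0, -7.2, -7.1],
}

ranked = rank_by_mean_score(toy_scores)
for compound, score in ranked:
    print(f"{compound}: mean score {score:.2f}")
```

Averaging favors compounds such as the hypothetical cmpd_A that score well on every snapshot over ones that fit a single conformation exceptionally well.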
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo
2017-01-01
Protein contacts contain key information for the understanding of protein structure and function and thus, contact prediction from sequence is an important problem. Recently exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformation of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformation of pairwise information including output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationship and thus, obtain higher-quality contact prediction regardless of how many sequence homologs are available for proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively. 
Our contact-assisted models also have much better quality than template-based models especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even if trained mostly by soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then. http://raptorx.uchicago.edu/ContactMap/. PMID:28056090
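The "top L" and "top L/10" long-range accuracies quoted above follow a common evaluation recipe, sketched below. Two details are assumptions rather than facts from the abstract: long-range is taken to mean a sequence separation of at least 24 residues (a widely used convention), and accuracy is divided by the number of predictions actually selected.

```python
def top_accuracy(pred, native, seq_len, k=1, min_sep=24):
    """Fraction of the top L/k long-range predictions that are true contacts.

    pred:    {(i, j): predicted probability}, with i < j
    native:  set of (i, j) pairs that are contacts in the known structure
    seq_len: protein length L
    """
    long_range = [(prob, pair) for pair, prob in pred.items()
                  if pair[1] - pair[0] >= min_sep]
    long_range.sort(reverse=True)          # highest probability first
    top = long_range[:max(1, seq_len // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native)
    return hits / len(top)

# Hypothetical toy prediction for a 40-residue protein.
toy_pred = {(0, 30): 0.9, (1, 35): 0.8, (2, 28): 0.7,
            (5, 39): 0.6, (3, 10): 0.99, (0, 25): 0.1}
toy_native = {(0, 30), (2, 28), (3, 10)}

print(top_accuracy(toy_pred, toy_native, seq_len=40, k=10))
print(top_accuracy(toy_pred, toy_native, seq_len=40, k=1))
```

The short-range pair (3, 10) is ignored despite its high probability, which is why long-range accuracy is usually reported separately from overall accuracy.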
Matsumura, Y; Nishigori, C; Yagi, T; Imamura, S; Takebe, H
1998-06-01
Xeroderma pigmentosum (XP) complementation group F was first reported in Japan and most XP-F patients reported to date are Japanese. The clinical features of XP-F patients are rather mild, including late onset of skin cancer. Recently a cDNA that corrects the repair deficiency of cultured XP-F cells was isolated. The XPF protein forms a tight complex with ERCC1 and this complex functions as a structure-specific endonuclease responsible for the 5' incision during DNA excision repair. Here we have identified XPF mRNA mutations and examined levels of the mRNA and protein expression in seven primary cell strains from Japanese XP-F patients. The XP-F cell strains were classified into three types in terms of the effect of the mutation on the predicted protein; (i) XPF proteins with amino acid substitutions; (ii) amino acid substituted and truncated XPF proteins; and (iii) truncated XPF protein only. A normal level of expression of XPF mRNA was observed in XP-F cells but XPF protein was extremely low. These results indicate that the detected mutations lead to unstable XPF protein, resulting in a decrease in formation of the ERCC1-XPF endonuclease complex. Slow excision repair of UV-induced DNA damage due to low residual endonuclease activity provides a plausible explanation for the typical mild phenotype of XP-F patients.
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determination of the protease/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers, where the breakdown products (BDPs) of different proteases have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially when targeting a number of proteins at one time. In this chapter, we present a novel bioinformatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential cleaved fragments by a specific protease. This space- and time-efficient algorithm is flexible enough to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, where N is the number of protein sequences, m the length of the consensus sequence, and n the length of each protein sequence. Ultimately, this knowledge will feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
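The two steps named above, locating consensus occurrences (including variants) and then generating the cleaved fragments, can be illustrated with a simple sliding-window stand-in. This is not the chapter's alignment-based algorithm: the function names, the mismatch tolerance, the "X" wildcard, the cut position immediately after the motif, and the toy sequence are all assumptions made for the example.

```python
def find_cut_sites(protein, consensus, max_mismatch=0):
    """Positions where the protein is cut, assumed to be right after each
    window matching `consensus` with at most `max_mismatch` mismatches.
    An 'X' in the consensus matches any residue. Runs in O(m*n) time."""
    m = len(consensus)
    sites = []
    for start in range(len(protein) - m + 1):
        window = protein[start:start + m]
        mismatches = sum(1 for c, p in zip(consensus, window)
                         if c != "X" and c != p)
        if mismatches <= max_mismatch:
            sites.append(start + m)
    return sites

def fragments(protein, sites):
    """Split the protein at every cut site and return the non-empty pieces."""
    cuts = [0] + sorted(set(sites)) + [len(protein)]
    return [protein[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]

# Hypothetical substrate containing two copies of a DEVD-like motif.
toy_protein = "MKTDEVDGSAAPLDEVDKK"
toy_sites = find_cut_sites(toy_protein, "DEVD")
print(toy_sites)
print(fragments(toy_protein, toy_sites))
```

Applied to N sequences, this scan costs O(Nmn) time, in line with the bound quoted in the abstract, although it needs only constant extra space because it keeps no alignment matrix.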
Optimizing expression of the pregnancy malaria vaccine candidate, VAR2CSA in Pichia pastoris.
Avril, Marion; Hathaway, Marianne J; Cartwright, Megan M; Gose, Severin O; Narum, David L; Smith, Joseph D
2009-06-29
VAR2CSA is the main candidate for a vaccine against pregnancy-associated malaria, but vaccine development is complicated by the large size and complex disulfide bonding pattern of the protein. Recent X-ray crystallographic information suggests that domain boundaries of VAR2CSA Duffy binding-like (DBL) domains may be larger than previously predicted and include two additional cysteine residues. This study investigated whether longer constructs would improve VAR2CSA recombinant protein secretion from Pichia pastoris and if domain boundaries were applicable across different VAR2CSA alleles. VAR2CSA sequences were bioinformatically analysed to identify the predicted C11 and C12 cysteine residues at the C-termini of DBL domains and revised N- and C-terminal domain boundaries were predicted in VAR2CSA. Multiple construct boundaries were systematically evaluated for protein secretion in P. pastoris and secreted proteins were tested as immunogens. From a total of 42 different VAR2CSA constructs, 15 proteins (36%) were secreted. Longer construct boundaries, including the predicted C11 and C12 cysteine residues, generally improved expression of poorly or non-secreted domains and permitted expression of all six VAR2CSA DBL domains. However, protein secretion was still highly empiric and affected by subtle differences in domain boundaries and allelic variation between VAR2CSA sequences. Eleven of the secreted proteins were used to immunize rabbits. Antibodies reacted with CSA-binding infected erythrocytes, indicating that P. pastoris recombinant proteins possessed native protein epitopes. These findings strengthen emerging data for a revision of DBL domain boundaries in var-encoded proteins and may facilitate pregnancy malaria vaccine development. PMID:19563628
Rubinstein, Alexander I; Sabirianov, Renat F; Namavar, Fereydoon
2016-10-14
The rapid development of nanoscience and nanotechnology has raised many fundamental questions that significantly impede progress in these fields. In particular, understanding the physicochemical processes at the interface in aqueous solvents requires the development and application of efficient and accurate methods. In the present work we evaluate the electrostatic contribution to the energy of model protein-ceramic complex formation in an aqueous solvent. We apply a non-local (NL) electrostatic approach that accounts for the effects of the short-range structure of the solvent on the electrostatic interactions of the interfacial systems. In this approach the aqueous solvent is considered as a non-ionic liquid, with the rigid and strongly correlated dipoles of the water molecules. We have found that an ordered interfacial aqueous solvent layer at the protein- and ceramic-solvent interfaces reduces the charging energy of both the ceramic and the protein in the solvent, and significantly increases the electrostatic contribution to their association into a complex. This contribution in the presented NL approach was found to be significantly shifted with respect to the classical model at any dielectric constant value of the ceramics. This implies a significant increase of the adsorption energy in the protein-ceramic complex formation for any ceramic material. We show that for several biocompatible ceramics (for example HfO2, ZrO2, and Ta2O5) the above effect predicts electrostatically induced protein-ceramic complex formation. However, in the framework of the classical continuum electrostatic model (the aqueous solvent as a uniform dielectric medium with a high dielectric constant ∼80) the above ceramics cannot be considered as suitable for electrostatically induced complex formation. 
Our results also show that the protein-ceramic electrostatic interactions can be strong enough to compensate for the unfavorable desolvation effect in the process of protein-ceramic complex formation.
Shotgun metaproteomics of the human distal gut microbiota
DOE Office of Scientific and Technical Information (OSTI.GOV)
VerBerkmoes, N.C.; Russell, A.L.; Shah, M.
2008-10-15
The human gut contains a dense, complex and diverse microbial community, comprising the gut microbiome. Metagenomics has recently revealed the composition of genes in the gut microbiome, but provides no direct information about which genes are expressed or functioning. Therefore, our goal was to develop a novel approach to directly identify microbial proteins in fecal samples to gain information about the genes expressed and about key microbial functions in the human gut. We used a non-targeted, shotgun mass spectrometry-based whole community proteomics, or metaproteomics, approach for the first deep proteome measurements of thousands of proteins in human fecal samples, thus demonstrating this approach on the most complex sample type to date. The resulting metaproteomes had a skewed distribution relative to the metagenome, with more proteins for translation, energy production and carbohydrate metabolism when compared to what was earlier predicted from metagenomics. Human proteins, including antimicrobial peptides, were also identified, providing a non-targeted glimpse of the host response to the microbiota. Several unknown proteins represented previously undescribed microbial pathways or host immune responses, revealing a novel complex interplay between the human host and its associated microbes.
Modeling the Hydration Layer around Proteins: Applications to Small- and Wide-Angle X-Ray Scattering
Virtanen, Jouko Juhani; Makowski, Lee; Sosnick, Tobin R.; Freed, Karl F.
2011-01-01
Small-/wide-angle x-ray scattering (SWAXS) experiments can aid in determining the structures of proteins and protein complexes, but success requires accurate computational treatment of solvation. We compare two methods by which to calculate SWAXS patterns. The first approach uses all-atom explicit-solvent molecular dynamics (MD) simulations. The second, far less computationally expensive method involves prediction of the hydration density around a protein using our new HyPred solvation model, which is applied without the need for additional MD simulations. The SWAXS patterns obtained from the HyPred model compare well to both experimental data and the patterns predicted by the MD simulations. Both approaches exhibit advantages over existing methods for analyzing SWAXS data. The close correspondence between calculated and observed SWAXS patterns provides strong experimental support for the description of hydration implicit in the HyPred model. PMID:22004761
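For readers unfamiliar with how a scattering pattern is computed from coordinates, the sketch below evaluates the classical Debye formula for a set of point scatterers. It is a generic textbook calculation, not the HyPred model or the MD-based method compared in the abstract: it treats the molecule in vacuum with constant form factors and ignores the hydration layer entirely, which is the contribution those methods are designed to capture.

```python
import math

def debye_intensity(coords, q, form_factors=None):
    """Orientationally averaged intensity from the Debye formula:
    I(q) = sum_i sum_j f_i * f_j * sin(q * r_ij) / (q * r_ij).

    coords: list of (x, y, z) positions; q in reciprocal units of the coordinates.
    form_factors: optional per-scatterer amplitudes (default 1.0 each).
    """
    n = len(coords)
    f = form_factors or [1.0] * n
    total = 0.0
    for i in range(n):
        total += f[i] * f[i]                      # self term, sinc(0) = 1
        for j in range(i + 1, n):
            x = q * math.dist(coords[i], coords[j])
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += 2.0 * f[i] * f[j] * sinc     # pair (i, j) and (j, i)
    return total

# Hypothetical three-point "molecule" and a short q scan.
toy_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
for q in (0.0, 0.5, 1.0, 2.0):
    print(f"q = {q:.1f}: I(q) = {debye_intensity(toy_coords, q):.3f}")
```

At q = 0 the intensity equals the square of the summed form factors, a convenient check before comparing any computed curve with experiment.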
NASA Astrophysics Data System (ADS)
Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.
2017-11-01
A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.
Pons, Jean-Luc; Labesse, Gilles
2009-07-01
@TOME 2.0 is a new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/
Blocquel, David; Habchi, Johnny; Gruet, Antoine; Blangy, Stéphanie; Longhi, Sonia
2012-01-01
Henipaviruses are recently emerged severe human pathogens within the Paramyxoviridae family. Their genome is encapsidated by the nucleoprotein (N) within a helical nucleocapsid that recruits the polymerase complex via the phosphoprotein (P). We have previously shown that in Henipaviruses the N protein possesses an intrinsically disordered C-terminal domain, N(TAIL), which undergoes α-helical induced folding in the presence of the C-terminal domain (P(XD)) of the P protein. Using computational approaches, we previously identified within N(TAIL) four putative molecular recognition elements (MoREs) with different structural propensities, and proposed a structural model for the N(TAIL)-P(XD) complex where the MoRE encompassing residues 473-493 adopt an α-helical conformation at the P(XD) surface. In this work, for each N(TAIL) protein, we designed four deletion constructs bearing different combinations of the predicted MoREs. Following purification of the N(TAIL) truncated proteins from the soluble fraction of E. coli, we characterized them in terms of their conformational, spectroscopic and binding properties. These studies provided direct experimental evidence for the structural state of the four predicted MoREs, and showed that two of them have clear α-helical propensities, with the one spanning residues 473-493 being strictly required for binding to P(XD). We also showed that Henipavirus N(TAIL) and P(XD) form heterologous complexes, indicating that the P(XD) binding regions are functionally interchangeable between the two viruses. By combining spectroscopic and conformational analyses, we showed that the content in regular secondary structure is not a major determinant of protein compaction.
A Demonstration of Le Chatelier’s Principle on the Nanoscale
2017-01-01
Photothermal desorption of molecules from plasmonic nanoparticles is an example of a light-triggered molecular release due to heating of the system. However, this phenomenon ought to work only if the molecule–nanoparticle interaction is exothermic in nature. In this study, we compare protein adsorption behavior onto gold nanoparticles for both endothermic and exothermic complexation reactions, and demonstrate that Le Chatelier’s principle can be applied to predict protein adsorption or desorption on nanomaterial surfaces. Polyelectrolyte-wrapped gold nanorods were used as adsorption platforms for two different proteins, which we were able to adsorb/desorb from the nanorod surface depending on the thermodynamics of their interactions. Furthermore, we show that the behaviors hold up under more complex biological environments such as fetal bovine serum. PMID:29104926
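The Le Chatelier argument in this abstract can be made quantitative with the integrated van 't Hoff equation, ln(K2/K1) = -(ΔH/R)(1/T2 - 1/T1). The following is a minimal sketch that assumes ΔH is constant over the temperature range; the enthalpies and temperatures are hypothetical illustration values, not data from the study.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def equilibrium_ratio(delta_h, t1, t2):
    """Return K(t2)/K(t1) from the integrated van 't Hoff equation.

    delta_h: binding enthalpy in J/mol (negative = exothermic),
             assumed constant between t1 and t2.
    t1, t2:  absolute temperatures in K.
    """
    return math.exp(-delta_h / R * (1.0 / t2 - 1.0 / t1))

# Hypothetical values: heating the system from 298 K to 318 K.
exo = equilibrium_ratio(-40_000.0, 298.0, 318.0)   # exothermic adsorption
endo = equilibrium_ratio(+40_000.0, 298.0, 318.0)  # endothermic adsorption

print(f"exothermic:  K(318 K)/K(298 K) = {exo:.3f}")
print(f"endothermic: K(318 K)/K(298 K) = {endo:.3f}")
```

A ratio below 1 means heating shifts the equilibrium toward the desorbed state, which is the exothermic case; a ratio above 1 means heating favours adsorption, which is the endothermic case.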
Yang, Pinfen; Sale, Winfield S.
1998-01-01
Previous structural and biochemical studies have revealed that the inner arm dynein I1 is targeted and anchored to a unique site located proximal to the first radial spoke in each 96-nm axoneme repeat on flagellar doublet microtubules. To determine whether intermediate chains mediate the positioning and docking of dynein complexes, we cloned and characterized the 140-kDa intermediate chain (IC140) of the I1 complex. Sequence and secondary structural analysis, with particular emphasis on β-sheet organization, predicted that IC140 contains seven WD repeats. Reexamination of other members of the dynein intermediate chain family of WD proteins indicated that these polypeptides also bear seven WD/β-sheet repeats arranged in the same pattern along each intermediate chain protein. A polyclonal antibody was raised against a 53-kDa fusion protein derived from the C-terminal third of IC140. The antibody is highly specific for IC140 and does not bind to other dynein intermediate chains or proteins in Chlamydomonas flagella. Immunofluorescent microscopy of Chlamydomonas cells confirmed that IC140 is distributed along the length of both flagellar axonemes. In vitro reconstitution experiments demonstrated that the 53-kDa C-terminal fusion protein binds specifically to axonemes lacking the I1 complex. Chemical cross-linking indicated that IC140 is closely associated with a second intermediate chain in the I1 complex. These data suggest that IC140 contains domains responsible for the assembly and docking of the I1 complex to the doublet microtubule cargo. PMID:9843573
Xia, Kai; Dong, Dong; Han, Jing-Dong J
2006-01-01
Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. 
The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386
Nanoparticles-cell association predicted by protein corona fingerprints
NASA Astrophysics Data System (ADS)
Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.
2016-06-01
In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins, shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling approach that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, the identified PCFs have several receptors with high levels of expression on the plasma membrane of HeLa cells.
Electronic supplementary information (ESI) available: Table S1. Cell viability (%) and cell association of the different nanoparticles used. Table S2. Total number of identified proteins on the different nanoparticles used. Tables S3-S18. Top 25 most abundant corona proteins identified in the protein corona of nanoparticles NP2-NP16 following 1 hour incubation with HP. Table S19. List of descriptors used. Table S20. Potential targets of protein corona fingerprints with their interaction scores (mentha) and the expression median values in HeLa cells. Fig. S1 and S2. Effect of exposure to human plasma on size and zeta potential of NPs. Fig. S3. Predictive modeling of nanoparticle-cell association. See DOI: 10.1039/c6nr03898k
Extending Halogen-based Medicinal Chemistry to Proteins
El Hage, Krystel; Pandyarajan, Vijay; Phillips, Nelson B.; Smith, Brian J.; Menting, John G.; Whittaker, Jonathan; Lawrence, Michael C.; Meuwly, Markus; Weiss, Michael A.
2016-01-01
Insulin, a protein critical for metabolic homeostasis, provides a classical model for protein design with application to human health. Recent efforts to improve its pharmaceutical formulation demonstrated that iodination of a conserved tyrosine (TyrB26) enhances key properties of a rapid-acting clinical analog. Moreover, the broad utility of halogens in medicinal chemistry has motivated the use of hybrid quantum- and molecular-mechanical methods to study proteins. Here, we (i) undertook quantitative atomistic simulations of 3-[iodo-TyrB26]insulin to predict its structural features, and (ii) tested these predictions by X-ray crystallography. Using an electrostatic model of the modified aromatic ring based on quantum chemistry, the calculations suggested that the analog, as a dimer and hexamer, exhibits subtle differences in aromatic-aromatic interactions at the dimer interface. Aromatic rings (TyrB16, PheB24, PheB25, 3-I-TyrB26, and their symmetry-related mates) at this interface adjust to enable packing of the hydrophobic iodine atoms within the core of each monomer. Strikingly, these features were observed in the crystal structure of a 3-[iodo-TyrB26]insulin analog (determined as an R6 zinc hexamer). Given that residues B24–B30 detach from the core on receptor binding, the environment of 3-I-TyrB26 in a receptor complex must differ from that in the free hormone. Based on the recent structure of a “micro-receptor” complex, we predict that 3-I-TyrB26 engages the receptor via directional halogen bonding and halogen-directed hydrogen bonding as follows: favorable electrostatic interactions exploiting, respectively, the halogen's electron-deficient σ-hole and electronegative equatorial band. Inspired by quantum chemistry and molecular dynamics, such “halogen engineering” promises to extend principles of medicinal chemistry to proteins. PMID:27875310
Genome-scale prediction of proteins with long intrinsically disordered regions.
Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz
2014-01-01
Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized the abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, and that higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
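The LDR definition used in this abstract (30 or more consecutive disordered residues) can be checked mechanically once a per-residue disorder annotation is available. The sketch below illustrates only that definition, which corresponds to the indirect route through a per-residue predictor; it is not SLIDER, which predicts from sequence-derived features with logistic regression. The function names and the boolean input format are assumptions made for illustration.

```python
def longest_disordered_run(disorder):
    """Length of the longest run of consecutive disordered residues.

    disorder: iterable of booleans, True where a residue is annotated
              (or predicted) as disordered. Hypothetical input format.
    """
    longest = current = 0
    for is_disordered in disorder:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest

def has_ldr(disorder, min_length=30):
    """True if the protein has at least one long disordered region."""
    return longest_disordered_run(disorder) >= min_length

# Toy annotations: one uninterrupted 30-residue run, and two 29-residue
# runs separated by a single ordered residue.
with_ldr = [False] * 10 + [True] * 30 + [False] * 5
without_ldr = [True] * 29 + [False] + [True] * 29
print(has_ldr(with_ldr), has_ldr(without_ldr))
```

The second toy protein is mostly disordered but has no LDR, because the definition depends on the longest uninterrupted run rather than on total disorder content.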
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cashman, Derek J.; Zhu, Tuo; Simmerman, Richard F.
2014-08-01
The stromal domain (PsaC, PsaD, and PsaE) of photosystem I (PSI) reduces transiently bound ferredoxin (Fd) or flavodoxin. Experimental structures exist for all of these protein partners individually, but no experimental structure of the PSI/Fd or PSI/flavodoxin complexes is presently available. Molecular models of Fd docked onto the stromal domain of the cyanobacterial PSI site are constructed here utilizing X-ray and NMR structures of PSI and Fd, respectively. Moreover, predictions of potential protein-protein interaction regions are based on experimental site-directed mutagenesis and cross-linking studies to guide rigid body docking calculations of Fd into PSI, complemented by energy landscape theory to bring together regions of high energetic frustration on each of the interacting proteins. Results identify two regions of high localized frustration on the surface of Fd that contain negatively charged Asp and Glu residues. Our study predicts that these regions interact predominantly with regions of high localized frustration on the PsaC, PsaD, and PsaE chains of PSI, which include several residues predicted by previous experimental studies.
Prediction of Water Binding to Protein Hydration Sites with a Discrete, Semiexplicit Solvent Model.
Setny, Piotr
2015-12-08
Buried water molecules are ubiquitous in protein structures and are found at the interface of most protein-ligand complexes. Determining their distribution and thermodynamic effect is a challenging yet important task, of great practical value for the modeling of biomolecular structures and their interactions. In this study, we present a novel method aimed at the prediction of buried water molecules in protein structures and estimation of their binding free energies. It is based on a semiexplicit, discrete solvation model, which we previously introduced in the context of small molecule hydration. The method is applicable to all macromolecular structures described by a standard all-atom force field, and predicts complete solvent distribution within a single run with modest computational cost. We demonstrate that it indicates positions of buried hydration sites, including those filled by more than one water molecule, and accurately differentiates them from regions that are sterically accessible to water but remain void. The obtained estimates of water binding free energies are in fair agreement with reference results determined with the double decoupling method.
NASA Astrophysics Data System (ADS)
Keane, Harriet; Ryan, Brent J.; Jackson, Brendan; Whitmore, Alan; Wade-Martins, Richard
2015-11-01
Neurodegenerative diseases are complex multifactorial disorders characterised by the interplay of many dysregulated physiological processes. As an exemplar, Parkinson’s disease (PD) involves multiple perturbed cellular functions, including mitochondrial dysfunction and autophagic dysregulation in preferentially-sensitive dopamine neurons, a selective pathophysiology recapitulated in vitro using the neurotoxin MPP+. Here we explore a network science approach for the selection of therapeutic protein targets in the cellular MPP+ model. We hypothesised that analysis of protein-protein interaction networks modelling MPP+ toxicity could identify proteins critical for mediating MPP+ toxicity. Analysis of protein-protein interaction networks constructed to model the interplay of mitochondrial dysfunction and autophagic dysregulation (key aspects of MPP+ toxicity) enabled us to identify four proteins predicted to be key for MPP+ toxicity (P62, GABARAP, GBRL1 and GBRL2). Combined, but not individual, knockdown of these proteins increased cellular susceptibility to MPP+ toxicity. Conversely, combined, but not individual, over-expression of the network targets provided rescue of MPP+ toxicity associated with the formation of autophagosome-like structures. We also found that modulation of two distinct proteins in the protein-protein interaction network was necessary and sufficient to mitigate neurotoxicity. Together, these findings validate our network science approach to multi-target identification in complex neurological diseases.
PrionScan: an online database of predicted prion domains in complete proteomes.
Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier
2014-02-05
Prions are a particular type of amyloid related to a large variety of important processes in cells, but also responsible for serious diseases in mammals and humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, there are only a few available data sources containing prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information on proteins bearing prion-like domains. The database is updated regularly alongside UniprotKB and in its present version contains approximately 28000 predictions in proteins from different functional categories in more than 3200 organisms from all the taxonomic subdivisions. PrionScan can be used in two different ways: database query and analysis of protein sequences submitted by the users. In the first mode, simple queries retrieve a detailed description of the properties of a defined protein. Queries can also be combined to generate more complex and specific search patterns. In the second mode, users can submit and analyze their own sequences. It is expected that this database will provide relevant insights into prion functions and regulation from a genome-wide perspective, allowing researchers to perform cross-species prion biology studies.
Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.
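The compositional bias described in this abstract (enrichment in glutamine/asparagine, depletion in charged residues and prolines) suggests a simple sliding-window scan. The sketch below is a toy illustration of that idea only: the score, the residue sets as grouped here, and the default window width are assumptions, and this is not the scoring methodology used by PrionScan.

```python
FAVOURED = "QN"        # glutamine, asparagine
DISFAVOURED = "DEKRP"  # charged residues and proline

def composition_score(window):
    """(favoured - disfavoured) residue count, as a fraction of window length.

    Hypothetical score for illustration only.
    """
    favoured = sum(window.count(aa) for aa in FAVOURED)
    disfavoured = sum(window.count(aa) for aa in DISFAVOURED)
    return (favoured - disfavoured) / len(window)

def best_window(sequence, width=60):
    """Return (start, score) of the first highest-scoring window."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    width = min(width, len(sequence))
    best_start, best_score = 0, composition_score(sequence[:width])
    for start in range(1, len(sequence) - width + 1):
        score = composition_score(sequence[start:start + width])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Toy sequence: a Q/N-rich stretch flanked by charged, proline-rich segments.
toy = "DEKRP" * 4 + "QN" * 10 + "DEKRP" * 4
start, score = best_window(toy, width=20)
print(start, score)
```

A real predictor would calibrate residue weights and thresholds against experimentally characterized prions; this sketch only shows how a compositional bias becomes a positional score.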
Eukaryotic Protein Kinases (ePKs) of the Helminth Parasite Schistosoma mansoni
2011-01-01
Background Schistosomiasis remains an important parasitic disease and a major economic problem in many countries. The Schistosoma mansoni genome and predicted proteome sequences were recently published, providing the opportunity to identify new drug candidates. Eukaryotic protein kinases (ePKs) play a central role in mediating signal transduction through complex networks and are considered druggable targets from the medical and chemical viewpoints. Our work aimed at analyzing the S. mansoni predicted proteome in order to identify and classify all ePKs of this parasite through combined computational approaches. Functional annotation was performed mainly to yield insights into the parasite signaling processes relevant to its complex lifestyle and to select some ePKs as potential drug targets. Results We have identified 252 ePKs, which correspond to 1.9% of the S. mansoni predicted proteome, through sequence similarity searches using HMMs (Hidden Markov Models). Amino acid sequences corresponding to the conserved catalytic domain of ePKs were aligned by MAFFT and further used in distance-based phylogenetic analysis as implemented in PHYLIP. Our analysis also included the ePK homologs from six other eukaryotes. The results show that S. mansoni has proteins in all ePK groups. Most of them are clearly clustered with known ePKs in other eukaryotes according to the phylogenetic analysis. None of the ePKs are exclusively found in S. mansoni or belong to an expanded family in this parasite. Only 16 S. mansoni ePKs have been experimentally studied, 12 proteins are predicted to be catalytically inactive, and approximately 2% of the parasite ePKs remain unclassified. Some proteins were identified as good targets for drug development since they have a predicted essential function for the parasite. Conclusions Our approach has improved the functional annotation of 40% of S. mansoni ePKs through combined similarity and phylogenetic-based approaches.
As we continue this work, we will highlight the biochemical and physiological adaptations of S. mansoni in response to diverse environments during the parasite development, vector interaction, and host infection. PMID:21548963
NASA Astrophysics Data System (ADS)
Simon, Joseph R.; Carroll, Nick J.; Rubinstein, Michael; Chilkoti, Ashutosh; López, Gabriel P.
2017-06-01
Dynamic protein-rich intracellular structures that contain phase-separated intrinsically disordered proteins (IDPs) composed of sequences of low complexity (SLC) have been shown to serve a variety of important cellular functions, which include signalling, compartmentalization and stabilization. However, our understanding of these structures and our ability to synthesize models of them have been limited. We present design rules for IDPs possessing SLCs that phase separate into diverse assemblies within droplet microenvironments. Using theoretical analyses, we interpret the phase behaviour of archetypal IDP sequences and demonstrate the rational design of a vast library of multicomponent protein-rich structures that ranges from uniform nano-, meso- and microscale puncta (distinct protein droplets) to multilayered orthogonally phase-separated granular structures. The ability to predict and program IDP-rich assemblies in this fashion offers new insights into (1) genetic-to-molecular-to-macroscale relationships that encode hierarchical IDP assemblies, (2) design rules of such assemblies in cell biology and (3) molecular-level engineering of self-assembled recombinant IDP-rich materials.
Network biology discovers pathogen contact points in host protein-protein interactomes.
Ahmed, Hadia; Howton, T C; Sun, Yali; Weinberger, Natascha; Belkhadir, Youssef; Mukhtar, M Shahid
2018-06-13
In all organisms, major biological processes are controlled by complex protein-protein interactions networks (interactomes), yet their structural complexity presents major analytical challenges. Here, we integrate a compendium of over 4300 phenotypes with Arabidopsis interactome (AI-1 MAIN ). We show that nodes with high connectivity and betweenness are enriched and depleted in conditional and essential phenotypes, respectively. Such nodes are located in the innermost layers of AI-1 MAIN and are preferential targets of pathogen effectors. We extend these network-centric analyses to Cell Surface Interactome (CSI LRR ) and predict its 35 most influential nodes. To determine their biological relevance, we show that these proteins physically interact with pathogen effectors and modulate plant immunity. Overall, our findings contrast with centrality-lethality rule, discover fast information spreading nodes, and highlight the structural properties of pathogen targets in two different interactomes. Finally, this theoretical framework could possibly be applicable to other inter-species interactomes to reveal pathogen contact points.
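The two node properties named in this abstract, connectivity (degree) and betweenness, can be computed on any interaction graph. The sketch below uses a pure-Python version of Brandes' algorithm for unweighted, undirected graphs on a toy network; the node names and edges are hypothetical and unrelated to AI-1 MAIN or CSI LRR, and this is not the authors' analysis pipeline.

```python
from collections import deque

def degree(adj):
    """Number of interaction partners per node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}

def betweenness(adj):
    """Unnormalised betweenness via Brandes' algorithm.

    adj: dict mapping node -> set of neighbours (must be symmetric).
    Each unordered node pair is counted once.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and record predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both of its endpoints.
    return {v: b / 2.0 for v, b in score.items()}

# Toy network: hub "h" with partners a, b, c; "d" attaches only through "c".
toy = {
    "h": {"a", "b", "c"},
    "a": {"h"},
    "b": {"h"},
    "c": {"h", "d"},
    "d": {"c"},
}
deg = degree(toy)
btw = betweenness(toy)
print(deg)
print(btw)
```

In this toy graph the hub "h" has both the highest degree and the highest betweenness, while "c" has a modest degree but still lies on every shortest path to "d".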
Digestibility of gluten proteins is reduced by baking and enhanced by starch digestion
Pan, Xiaoyan; Bellido, Vincent; Toole, Geraldine A.; Gates, Fred K.; Wickham, Martin S. J.; Shewry, Peter R.; Bakalis, Serafim; Padfield, Philip; Mills, E. N. Clare
2015-01-01
Scope Resistance of proteins to gastrointestinal digestion may play a role in determining immune‐mediated adverse reactions to foods. However, digestion studies have largely been restricted to purified proteins and the impact of food processing and food matrices on protein digestibility is poorly understood. Methods and results Digestibility of a total gliadin fraction (TGF), flour (cv Hereward), and bread was assessed using in vitro batch digestion with simulated oral, gastric, and duodenal phases. Protein digestion was monitored by SDS‐PAGE and immunoblotting using monoclonal antibodies specific for celiac‐toxic sequences (QQSF, QPFP) and starch digestion by measuring undigested starch. Whereas the TGF was rapidly digested during the gastric phase the gluten proteins in bread were virtually undigested and digested rapidly during the duodenal phase only if amylase was included. Duodenal starch digestion was also slower in the absence of duodenal proteases. Conclusion The baking process reduces the digestibility of wheat gluten proteins, including those containing sequences active in celiac disease. Starch digestion affects the extent of protein digestion, probably because of gluten‐starch complex formation during baking. Digestion studies using purified protein fractions alone are therefore not predictive of digestion in complex food matrices. PMID:26202208
Investigation of the pH-dependence of dye-doped protein-protein interactions.
Nudelman, Roman; Gloukhikh, Ekaterina; Rekun, Antonina; Richter, Shachar
2016-11-01
Proteins can dramatically change their conformation in response to environmental conditions such as temperature and pH. In this context, determining the conformation of glycoproteins is challenging because of the variety of chemically rich domains within these complexes. Here we demonstrate a new, straightforward and efficient technique that uses the pH-dependent properties of dye-doped Pig Gastric Mucin (PGM) for predicting and controlling protein-protein interaction and conformation. We utilize PGM as a natural host matrix that is capable of dynamically changing its conformation and adsorbing hydrophobic and hydrophilic dyes under different pH conditions, and we investigate and control the fluorescent properties of these composites in solution. We show that, at various pH conditions, these complexes produce a wide range of light emission, including red, green and white. This phenomenon is explained by pH-dependent protein folding and protein-protein interactions that induce different emission spectra, which are mediated and controlled by dye-dye interactions and the surrounding environment. This process is used to form the technologically challenging white light-emitting liquid or solid coating for LED devices. © 2016 The Protein Society.
Vibrational Dynamics of Biological Molecules: Multi-quantum Contributions
Leu, Bogdan M.; Timothy Sage, J.; Zgierski, Marek Z.; Wyllie, Graeme R. A.; Ellison, Mary K.; Robert Scheidt, W.; Sturhahn, Wolfgang; Ercan Alp, E.; Durbin, Stephen M.
2006-01-01
High-resolution X-ray measurements near a nuclear resonance reveal the complete vibrational spectrum of the probe nucleus. Because of this, nuclear resonance vibrational spectroscopy (NRVS) is a uniquely quantitative probe of the vibrational dynamics of reactive iron sites in proteins and other complex molecules. Our measurements of vibrational fundamentals have revealed both frequencies and amplitudes of 57Fe vibrations in proteins and model compounds. Information on the direction of Fe motion has also been obtained from measurements on oriented single crystals, and provides an essential test of normal mode predictions. Here, we report the observation of weaker two-quantum vibrational excitations (overtones and combinations) for compounds that mimic the active site of heme proteins. The predicted intensities depend strongly on the direction of Fe motion. We compare the observed features with predictions based on the observed fundamentals, using information on the direction of Fe motion obtained either from DFT predictions or from single crystal measurements. Two-quantum excitations may become a useful tool to identify the directions of the Fe oscillations when single crystals are not available. PMID:16894397
A Large-Scale Assessment of Nucleic Acids Binding Site Prediction Programs
Miao, Zhichao; Westhof, Eric
2015-01-01
Computational prediction of nucleic acid binding sites in proteins is necessary to disentangle functional mechanisms in most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches display a great diversity in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods for the prediction strategies; iv) the performance measures and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets including more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have been greatly improved over the years; ii) some of the approaches suffer from theoretical defects and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces and iv) dataset bias may exist in some methods. PMID:26681179
Prediction of scaffold proteins based on protein interaction and domain architectures.
Oh, Kimin; Yi, Gwan-Su
2016-07-28
Scaffold proteins are known to be crucial regulators of various cellular functions, assembling multiple proteins involved in signaling and metabolic pathways. Identifying scaffold proteins and studying their molecular mechanisms can open a new aspect of cellular systemic regulation, and the results can be applied in medicine and engineering. Although the regulatory roles of dozens of scaffold proteins have been highlighted, only one computational approach has so far been used to find scaffold proteins from interactomes. That approach was limited in finding diverse types of scaffold proteins because its criteria were restricted to the classical scaffold proteins. In this paper, we suggest a systematic approach to predict scaffold proteins from interactomes on a large scale and to characterize their roles comprehensively. We classified a total of 10,419 basic scaffold protein candidates in protein interactomes into three classes according to the structural evidence for scaffolding, such as domain architectures, domain interactions and protein complexes. Finally, we defined 2716 highly reliable scaffold protein candidates and their characterized functional features. To assess the accuracy of our prediction, gold standard positive and negative data sets were constructed. We prepared 158 gold standard positive data and 844 gold standard negative data based on the functional information from the Gene Ontology consortium. The precision, sensitivity and specificity of our testing were 80.3, 51.0 and 98.5%, respectively. Through function enrichment analysis of the highly reliable scaffold proteins, we confirmed significantly enriched functions that are related to scaffold protein binding. We also identified functional associations between scaffold proteins and their recruited proteins. Furthermore, we found that the disease association of scaffold proteins is higher than that of kinases.
In conclusion, we could predict larger volume of scaffold proteins and analyzed their functional characteristics. Deeper understandings about the roles of scaffold proteins from this study will provide a higher opportunity to find therapeutic or engineering applications of scaffold proteins using their functional characteristics.
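The precision, sensitivity and specificity quoted in this abstract all derive from a confusion matrix built against the gold standard sets. A minimal sketch of those definitions follows; the counts are hypothetical and chosen only for easy arithmetic, since the abstract does not report the underlying true/false positive and negative counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (precision, sensitivity, specificity) as fractions.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives against a gold standard.
    """
    precision = tp / (tp + fp)      # how many predicted positives are real
    sensitivity = tp / (tp + fn)    # how many real positives are recovered
    specificity = tn / (tn + fp)    # how many real negatives are rejected
    return precision, sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the paper).
precision, sensitivity, specificity = binary_metrics(tp=40, fp=10, tn=90, fn=60)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f}")
```

A combination of high specificity and moderate sensitivity, as reported in the abstract, corresponds to a conservative predictor: few negatives are mislabeled, at the cost of missing a share of the true positives.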
Optimizing energy functions for protein-protein interface design.
Sharabi, Oz; Yanover, Chen; Dekel, Ayelet; Shifman, Julia M
2011-01-15
Protein design methods were originally developed for the design of monomeric proteins. When applied to the more challenging task of protein–protein complex design, these methods yield suboptimal results. In particular, they often fail to recapitulate favorable hydrogen bonds and electrostatic interactions across the interface. In this work, we aim to improve the energy function of the protein design program ORBIT to better account for binding interactions between proteins. By using the advanced machine learning framework of conditional random fields, we optimize the relative importance of all the terms in the energy function, attempting to reproduce the native side-chain conformations in protein–protein interfaces. We evaluate the performance of several optimized energy functions, each describing the van der Waals interactions using a different potential. In comparison with the original energy function, our best energy function (a) incorporates a much “softer” repulsive van der Waals potential, suitable for the discrete rotameric representation of amino acid side chains; (b) does not penalize burial of polar atoms, reflecting the frequent occurrence of polar buried residues in protein–protein interfaces; and (c) significantly up-weights the electrostatic term, attesting to the high importance of these interactions for protein–protein complex formation. Using this energy function considerably improves side chain placement accuracy for interface residues in a large test set of protein–protein complexes. Moreover, the optimized energy function recovers the native sequences of protein–protein interfaces at a higher rate than the default function and performs substantially better in predicting changes in free energy of binding due to mutations.
Junaid, Muhammad; Kaushik, Aman Chandra; Ali, Arif; Ali, Syed Shujait; Mehmood, Aamir; Wei, Dong-Qing
2018-01-01
High-risk human papillomaviruses (hrHPVs) are the most prevalent viruses in human diseases including cervical cancers. Expression of E6 protein has already been reported in cervical cancer cases, excluding normal tissues. The continuous expression of E6 protein makes it an ideal target for developing therapeutic vaccines against hrHPV infection and cervical cancer. Therefore, we carried out a meta-analysis of multiple hrHPVs to predict the most promising prophylactic peptide vaccines. In this study, an immunoinformatics approach was employed to predict antigenic epitopes of hrHPV E6 proteins restricted to 12 human HLAs to aid the development of peptide vaccines against hrHPVs. Conformational B-cell and CTL epitopes were predicted for hrHPV E6 proteins using ElliPro and NetCTL. The potential of the predicted peptides was tested and validated using a systems biology approach considering experimental concentration. We also investigated the binding interactions of the antigenic CTL epitopes by docking. The stability of the resulting peptide-MHC I complexes was further studied by molecular dynamics simulations. The simulation results highlighted the regions from 46–62 and 65–76 that could be the first choice for the development of prophylactic peptide vaccines against hrHPVs. To overcome the worldwide distribution, the predicted epitopes restricted to different HLAs could cover most of the vaccination and would help to explore the possibility of these epitopes for adaptive immunotherapy against HPV infections. PMID:29715318
Hayat, Maqsood; Khan, Asifullah
2013-05-01
Membrane protein is the prime constituent of a cell, which performs the role of mediator between intra- and extracellular processes. The prediction of transmembrane (TM) helix and its topology provides essential information regarding the function and structure of membrane proteins. However, prediction of TM helix and its topology is a challenging issue in bioinformatics and computational biology due to experimental complexities and the lack of established structures. Therefore, the location and orientation of TM helix segments are predicted from topogenic sequences. In this regard, we propose the WRF-TMH model for effectively predicting TM helix segments. In this model, information is extracted from membrane protein sequences using compositional index and physicochemical properties. The redundant and irrelevant features are eliminated through singular value decomposition. The selected features provided by these feature extraction strategies are then fused to develop a hybrid model. Weighted random forest is adopted as the classification approach. We have used two benchmark datasets, a low-resolution and a high-resolution dataset. Tenfold cross-validation is employed to assess the performance of the WRF-TMH model at different levels, including per protein, per segment, and per residue. The success rates of the WRF-TMH model are quite promising and are the best reported so far on the same datasets. It is observed that the WRF-TMH model might play a substantial role, and will provide essential information for further structural and functional studies on membrane proteins. The accompanying web predictor is accessible at http://111.68.99.218/WRF-TMH/ .
Quantitative analyses of bifunctional molecules.
Braun, Patrick D; Wandless, Thomas J
2004-05-11
Small molecules can be discovered or engineered to bind tightly to biologically relevant proteins, and these molecules have proven to be powerful tools for both basic research and therapeutic applications. In many cases, detailed biophysical analyses of the intermolecular binding events are essential for improving the activity of the small molecules. These interactions can often be characterized as straightforward bimolecular binding events, and a variety of experimental and analytical techniques have been developed and refined to facilitate these analyses. Several investigators have recently synthesized heterodimeric molecules that are designed to bind simultaneously with two different proteins to form ternary complexes. These heterodimeric molecules often display compelling biological activity; however, they are difficult to characterize. The bimolecular interaction between one protein and the heterodimeric ligand (primary dissociation constant) can be determined by a number of methods. However, the interaction between that protein-ligand complex and the second protein (secondary dissociation constant) is more difficult to measure due to the noncovalent nature of the original protein-ligand complex. Consequently, these heterodimeric compounds are often characterized in terms of their activity, which is an experimentally dependent metric. We have developed a general quantitative mathematical model that can be used to measure both the primary (protein + ligand) and secondary (protein-ligand + protein) dissociation constants for heterodimeric small molecules. These values are largely independent of the experimental technique used and furthermore provide a direct measure of the thermodynamic stability of the ternary complexes that are formed. 
Fluorescence polarization and this model were used to characterize the heterodimeric molecule, SLFpYEEI, which binds to both FKBP12 and the Fyn SH2 domain, demonstrating that the model is useful for both predictive as well as ex post facto analytical applications.
Identification of human microRNA targets from isolated argonaute protein complexes.
Beitzinger, Michaela; Peters, Lasse; Zhu, Jia Yun; Kremmer, Elisabeth; Meister, Gunter
2007-06-01
MicroRNAs (miRNAs) constitute a class of small non-coding RNAs that regulate gene expression on the level of translation and/or mRNA stability. Mammalian miRNAs associate with members of the Argonaute (Ago) protein family and bind to partially complementary sequences in the 3' untranslated region (UTR) of specific target mRNAs. Computer algorithms based on factors such as free binding energy or sequence conservation have been used to predict miRNA target mRNAs. Based on such predictions, up to one third of all mammalian mRNAs seem to be under miRNA regulation. However, due to the low degree of complementarity between the miRNA and its target, such computer programs are often imprecise and therefore not very reliable. Here we report the first biochemical identification approach of miRNA targets from human cells. Using highly specific monoclonal antibodies against members of the Ago protein family, we co-immunoprecipitate Ago-bound mRNAs and identify them by cloning. Interestingly, most of the identified targets are also predicted by different computer programs. Moreover, we randomly analyzed six different target candidates and were able to experimentally validate five as miRNA targets. Our data clearly indicate that miRNA targets can be experimentally identified from Ago complexes and therefore provide a new tool to directly analyze miRNA function.
Davis, M P; Brooks, M A; Kerley, M S
2016-04-01
Rate of oxygen uptake by muscle mitochondria and respiratory chain protein concentrations differed between high- and low-residual feed intake (RFI) animals. The hypothesis of this research was that complex I (CI), II (CII), and III (CIII) mitochondria protein concentrations in lymphocyte (blood) mitochondria were related to the RFI phenotype of beef steers. Daily feed intake (ADFI) was individually recorded for 92 Hereford-crossbreed steers over 63 d using a GrowSafe individual feed intake system. Predicted ADFI was calculated as the regression of ADFI on ADG and midtest BW. Difference between ADFI and predicted ADFI was RFI. Lymphocytes were isolated from low-RFI (-1.32 ± 0.11 kg/d; n = 10) and high-RFI (1.34 ± 0.18 kg/d; n = 8) steers. Immunocapture of CI, CII, and CIII proteins from the lymphocyte was done using MitoProfile CI, CII, and CIII immunocapture kits (MitoSciences Inc., Eugene, OR). Protein concentrations of CI, CII, and CIII and total protein were quantified using bicinchoninic acid colorimetric procedures. Low-RFI steers consumed 30% less (P = 0.0004) feed and had a 40% improvement (P < 0.0001) in feed efficiency compared with high-RFI steers with similar growth (P = 0.78) and weight measurements (P > 0.65). High- and low-RFI steers did not differ in CI (P = 0.22), CII (P = 0.69), and CIII (P = 0.59) protein concentrations. The protein concentration ratios for CI to CII (P = 0.03) were 20% higher and the ratios of CI to CIII (P = 0.01) were 30% higher, but the ratios of CII to CIII (P = 0.89) did not differ when comparing low-RFI steers with high-RFI steers. The similar magnitude difference in feed intake, feed efficiency measurements, and CI-to-CIII ratio between RFI phenotypes provides a plausible explanation for differences between the phenotypes. We also concluded that mitochondria isolated from lymphocytes could be used to study respiratory chain differences among differing RFI phenotypes.
Further research is needed to determine if lymphocyte mitochondrial complex proteins can be used for identification of RFI phenotype.
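The RFI definition used in this abstract (observed ADFI minus the ADFI predicted by regressing ADFI on ADG and midtest BW) can be sketched as an ordinary least-squares residual. The sketch below assumes a plain linear model with an intercept and uses made-up animal records; the abstract does not state the exact model form (for example, whether BW was scaled to metabolic weight), so that choice and all names here are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_feed_intake(adfi, adg, bw):
    """RFI per animal: observed ADFI minus ADFI predicted from ADG and BW."""
    rows = [[1.0, g, w] for g, w in zip(adg, bw)]  # intercept, ADG, BW
    # Normal equations (X^T X) beta = X^T y for the least-squares fit.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, adfi)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return [y - p for y, p in zip(adfi, predicted)]


# Made-up records for six animals (kg/d gain, kg body weight, kg/d intake).
adg = [1.2, 1.5, 1.4, 1.6, 1.3, 1.7]
bw = [380.0, 420.0, 400.0, 450.0, 390.0, 430.0]
adfi = [8.1, 9.9, 9.0, 10.8, 8.0, 10.9]
rfi = residual_feed_intake(adfi, adg, bw)
print([round(value, 3) for value in rfi])
```

Because the model includes an intercept, the residuals sum to zero across the group. Negative RFI therefore marks animals that eat less than expected for their growth and size, which is the "low-RFI" phenotype in the abstract.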
Structure and Sequence Search on Aptamer-Protein Docking
NASA Astrophysics Data System (ADS)
Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie
2015-03-01
Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.
Ivanov, Stefan M; Cawley, Andrew; Huber, Roland G; Bond, Peter J; Warwicker, Jim
2017-01-01
An improved knowledge of protein-protein interactions is essential for better understanding of metabolic and signaling networks, and cellular function. Progress tends to be based on structure determination and predictions using known structures, along with computational methods based on evolutionary information or detailed atomistic descriptions. We hypothesized that for the case of interactions across a common interface, between proteins from a pair of paralogue families or within a family of paralogues, a relatively simple interface description could distinguish between binding and non-binding pairs. Using binding data for several systems, and large-scale comparative modeling based on known template complex structures, it is found that charge-charge interactions (for groups bearing net charge) are generally a better discriminant than buried non-polar surface. This is particularly the case for paralogue families that are less divergent, with more reliable comparative modeling. We suggest that electrostatic interactions are major determinants of specificity in such systems, an observation that could be used to predict binding partners. PMID:29016650
Wise, Carol A.; Chiang, Lydia C.; Paznekas, William A.; Sharma, Mridula; Musy, Maurice M.; Ashley, Jennifer A.; Lovett, Michael; Jabs, Ethylin W.
1997-04-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development. PMID:9096354
QSAR models for prediction of chromatographic behavior of homologous Fab variants.
Robinson, Julie R; Karkov, Hanne S; Woo, James A; Krogh, Berit O; Cramer, Steven M
2017-06-01
While quantitative structure activity relationship (QSAR) models have been employed successfully for the prediction of small model protein chromatographic behavior, there have been few reports to date on the use of this methodology for larger, more complex proteins. Recently our group generated focused libraries of antibody Fab fragment variants with different combinations of surface hydrophobicities and electrostatic potentials, and demonstrated that the unique selectivities of multimodal resins can be exploited to separate these Fab variants. In this work, results from linear salt gradient experiments with these Fabs were employed to develop QSAR models for six chromatographic systems, including multimodal (Capto MMC, Nuvia cPrime, and two novel ligand prototypes), hydrophobic interaction chromatography (HIC; Capto Phenyl), and cation exchange (CEX; CM Sepharose FF) resins. The models utilized newly developed "local descriptors" to quantify changes around point mutations in the Fab libraries as well as novel cluster descriptors recently introduced by our group. Subsequent rounds of feature selection and linearized machine learning algorithms were used to generate robust, well-validated models with high training set correlations (R² > 0.70) that were well suited for predicting elution salt concentrations in the various systems. The developed models then were used to predict the retention of a deamidated Fab and isotype variants, with varying success. The results represent the first successful utilization of QSAR for the prediction of chromatographic behavior of complex proteins such as Fab fragments in multimodal chromatographic systems. The framework presented here can be employed to facilitate process development for the purification of biological products from product-related impurities by in silico screening of resin alternatives. Biotechnol. Bioeng. 2017;114: 1231-1240. © 2016 Wiley Periodicals, Inc.
Luštrek, Mitja; Lorenz, Peter; Kreutzer, Michael; Qian, Zilliang; Steinbeck, Felix; Wu, Di; Born, Nadine; Ziems, Bjoern; Hecker, Michael; Blank, Miri; Shoenfeld, Yehuda; Cao, Zhiwei; Glocker, Michael O; Li, Yixue; Fuellen, Georg; Thiesen, Hans-Jürgen
2013-01-01
Epitope-antibody-reactivities (EAR) of intravenous immunoglobulins (IVIGs) determined for 75,534 peptides by microarray analysis demonstrate that roughly 9% of peptides derived from 870 different human protein sequences react with antibodies present in IVIG. Computational prediction of linear B cell epitopes was conducted using machine learning with an ensemble of classifiers in combination with position weight matrix (PWM) analysis. Machine learning slightly outperformed PWM with area under the curve (AUC) of 0.884 vs. 0.849. Two different types of epitope-antibody recognition-modes (Type I EAR and Type II EAR) were found. Peptides of Type I EAR are high in tyrosine, tryptophan and phenylalanine, and low in asparagine, glutamine and glutamic acid residues, whereas for peptides of Type II EAR it is the other way around. Representative crystal structures present in the Protein Data Bank (PDB) of Type I EAR are PDB 1TZI and PDB 2DD8, while PDB 2FD6 and 2J4W are typical for Type II EAR. Type I EAR peptides share predicted propensities for being presented by MHC class I and class II complexes. The latter interaction possibly favors T cell-dependent antibody responses including IgG class switching. Peptides of Type II EAR are predicted not to be preferentially presented by MHC complexes, thus implying the involvement of T cell-independent IgG class switch mechanisms. The high extent of IgG immunoglobulin reactivity with human peptides implies that circulating IgG molecules are prone to bind to human protein/peptide structures under non-pathological, non-inflammatory conditions. A webserver for predicting EAR of peptide sequences is available at www.sysmed-immun.eu/EAR.
GPS-ARM: Computational Analysis of the APC/C Recognition Motif by Predicting D-Boxes and KEN-Boxes
Ren, Jian; Cao, Jun; Zhou, Yanhong; Yang, Qing; Xue, Yu
2012-01-01
Anaphase-promoting complex/cyclosome (APC/C), an E3 ubiquitin ligase incorporated with Cdh1 and/or Cdc20, recognizes and interacts with specific substrates, and faithfully orchestrates the proper cell cycle events by targeting proteins for proteasomal degradation. Experimental identification of APC/C substrates is largely dependent on the discovery of APC/C recognition motifs, e.g., the D-box and KEN-box. Although a number of either stringent or loosely defined motifs have been proposed, these motif patterns are only of limited use due to their insufficient predictive power. We report the development of a novel GPS-ARM software package for the prediction of D-boxes and KEN-boxes in proteins. Using experimentally identified D-boxes and KEN-boxes as the training data sets, a previously developed GPS (Group-based Prediction System) algorithm was adopted. By extensive evaluation and comparison, the GPS-ARM performance was found to be much better than that of simple motifs. With this powerful tool, we predicted 4,841 potential D-boxes in 3,832 proteins and 1,632 potential KEN-boxes in 1,403 proteins from H. sapiens, while further statistical analysis suggested that both the D-box and KEN-box proteins are involved in a broad spectrum of biological processes beyond the cell cycle. In addition, with the co-localization information, we predicted hundreds of mitosis-specific APC/C substrates with high confidence. As the first computational tool for the prediction of APC/C-mediated degradation, GPS-ARM provides useful information for further experimental investigations. GPS-ARM is freely accessible for academic researchers at: http://arm.biocuckoo.org. PMID:22479614
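For context, the "simple motif" baseline that this abstract compares against can be sketched as a plain pattern scan. The sketch assumes the minimal consensus forms R-x-x-L for the D-box and K-E-N for the KEN-box; it is not the GPS algorithm, and the example sequence is invented for illustration.

```python
import re

# Minimal consensus patterns (an assumption of this sketch, not GPS-ARM itself).
KEN_BOX = re.compile(r"KEN")
D_BOX = re.compile(r"(?=(R..L))")  # lookahead so overlapping hits are reported


def scan_simple_motifs(sequence):
    """Return 1-based start positions of naive KEN-box and D-box matches."""
    seq = sequence.upper()
    return {
        "KEN": [m.start() + 1 for m in KEN_BOX.finditer(seq)],
        "D-box": [m.start() + 1 for m in D_BOX.finditer(seq)],
    }


# Invented toy sequence, not a real protein.
hits = scan_simple_motifs("MARTALGKENAA")
print(hits)
```

A scan like this flags every occurrence of a short pattern regardless of sequence context, which is one reason such motifs over-predict and why a trained scoring approach is proposed instead.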
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao
2015-09-09
Bacteriophage virion proteins and non-virion proteins have distinct functions in biological processes, such as specificity determination for host bacteria, bacteriophage replication and transcription. Accurate identification of bacteriophage virion proteins from bacteriophage protein sequences is important for understanding the complex virulence mechanism in host bacteria and the influence of bacteriophages on the development of antibacterial drugs. In this study, an ensemble method for bacteriophage virion protein prediction from bacteriophage protein sequences is put forward with hybrid feature spaces incorporating CTD (composition, transition and distribution), bi-profile Bayes, PseAAC (pseudo-amino acid composition) and PSSM (position-specific scoring matrix). Under 10-fold cross-validation on the training dataset, the presented method achieves a satisfactory prediction result with a sensitivity of 0.870, a specificity of 0.830, an accuracy of 0.850 and a Matthews correlation coefficient (MCC) of 0.701. To evaluate the prediction performance objectively, an independent testing dataset is used. Encouragingly, our proposed method performs better than previous studies, with a sensitivity of 0.853, a specificity of 0.815, an accuracy of 0.831 and an MCC of 0.662 on the independent testing dataset. These results suggest that the proposed method can be a potential candidate for bacteriophage virion protein prediction, which may provide a useful tool to find novel antibacterial drugs and to understand the relationship between bacteriophage and host bacteria. For the convenience of the vast majority of experimental scientists, a user-friendly and publicly accessible web server for the proposed ensemble method is established. Int. J. Mol. Sci. 2015, 16, 21735.
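The cross-validation figures in this abstract can be checked for internal consistency with the standard MCC formula. The sketch below assumes a balanced set of 100 positives and 100 negatives, which is an assumption made here for illustration (the abstract does not give class sizes); under that assumption the reported sensitivity and specificity reproduce the reported accuracy and an MCC close to the reported 0.701.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Assumed balanced classes: 100 positives, 100 negatives (not from the paper).
tp, fn = 87, 13   # sensitivity 0.870
tn, fp = 83, 17   # specificity 0.830
mcc = matthews_corrcoef(tp, fp, tn, fn)
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(f"accuracy={accuracy:.3f} MCC={mcc:.3f}")
```

MCC is useful alongside accuracy because it stays informative when the two classes differ in size, which matters for the independent test set where no class balance is stated.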
Theory and Normal Mode Analysis of Change in Protein Vibrational Dynamics on Ligand Binding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moritsugu, Kei; Njunda, Brigitte; Smith, Jeremy C
2009-12-01
The change of protein vibrations on ligand binding is of functional and thermodynamic importance. Here, this process is characterized using a simple analytical 'ball-and-spring' model and all-atom normal-mode analysis (NMA) of the binding of the cancer drug, methotrexate (MTX) to its target, dihydrofolate reductase (DHFR). The analytical model predicts that the coupling between protein vibrations and ligand external motion generates entropy-rich, low-frequency vibrations in the complex. This is consistent with the atomistic NMA which reveals vibrational softening in forming the DHFR-MTX complex, a result also in qualitative agreement with neutron-scattering experiments. Energy minimization of the atomistic bound-state (B) structure while gradually decreasing the ligand interaction to zero allows the generation of a hypothetical 'intermediate' (I) state, without the ligand force field but with a structure similar to that of B. In going from I to B, it is found that the vibrational entropies of both the protein and MTX decrease while the complex structure becomes enthalpically stabilized. However, the relatively weak DHFR:MTX interaction energy results in the net entropy gain arising from coupling between the protein and MTX external motion being larger than the loss of vibrational entropy on complex formation. This, together with the I structure being more flexible than the unbound structure, results in the observed vibrational softening on ligand binding.
Löhner, Alexander; Cogdell, Richard
2018-01-01
As the electronic energies of the chromophores in a pigment–protein complex are imposed by the geometrical structure of the protein, this allows the spectral information obtained to be compared with predictions derived from structural models. Thereby, the single-molecule approach is particularly suited for the elucidation of specific, distinctive spectral features that are key for a particular model structure, and that would not be observable in ensemble-averaged spectra due to the heterogeneity of the biological objects. In this concise review, we illustrate with the example of the light-harvesting complexes from photosynthetic purple bacteria how results from low-temperature single-molecule spectroscopy can be used to discriminate between different structural models. Thereby the low-temperature approach provides two advantages: (i) owing to the negligible photobleaching, very long observation times become possible, and more importantly, (ii) at cryogenic temperatures, vibrational degrees of freedom are frozen out, leading to sharper spectral features and in turn to better resolved spectra. PMID:29321265
Lebon, Sophie; Minai, Limor; Chretien, Dominique; Corcos, Johanna; Serre, Valérie; Kadhom, Noman; Steffann, Julie; Pauchard, Jean-Yves; Munnich, Arnold; Bonnefont, Jean-Paul; Rötig, Agnès
2007-01-01
Complex I deficiency is a frequent cause of mitochondrial disease as it accounts for one third of these disorders. By genotyping several putative disease loci using microsatellite markers we were able to describe a new NDUFS7 mutation in a consanguineous family with Leigh syndrome and isolated complex I deficiency. This mutation lies in the first intron of the NDUFS7 gene (c.17-1167 C>G) and creates a strong donor splice site resulting in the generation of a cryptic exon. This mutation is predicted to result in a shortened mutant protein of 41 instead of 213 amino acids containing only the first five amino acids of the normal protein. Analysis of the assembly state of the respiratory chain complexes under native condition revealed a marked decrease of fully assembled complex I while the quantity of the other complexes was not altered. These results report the first intronic NDUFS7 gene mutation and demonstrate the crucial role of NDUFS7 in the biogenesis of complex I.
2012-01-01
Background Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied by feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods. Results This study proposes a novel scoring card method (SCM) that uses dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistical discrimination between soluble and insoluble proteins of a training data set. The propensity scores of all dipeptides are then further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and variation degrees of experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights into protein solubility. For example, the analysis of dipeptide scores shows high propensity of α-helix structure and thermophilic proteins to be soluble. Conclusions The propensities of individual dipeptides to be soluble vary for proteins under altered experimental conditions. 
For accurately predicting protein solubility using SCM, it is better to customize the score card of dipeptide propensities by using a training data set under the same specified experimental conditions. The proposed method SCM with solubility scores and dipeptide propensities can be easily applied to protein function prediction problems in which dipeptide composition features play an important role. Availability The datasets used, source code of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/. PMID:23282103
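The scoring step described in this abstract (a weighted sum of dipeptide propensity scores and dipeptide composition) can be illustrated with a minimal sketch. The propensity values in `toy_card` are hypothetical placeholders, not the optimized scores from the paper, and the function names are assumptions introduced only for this illustration.

```python
from collections import Counter

def dipeptide_composition(seq):
    """Return the fraction of each overlapping dipeptide in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: n / total for dp, n in counts.items()}

def solubility_score(seq, propensity, default=0.0):
    """Weighted sum: composition fraction times propensity, over all dipeptides."""
    comp = dipeptide_composition(seq)
    return sum(frac * propensity.get(dp, default) for dp, frac in comp.items())

# Hypothetical score card (illustrative values only, not from the paper).
toy_card = {"AK": 800.0, "KA": 600.0, "AA": 200.0}
score = solubility_score("AKAA", toy_card)
```

In this toy case each of the three dipeptides (AK, KA, AA) makes up one third of the sequence, so the score is simply the mean of the three propensities. A real score card would hold all 400 dipeptides.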
Garcia, Nelson; Messing, Joachim
2017-01-01
The TEL2, TTI1, and TTI2 proteins are co-chaperones for heat shock protein 90 (HSP90) to regulate the protein folding and maturation of phosphatidylinositol 3-kinase-related kinases (PIKKs). Referred to as the TTT complex, the genes that encode them are highly conserved from man to maize. TTT complex and PIKK genes exist mostly as single copy genes in organisms where they have been characterized. Members of this interacting protein network in maize were identified and synteny analyses were performed to study their evolution. Similar to other species, there is only one copy of each of these genes in maize, which was due to a loss of the duplicated copy created by ancient allotetraploidy. Moreover, the retained copies of the TTT complex and the PIKK genes tolerated extensive retrotransposon insertion in their introns that resulted in increased gene lengths and gene body methylation, without apparent effect on normal gene expression and function. The results raise an interesting question on whether the reversion to single copy was due to selection against deleterious unbalanced gene duplications between members of the complex as predicted by the gene balance hypothesis, or due to neutral loss of extra copies. Uneven alteration of dosage either by adding extra copies or modulating gene expression of complex members is being proposed as a means to investigate whether the data supports the gene balance hypothesis or not.
Global Analysis of Yeast Endosomal Transport Identifies the Vps55/68 Sorting Complex
Schluter, Cayetana; Lam, Karen K.Y.; Brumm, Jochen; Wu, Bella W.; Saunders, Matthew; Stevens, Tom H.
2008-01-01
Endosomal transport is critical for cellular processes ranging from receptor down-regulation and retroviral budding to the immune response. A full understanding of endosome sorting requires a comprehensive picture of the multiprotein complexes that orchestrate vesicle formation and fusion. Here, we use unsupervised, large-scale phenotypic analysis and a novel computational approach for the global identification of endosomal transport factors. This technique effectively identifies components of known and novel protein assemblies. We report the characterization of a previously undescribed endosome sorting complex that contains two well-conserved proteins with four predicted membrane-spanning domains. Vps55p and Vps68p form a complex that acts with or downstream of ESCRT function to regulate endosomal trafficking. Loss of Vps68p disrupts recycling to the TGN as well as onward trafficking to the vacuole without preventing the formation of lumenal vesicles within the MVB. Our results suggest the Vps55/68 complex mediates a novel, conserved step in the endosomal maturation process. PMID:18216282
Rapid and accurate prediction and scoring of water molecules in protein binding sites.
Ross, Gregory A; Morris, Garrett M; Biggin, Philip C
2012-01-01
Water plays a critical role in ligand-protein interactions. However, it is still challenging to predict accurately not only where water molecules prefer to bind, but also which of those water molecules might be displaceable. The latter is often seen as a route to optimizing affinity of potential drug candidates. Using a protocol we call WaterDock, we show that the freely available AutoDock Vina tool can be used to predict accurately the binding sites of water molecules. WaterDock was validated using data from X-ray crystallography, neutron diffraction and molecular dynamics simulations and correctly predicted 97% of the water molecules in the test set. In addition, we combined data-mining, heuristic and machine learning techniques to develop probabilistic water molecule classifiers. When applied to WaterDock predictions in the Astex Diverse Set of protein ligand complexes, we could identify whether a water molecule was conserved or displaced to an accuracy of 75%. A second model predicted whether water molecules were displaced by polar groups or by non-polar groups to an accuracy of 80%. These results should prove useful for anyone wishing to undertake rational design of new compounds where the displacement of water molecules is being considered as a route to improved affinity.
1986-10-01
organic acids using the Hammett equation, has been called the hydrophobic effect.’ Water adjusts its geometry to maximize the number of intact hydrogen...understanding both structural stability with respect to the underlying equations (not initial values) and phase transitions in these dynamical hierarchies...for quantitative characterization. Although the complicated behavior is generated by deterministic equations, its description in entropies leads to
Serrano León, Esteban; Coat, Rémy; Moutel, Benjamin; Pruvost, Jérémy; Legrand, Jack; Gonçalves, Olivier
2014-11-01
Absolute concentrations of total macromolecules (triglycerides, proteins and carbohydrates) in microorganisms can be rapidly measured by FTIR spectroscopy, but caution is needed to avoid non-specific experimental bias. Here, we assess the limits within which this approach can be used on model solutions of macromolecules of interest. We used the Bruker HTSXT-FTIR system. Our results show that the solid deposits obtained after the sampling procedure present physical and chemical properties that influence the quality of the absolute concentration prediction models (univariate and multivariate). The accuracy of the models was degraded by a factor of 2 or 3 outside the recommended concentration interval of 0.5-35 µg spot(-1). Change occurred notably in the sample hydrogen bond network, which could, however, be controlled using an internal probe (pseudohalide anion). We also demonstrate that for aqueous solutions, accurate prediction of total carbohydrate quantities (in glucose equivalent) could not be made unless a constant amount of protein was added to the model solution (BSA). The results of the prediction model for more complex solutions, here with two components: glucose and BSA, were very encouraging, suggesting that this FTIR approach could be used as a rapid quantification method for mixtures of molecules of interest, provided the limits of use of the HTSXT-FTIR method are precisely known and respected. This last finding opens the way to direct quantification of total molecules of interest in more complex matrices.
oGNM: online computation of structural dynamics using the Gaussian Network Model
Yang, Lee-Wei; Rader, A. J.; Liu, Xiong; Jursa, Cristopher Jon; Chen, Shann Ching; Karimi, Hassan A.; Bahar, Ivet
2006-01-01
An assessment of the equilibrium dynamics of biomolecular systems, and in particular their most cooperative fluctuations accessible under native state conditions, is a first step towards understanding molecular mechanisms relevant to biological function. We present a web-based system, oGNM, that enables users to calculate online the shape and dispersion of normal modes of motion for proteins, oligonucleotides and their complexes, or associated biological units, using the Gaussian Network Model (GNM). Computations with the new engine are 5–6 orders of magnitude faster than those using conventional normal mode analyses. Two case studies illustrate the utility of oGNM. The first shows that the thermal fluctuations predicted for 1250 non-homologous proteins correlate well with X-ray crystallographic data over a broad range [7.3–15 Å] of inter-residue interaction cutoff distances and the correlations improve with increasing observation temperatures. The second study, focused on 64 oligonucleotides and oligonucleotide–protein complexes, shows that good agreement with experiments is achieved by representing each nucleotide by three GNM nodes (as opposed to one-node-per-residue in proteins) along with uniform interaction ranges for all components of the complexes. These results open the way to a rapid assessment of the dynamics of DNA/RNA-containing complexes. The server can be accessed at . PMID:16845002
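As a rough illustration of the network construction that underlies a Gaussian Network Model, the sketch below builds the connectivity (Kirchhoff) matrix for a set of node coordinates using a distance cutoff. It is not the oGNM implementation: the function name and toy coordinates are assumptions, the 7.3 Å cutoff is simply the lower end of the range quoted in the abstract, and the eigen-decomposition that yields the mode shapes is deliberately left out.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.3):
    """Build the GNM connectivity matrix for nodes given as (x, y, z) tuples.

    Off-diagonal entries are -1 for node pairs within the cutoff (in Å),
    and each diagonal entry counts that node's contacts.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Toy chain of four nodes spaced 3.8 Å apart along the x axis (illustrative only).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(toy)
```

With this spacing only sequential neighbours fall inside the cutoff, so the matrix is that of a simple linear chain. Every row sums to zero, which is the property that gives the model its zero-frequency rigid-body mode.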
Proteins evolve on the edge of supramolecular self-assembly.
Garcia-Seisdedos, Hector; Empereur-Mot, Charly; Elad, Nadav; Levy, Emmanuel D
2017-08-10
The self-association of proteins into symmetric complexes is ubiquitous in all kingdoms of life. Symmetric complexes possess unique geometric and functional properties, but their internal symmetry can pose a risk. In sickle-cell disease, the symmetry of haemoglobin exacerbates the effect of a mutation, triggering assembly into harmful fibrils. Here we examine the universality of this mechanism and its relation to protein structure geometry. We introduced point mutations solely designed to increase surface hydrophobicity among 12 distinct symmetric complexes from Escherichia coli. Notably, all responded by forming supramolecular assemblies in vitro, as well as in vivo upon heterologous expression in Saccharomyces cerevisiae. Remarkably, in four cases, micrometre-long fibrils formed in vivo in response to a single point mutation. Biophysical measurements and electron microscopy revealed that mutants self-assembled in their folded states and so were not amyloid-like. Structural examination of 73 mutants identified supramolecular assembly hot spots predictable by geometry. A subsequent structural analysis of 7,471 symmetric complexes showed that geometric hot spots were buffered chemically by hydrophilic residues, suggesting a mechanism preventing mis-assembly of these regions. Thus, point mutations can frequently trigger folded proteins to self-assemble into higher-order structures. This potential is counterbalanced by negative selection and can be exploited to design nanomaterials in living cells.
Proteins evolve on the edge of supramolecular self-assembly
NASA Astrophysics Data System (ADS)
Garcia-Seisdedos, Hector; Empereur-Mot, Charly; Elad, Nadav; Levy, Emmanuel D.
2017-08-01
The self-association of proteins into symmetric complexes is ubiquitous in all kingdoms of life. Symmetric complexes possess unique geometric and functional properties, but their internal symmetry can pose a risk. In sickle-cell disease, the symmetry of haemoglobin exacerbates the effect of a mutation, triggering assembly into harmful fibrils. Here we examine the universality of this mechanism and its relation to protein structure geometry. We introduced point mutations solely designed to increase surface hydrophobicity among 12 distinct symmetric complexes from Escherichia coli. Notably, all responded by forming supramolecular assemblies in vitro, as well as in vivo upon heterologous expression in Saccharomyces cerevisiae. Remarkably, in four cases, micrometre-long fibrils formed in vivo in response to a single point mutation. Biophysical measurements and electron microscopy revealed that mutants self-assembled in their folded states and so were not amyloid-like. Structural examination of 73 mutants identified supramolecular assembly hot spots predictable by geometry. A subsequent structural analysis of 7,471 symmetric complexes showed that geometric hot spots were buffered chemically by hydrophilic residues, suggesting a mechanism preventing mis-assembly of these regions. Thus, point mutations can frequently trigger folded proteins to self-assemble into higher-order structures. This potential is counterbalanced by negative selection and can be exploited to design nanomaterials in living cells.
Fully Flexible Docking of Medium Sized Ligand Libraries with RosettaLigand
DeLuca, Samuel; Khar, Karen; Meiler, Jens
2015-01-01
RosettaLigand has been successfully used to predict binding poses in protein-small molecule complexes. However, the RosettaLigand docking protocol is comparatively slow in identifying an initial starting pose for the small molecule (ligand) making it unfeasible for use in virtual High Throughput Screening (vHTS). To overcome this limitation, we developed a new sampling approach for placing the ligand in the protein binding site during the initial ‘low-resolution’ docking step. It combines the translational and rotational adjustments to the ligand pose in a single transformation step. The new algorithm is both more accurate and more time-efficient. The docking success rate is improved by 10–15% in a benchmark set of 43 protein/ligand complexes, reducing the number of models that typically need to be generated from 1000 to 150. The average time to generate a model is reduced from 50 seconds to 10 seconds. As a result we observe an effective 30-fold speed increase, making RosettaLigand appropriate for docking medium sized ligand libraries. We demonstrate that this improved initial placement of the ligand is critical for successful prediction of an accurate binding position in the ‘high-resolution’ full atom refinement step. PMID:26207742
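The reported speed-up can be checked with simple arithmetic, assuming the total docking cost is the number of models multiplied by the average time per model (an assumption made here for illustration; the abstract does not state how the figure was derived). The variable names are likewise illustrative.

```python
# Figures quoted in the abstract.
models_before, models_after = 1000, 150   # models generated per docking run
secs_before, secs_after = 50, 10          # average seconds per model

# Assumed cost model: total time = number of models * time per model.
total_before = models_before * secs_before
total_after = models_after * secs_after

speedup = total_before / total_after
```

Under this assumption the ratio comes out at roughly 33, which is consistent with the "effective 30-fold" increase stated in the abstract.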
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information on sequence order in the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure.
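The first step of this pipeline, turning a sequence into a digital signal by mapping each residue to a physicochemical value, can be sketched as follows. The abstract does not say which properties were used, so the Kyte-Doolittle hydropathy scale below is an assumption chosen purely for illustration, and the function name is hypothetical. The wavelet and PCA steps are not shown.

```python
# Kyte-Doolittle hydropathy index (assumed here; the paper's property set is not given).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sequence_to_signal(seq, scale=KYTE_DOOLITTLE):
    """Map a one-letter amino acid sequence to a list of numeric values."""
    unknown = sorted(set(seq) - set(scale))
    if unknown:
        raise ValueError(f"residues missing from scale: {unknown}")
    return [scale[res] for res in seq]

signal = sequence_to_signal("AILKD")
```

The resulting list has one value per residue and could be passed to any signal-processing routine; swapping in a different property scale only requires a different dictionary.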
Hattotuwagama, Channa K; Guan, Pingping; Doytchinova, Irini A; Flower, Darren R
2004-11-21
Quantitative structure-activity relationship (QSAR) analysis is a cornerstone of modern informatic disciplines. Predictive computational models, based on QSAR technology, of peptide-major histocompatibility complex (MHC) binding affinity have now become a vital component of modern day computational immunovaccinology. Historically, such approaches have been built around semi-qualitative, classification methods, but these are now giving way to quantitative regression methods. The additive method, an established immunoinformatics technique for the quantitative prediction of peptide-protein affinity, was used here to identify the sequence dependence of peptide binding specificity for three mouse class I MHC alleles: H2-D(b), H2-K(b) and H2-K(k). As we show, in terms of reliability the resulting models represent a significant advance on existing methods. They can be used for the accurate prediction of T-cell epitopes and are freely available online (http://www.jenner.ac.uk/MHCPred).
Assembling the Tat protein translocase
Alcock, Felicity; Stansfeld, Phillip J; Basit, Hajra; Habersetzer, Johann; Baker, Matthew AB; Palmer, Tracy; Wallace, Mark I; Berks, Ben C
2016-01-01
The twin-arginine protein translocation system (Tat) transports folded proteins across the bacterial cytoplasmic membrane and the thylakoid membranes of plant chloroplasts. The Tat transporter is assembled from multiple copies of the membrane proteins TatA, TatB, and TatC. We combine sequence co-evolution analysis, molecular simulations, and experimentation to define the interactions between the Tat proteins of Escherichia coli at molecular-level resolution. In the TatBC receptor complex the transmembrane helix of each TatB molecule is sandwiched between two TatC molecules, with one of the inter-subunit interfaces incorporating a functionally important cluster of interacting polar residues. Unexpectedly, we find that TatA also associates with TatC at the polar cluster site. Our data provide a structural model for assembly of the active Tat translocase in which substrate binding triggers replacement of TatB by TatA at the polar cluster site. Our work demonstrates the power of co-evolution analysis to predict protein interfaces in multi-subunit complexes. DOI: http://dx.doi.org/10.7554/eLife.20718.001 PMID:27914200
Shavkunov, Alexander; Panova, Neli; Prasai, Anesh; Veselenak, Ron; Bourne, Nigel; Stoilova-McPhie, Svetla; Laezza, Fernanda
2012-04-01
Protein-protein interactions are critical molecular determinants of ion channel function and emerging targets for pharmacological interventions. Yet, current methodologies for the rapid detection of ion channel macromolecular complexes are still lacking. In this study we have adapted a split-luciferase complementation assay (LCA) for detecting the assembly of the voltage-gated Na+ (Nav) channel C-tail and the intracellular fibroblast growth factor 14 (FGF14), a functionally relevant component of the Nav channelosome that controls gating and targeting of Nav channels through direct interaction with the channel C-tail. In the LCA, two complementary N-terminus and C-terminus fragments of the firefly luciferase were fused, respectively, to a chimera of the CD4 transmembrane segment and the C-tail of Nav1.6 channel (CD4-Nav1.6-NLuc) or FGF14 (CLuc-FGF14). Co-expression of CLuc-FGF14 and CD4-Nav1.6-NLuc in live cells led to a robust assembly of the FGF14:Nav1.6 C-tail complex, which was attenuated by introducing single-point mutations at the predicted FGF14:Nav channel interface. To evaluate the dynamic regulation of the FGF14:Nav1.6 C-tail complex by signaling pathways, we investigated the effect of kinase inhibitors on the complex formation. Through a platform of counter screenings, we show that the p38/MAPK inhibitor, PD169316, and the IκB kinase inhibitor, BAY 11-7082, reduce the FGF14:Nav1.6 C-tail complementation, highlighting a potential role of the p38MAPK and the IκB/NFκB pathways in controlling neuronal excitability through protein-protein interactions. We envision the methodology presented here as a new valuable tool to allow functional evaluations of protein-channel complexes toward probe development and drug discovery targeting ion channels implicated in human disorders.
Ashford, Paul; Moss, David S; Alex, Alexander; Yeap, Siew K; Povia, Alice; Nobeli, Irene; Williams, Mark A
2012-03-14
Protein structures provide a valuable resource for rational drug design. For a protein with no known ligand, computational tools can predict surface pockets that are of suitable size and shape to accommodate a complementary small-molecule drug. However, pocket prediction against single static structures may miss features of pockets that arise from proteins' dynamic behaviour. In particular, ligand-binding conformations can be observed as transiently populated states of the apo protein, so it is possible to gain insight into ligand-bound forms by considering conformational variation in apo proteins. This variation can be explored by considering sets of related structures: computationally generated conformers, solution NMR ensembles, multiple crystal structures, homologues or homology models. It is non-trivial to compare pockets, either from different programs or across sets of structures. For a single structure, difficulties arise in defining a particular pocket's boundaries. For a set of conformationally distinct structures the challenge is how to make reasonable comparisons between them given that a perfect structural alignment is not possible. We have developed a computational method, Provar, that provides a consistent representation of predicted binding pockets across sets of related protein structures. The outputs are probabilities that each atom or residue of the protein borders a predicted pocket. These probabilities can be readily visualised on a protein using existing molecular graphics software. We show how Provar simplifies comparison of the outputs of different pocket prediction algorithms, of pockets across multiple simulated conformations and between homologous structures. 
We demonstrate the benefits of use of multiple structures for protein-ligand and protein-protein interface analysis on a set of complexes and consider three case studies in detail: i) analysis of a kinase superfamily highlights the conserved occurrence of surface pockets at the active and regulatory sites; ii) a simulated ensemble of unliganded Bcl2 structures reveals extensions of a known ligand-binding pocket not apparent in the apo crystal structure; iii) visualisations of interleukin-2 and its homologues highlight conserved pockets at the known receptor interfaces and regions whose conformation is known to change on inhibitor binding. Through post-processing of the output of a variety of pocket prediction software, Provar provides a flexible approach to the analysis and visualization of the persistence or variability of pockets in sets of related protein structures.
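One simple way to picture a per-residue "pocket-lining probability" across an ensemble is as the fraction of structures in which a residue borders any predicted pocket. The sketch below makes that assumption explicitly; it is not Provar's actual algorithm, and the function name, residue labels, and input format are all hypothetical.

```python
from collections import Counter

def pocket_lining_probability(pocket_residues_per_structure):
    """Fraction of structures in which each residue borders a predicted pocket.

    Input: one collection of residue identifiers per structure, each listing
    the residues that line any pocket predicted for that structure.
    """
    n = len(pocket_residues_per_structure)
    counts = Counter()
    for residues in pocket_residues_per_structure:
        counts.update(set(residues))  # count each residue once per structure
    return {res: c / n for res, c in counts.items()}

# Hypothetical pocket-lining residues from four conformers of one protein.
ensemble = [
    {"Leu10", "Phe45"},
    {"Leu10"},
    {"Leu10", "Asp80"},
    {"Phe45", "Leu10"},
]
probs = pocket_lining_probability(ensemble)
```

Residues that line a pocket in every conformer score 1.0, while those that appear only transiently score lower, which is the kind of persistence-versus-variability signal the abstract describes.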
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.
Kawabata, Takeshi; Nakamura, Haruki
2014-07-28
A protein-bound conformation of a target molecule can be predicted by aligning the target molecule on the reference molecule obtained from the 3D structure of the compound-protein complex. This strategy is called "similarity-based docking". For this purpose, we develop the flexible alignment program fkcombu, which aligns the target molecule based on atomic correspondences with the reference molecule. The correspondences are obtained by the maximum common substructure (MCS) of 2D chemical structures, using our program kcombu. The prediction performance was evaluated using many target-reference pairs of superimposed ligand 3D structures on the same protein in the PDB, with different ranges of chemical similarity. The details of atomic correspondence largely affected the prediction success. We found that topologically constrained disconnected MCS (TD-MCS) with the simple element-based atomic classification provides the best prediction. The crashing potential energy with the receptor protein improved the performance. We also found that the RMSD between the predicted and correct target conformations significantly correlates with the chemical similarities between target-reference molecules. Generally speaking, if the reference and target compounds have more than 70% chemical similarity, then the average RMSD of 3D conformations is <2.0 Å. We compared the performance with a rigid-body molecular alignment program based on volume-overlap scores (ShaEP). Our MCS-based flexible alignment program performed better than the rigid-body alignment program, especially when the target and reference molecules were sufficiently similar.
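The RMSD used to compare predicted and correct conformations can be computed as below. This sketch assumes both conformations are already in the same coordinate frame (for example, superimposed on the same receptor) and that atoms are listed in matching order; it performs no alignment of its own, and the function name and coordinates are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy two-atom example in Å (illustrative coordinates only).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
deviation = rmsd(predicted, reference)
```

Here each atom is displaced by 2 Å, so the RMSD is 2.0 Å, which sits right at the threshold the abstract quotes for compound pairs with more than 70% chemical similarity.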
Analysis of Immune Complex Structure by Statistical Mechanics and Light Scattering Techniques.
NASA Astrophysics Data System (ADS)
Busch, Nathan Adams
1995-01-01
The size and structure of immune complexes determine their behavior in the immune system. The chemical physics of the complex formation is not well understood; this is due in part to inadequate characterization of the proteins involved, and in part to a lack of sufficiently well developed theoretical techniques. Understanding the complex formation will permit rational design of strategies for inhibiting tissue deposition of the complexes. A statistical mechanical model of the proteins based upon the theory of associating fluids was developed. The multipole electrostatic potential for each protein used in this study was characterized for net protein charge, dipole moment magnitude, and dipole moment direction. The binding sites, between the model antigen and antibodies, were characterized for their net surface area, energy, and position relative to the dipole moment of the protein. The equilibrium binding graphs generated with the protein statistical mechanical model compare favorably with experimental data obtained from radioimmunoassay results. The isothermal compressibility predicted by the model agrees with results obtained from dynamic light scattering. The statistical mechanics model was used to investigate association between the model antigen and selected pairs of antibodies. It was found that, in accordance with expectations from thermodynamic arguments, the highest total binding energy yielded complex distributions which were skewed to higher complex size. From examination of the simulated formation of ring structures from linear chain complexes, and from the joint shape probability surfaces, it was found that ring configurations were formed by the "folding" of linear chains until the ends are within binding distance. By comparing the single antigen/two antibody systems which differ only in their respective binding site locations, it was found that binding site location influences complex size and shape distributions only when ring formation occurs. 
The internal potential energy of a ring complex is considerably less than that of the non-associating system; therefore the ring complexes are quite stable and show no evidence of breaking, and collapsing into smaller complexes. The ring formation will occur only in systems where the total free energy of each complex may be minimized. Thus, ring formation will occur even though entropically unfavorable conformations result if the total free energy can be minimized by doing so.
He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei
2012-06-25
Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. 
The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
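The single-pass greedy idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the GSM-FC edge weight definition or its merge criterion, so the threshold rule, the function name `greedy_edge_modules`, and the toy weights below are all assumptions made for illustration, not the authors' method.

```python
def greedy_edge_modules(edges, min_weight):
    """Group nodes into modules with one greedy pass over weighted edges.

    edges: iterable of (node_u, node_v, weight) tuples.
    min_weight: hypothetical threshold; an edge merges two modules
    only when its weight is at least this value.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Visit every edge exactly once, strongest edges first.
    for u, v, weight in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if weight >= min_weight and root_u != root_v:
            parent[root_u] = root_v

    modules = {}
    for node in list(parent):
        modules.setdefault(find(node), set()).add(node)
    return sorted(modules.values(), key=sorted)


# Toy network with made-up weights (not data from the paper).
toy_edges = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.1), ("D", "E", 0.7)]
toy_modules = greedy_edge_modules(toy_edges, min_weight=0.5)
```

Here the weak C-D edge falls below the assumed threshold, so the toy network splits into two modules. A real implementation would substitute the paper's edge weight and module criteria.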
Predicting protein structures with a multiplayer online game.
Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit
2010-08-05
People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
Bayden, Alexander S.; Fornabaio, Micaela; Scarsdale, J. Neel
2009-01-01
A public web server performing computational titration at the active site in a protein-ligand complex has been implemented. This calculation is based on the Hydropathic INTeraction (HINT) noncovalent force field. From 3D coordinate data for the protein, ligand and bridging waters (if available), the server predicts the best combination of protonation states for each ionizable residue and/or ligand functional group as well as the Gibbs free energy of binding for the ionization-optimized protein-ligand complex. The 3D structure for the modified molecules is available as output. In addition, a graph depicting how this energy changes with acidity, i.e., as a function of added protons, can be obtained. This data may prove to be of use in preparing models for virtual screening and molecular docking. A few illustrative examples are presented. In β secretase (2va7) computational titration flipped the amide groups of Gln12 and Asn37 and protonated a ligand amine yielding an improvement of 6.37 kcal mol−1 in the protein-ligand binding score. Protonation of Glu139 in mutant HIV-1 reverse transcriptase (2opq) allows a water bridge between the protein and inhibitor that increases the protein-ligand interaction score by 0.16 kcal mol−1. In human sialidase NEU2 complexed with an isobutyl ether mimetic inhibitor (2f11) computational titration suggested that protonating Glu218, deprotonating Arg237, flipping the amide bond on Tyr334, and optimizing the positions of several other polar protons would increase the protein-ligand interaction score by 0.71 kcal mol−1. PMID:19554265
Transcriptomic Analysis of the Salivary Glands of an Invasive Whitefly
Su, Yun-Lin; Li, Jun-Min; Li, Meng; Luan, Jun-Bo; Ye, Xiao-Dong; Wang, Xiao-Wei; Liu, Shu-Sheng
2012-01-01
Background: Some species of the whitefly Bemisia tabaci complex cause tremendous losses to crops worldwide, directly through feeding and indirectly through virus transmission. The primary salivary glands of whiteflies are critical for their feeding and virus transmission. However, partly due to their tiny size, research on whitefly salivary glands is limited and our knowledge of these glands is scarce. Methodology/Principal Findings: We sequenced the transcriptome of the primary salivary glands of the Mediterranean species of the B. tabaci complex using an effective cDNA amplification method in combination with short read sequencing (Illumina). In a single run, we obtained 13,615 unigenes. The number of unigenes obtained from the salivary glands of the whitefly is at least four times the number of salivary gland genes reported from other plant-sucking insects. To reveal the functions of the primary glands, sequence similarity searches and comparisons with the whole transcriptome of the whitefly were performed. The results demonstrated that genes related to metabolism and transport were significantly enriched in the primary salivary glands. Furthermore, we found that a number of highly expressed genes in the salivary glands might be involved in secretory protein processing, secretion and virus transmission. To identify potential proteins of whitefly saliva, the translated unigenes were submitted to secretory protein prediction. Finally, 295 genes were predicted to encode secretory proteins, and some of them might play important roles in whitefly feeding. Conclusions/Significance: The combined method of cDNA amplification, Illumina sequencing and de novo assembly is suitable for transcriptomic analysis of tiny organs in insects. Through analysis of the transcriptome, genomic features of the primary salivary glands were dissected and biologically important proteins, especially secreted proteins, were predicted.
Our findings provide substantial sequence information for the primary salivary glands of whiteflies and will be the basis for future studies on whitefly-plant interactions and virus transmission. PMID:22745728
2010-01-01
Background: Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results: In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three-dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions: We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines.
Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
Importance of ligand reorganization free energy in protein-ligand binding-affinity prediction.
Yang, Chao-Yie; Sun, Haiying; Chen, Jianyong; Nikolovska-Coleska, Zaneta; Wang, Shaomeng
2009-09-30
Accurate prediction of the binding affinities of small-molecule ligands to their biological targets is fundamental for structure-based drug design but remains a very challenging task. In this paper, we have performed computational studies to predict the binding models of 31 small-molecule Smac (the second mitochondria-derived activator of caspase) mimetics to their target, the XIAP (X-linked inhibitor of apoptosis) protein, and their binding affinities. Our results showed that computational docking was able to reliably predict the binding models, as confirmed by experimentally determined crystal structures of some Smac mimetics complexed with XIAP. However, all the computational methods we have tested, including an empirical scoring function, two knowledge-based scoring functions, and MM-GBSA (molecular mechanics and generalized Born surface area), yield poor to modest prediction for binding affinities. The linear correlation coefficient (r²) value between the predicted affinities and the experimentally determined affinities was found to be between 0.21 and 0.36. Inclusion of ensemble protein-ligand conformations obtained from molecular dynamics simulations did not significantly improve the prediction. However, major improvement was achieved when the free-energy change for ligands between their free and bound states, or "ligand-reorganization free energy", was included in the MM-GBSA calculation, and the r² value increased from 0.36 to 0.66. The prediction was validated using 10 additional Smac mimetics designed and evaluated by an independent group. This study demonstrates that ligand reorganization free energy plays an important role in the overall binding free energy between Smac mimetics and XIAP. This term should be evaluated for other ligand-protein systems and included in the development of new scoring functions.
To the best of our knowledge, this is the first computational study to demonstrate the importance of ligand reorganization free energy for the prediction of protein-ligand binding free energy.
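The squared linear correlation coefficient reported in this abstract is the standard squared Pearson correlation between predicted and measured affinities. The sketch below shows that calculation; the function name `r_squared` and the sample numbers are hypothetical placeholders, not data from the study.

```python
def r_squared(predicted, measured):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_p * var_m)


# Hypothetical scores and affinities, chosen only to exercise the function.
example_r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Because the value is squared, a perfectly anti-correlated scoring function also yields 1.0, so the sign of the underlying correlation should be checked separately.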
Network-based prediction and knowledge mining of disease genes.
Carson, Matthew B; Lu, Hui
2015-01-01
In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. Our classifier was able to predict disease-related genes with an area under the receiver operating characteristic (ROC) curve (AUC) of 79%, which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics, including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity, helps to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence.
In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions.
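Two of the network characteristics named in this abstract, degree centrality and disease neighbor ratio, can be sketched on a toy graph. Degree centrality below uses the usual normalization by the number of other nodes; the abstract does not define the disease neighbor ratio, so it is assumed here to be the fraction of a protein's direct neighbors that carry a disease annotation. All function names, protein labels, and annotations are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree divided by the number of other nodes in the network."""
    others = max(len(adjacency) - 1, 1)  # guard against a one-node graph
    return {node: len(nbrs) / others for node, nbrs in adjacency.items()}


def disease_neighbor_ratio(adjacency, disease_proteins):
    """Assumed definition: share of direct neighbors annotated with disease."""
    return {
        node: sum(1 for nbr in nbrs if nbr in disease_proteins) / len(nbrs)
        for node, nbrs in adjacency.items()
    }


# Toy interaction network and annotations (illustrative only).
toy_ppi = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
toy_adj = build_adjacency(toy_ppi)
toy_degree = degree_centrality(toy_adj)
toy_ratio = disease_neighbor_ratio(toy_adj, disease_proteins={"P2", "P4"})
```

Features such as these would then be passed to a classifier; the ADTree training step itself is not sketched here.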
Becerra-Artiles, Aniuska; Dominguez-Amorocho, Omar; Stern, Lawrence J.; Calvo-Calle, J. Mauricio
2015-01-01
Most of humanity is chronically infected with human herpesvirus 6 (HHV-6), with viral replication controlled at least in part by a poorly characterized CD4 T cell response. Identification of viral epitopes recognized by CD4 T cells is complicated by the large size of the herpesvirus genome and a low frequency of circulating T cells responding to the virus. Here, we present an alternative to classical epitope mapping approaches used to identify major targets of the T cell response to a complex pathogen like HHV-6B. In the approach presented here, extracellular virus preparations or virus-infected cells are fractionated by SDS-PAGE, and eluted fractions are used as source of antigens to study cytokine responses in direct ex vivo T cell activation studies. Fractions inducing significant cytokine responses are analyzed by mass spectrometry to identify viral proteins, and a subset of peptides from these proteins corresponding to predicted HLA-DR binders is tested for IFN-γ production in seropositive donors with diverse HLA haplotypes. Ten HHV-6B viral proteins were identified as immunodominant antigens. The epitope-specific response to HHV-6B virus was complex and variable between individuals. We identified 107 peptides, each recognized by at least one donor, with each donor having a distinctive footprint. Fourteen peptides showed responses in the majority of donors. Responses to these epitopes were validated using in vitro expanded cells and naturally expressed viral proteins. Predicted peptide binding affinities for the eight HLA-DRB1 alleles investigated here correlated only modestly with the observed CD4 T cell responses. Overall, the response to the virus was dominated by peptides from the major capsid protein U57 and major antigenic protein U11, but responses to other proteins including glycoprotein H (U48) and tegument proteins U54 and U14 also were observed. 
These results provide a means to follow and potentially modulate the CD4 T-cell immune response to HHV-6B. PMID:26599878
A microscopic insight from conformational thermodynamics to functional ligand binding in proteins.
Sikdar, Samapan; Chakrabarti, J; Ghosh, Mahua
2014-12-01
We show that the thermodynamics of metal ion-induced conformational changes helps to explain the functions of protein complexes. This is illustrated in the case of a metalloprotein, alpha-lactalbumin (aLA), a divalent metal ion binding protein. We use the histograms of dihedral angles of the protein, generated from all-atom molecular dynamics simulations, to calculate conformational thermodynamics. The thermodynamically destabilized and disordered residues in different conformational states of a protein are proposed to serve as binding sites for ligands. This is tested for β-1,4-galactosyltransferase (β4GalT) binding to the Ca²⁺-aLA complex, in which the binding residues are known. Among the binding residues, C-terminal residues such as aspartate (D) 116, glutamine (Q) 117, tryptophan (W) 118 and leucine (L) 119 are destabilized and disordered and can dock β4GalT onto Ca²⁺-aLA. No such thermodynamically favourable binding residues can be identified in the case of the Mg²⁺-aLA complex. We apply a similar analysis to oleic acid binding and predict that the Ca²⁺-aLA complex can bind to oleic acid through the basic histidine (H) 32 of the A2 helix and the hydrophobic residues, namely, isoleucine (I) 59, W60 and I95, of the interfacial cleft. However, Mg²⁺-aLA has few destabilized and disordered residues, and hence oleic acid binding to Mg²⁺-bound aLA is less stable than that to the Ca²⁺-aLA complex. Our analysis can be generalized to understand the functionality of other ligand-bound proteins.
Tan, Kemin; Chang, Changsoo; Cuff, Marianne; Osipiuk, Jerzy; Landorf, Elizabeth; Mack, Jamey C; Zerbs, Sarah; Joachimiak, Andrzej; Collart, Frank R
2013-10-01
Lignin comprises 15-25% of plant biomass and represents a major environmental carbon source for utilization by soil microorganisms. Access to this energy resource requires the action of fungal and bacterial enzymes to break down the lignin polymer into a complex assortment of aromatic compounds that can be transported into the cells. To improve our understanding of the utilization of lignin by microorganisms, we characterized the molecular properties of solute binding proteins of ATP-binding cassette transporter proteins that interact with these compounds. A combination of functional screens and structural studies characterized the binding specificity of the solute binding proteins for aromatic compounds derived from lignin such as p-coumarate, 3-phenylpropionic acid and compounds with more complex ring substitutions. A ligand screen based on thermal stabilization identified several binding protein clusters that exhibit preferences based on the size or number of aromatic ring substituents. Multiple X-ray crystal structures of protein-ligand complexes for these clusters identified the molecular basis of the binding specificity for the lignin-derived aromatic compounds. The screens and structural data provide new functional assignments for these solute-binding proteins which can be used to infer their transport specificity. This knowledge of the functional roles and molecular binding specificity of these proteins will support the identification of the specific enzymes and regulatory proteins of peripheral pathways that funnel these compounds to central metabolic pathways and will improve the predictive power of sequence-based functional annotation methods for this family of proteins. Copyright © 2013 Wiley Periodicals, Inc.
Tan, Kemin; Chang, Changsoo; Cuff, Marianne; Osipiuk, Jerzy; Landorf, Elizabeth; Mack, Jamey C.; Zerbs, Sarah; Joachimiak, Andrzej; Collart, Frank R.
2013-01-01
Lignin comprises 15-25% of plant biomass and represents a major environmental carbon source for utilization by soil microorganisms. Access to this energy resource requires the action of fungal and bacterial enzymes to break down the lignin polymer into a complex assortment of aromatic compounds that can be transported into the cells. To improve our understanding of the utilization of lignin by microorganisms, we characterized the molecular properties of solute binding proteins of ATP-binding cassette transporter proteins that interact with these compounds. A combination of functional screens and structural studies characterized the binding specificity of the solute binding proteins for aromatic compounds derived from lignin such as p-coumarate, 3-phenylpropionic acid and compounds with more complex ring substitutions. A ligand screen based on thermal stabilization identified several binding protein clusters that exhibit preferences based on the size or number of aromatic ring substituents. Multiple X-ray crystal structures of protein-ligand complexes for these clusters identified the molecular basis of the binding specificity for the lignin-derived aromatic compounds. The screens and structural data provide new functional assignments for these solute-binding proteins which can be used to infer their transport specificity. This knowledge of the functional roles and molecular binding specificity of these proteins will support the identification of the specific enzymes and regulatory proteins of peripheral pathways that funnel these compounds to central metabolic pathways and will improve the predictive power of sequence-based functional annotation methods for this family of proteins. PMID:23606130