Science.gov

Sample records for predicting protein function

  1. Year 2 Report: Protein Function Prediction Platform

    SciTech Connect

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  2. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background: Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on global features, considering only strong protein similarity matches. This leads to a significant loss of prediction accuracy. Methods: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. Results: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps to improve computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline BLAST-score-based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  3. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-01-01

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate. PMID:26782400

  4. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954
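
    The idea of assembling per-term predictions into a hierarchy-consistent decision can be illustrated with a minimal sketch. It is not one of the specific ensemble methods covered by the review; it assumes a simple top-down rule in which a term's score may not exceed the score of any of its parents, and the term names, scores, and the function `enforce_hierarchy` are hypothetical.

```python
from typing import Dict, List


def enforce_hierarchy(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Cap each term's score at the lowest corrected score among its ancestors.

    Assumes the hierarchy is acyclic and that every parent has a score.
    """
    corrected: Dict[str, float] = {}

    def resolve(term: str) -> float:
        # Memoised recursion: correct the parents first, then the term itself.
        if term in corrected:
            return corrected[term]
        value = scores[term]
        for parent in parents.get(term, []):
            value = min(value, resolve(parent))
        corrected[term] = value
        return value

    for term in scores:
        resolve(term)
    return corrected


# Hypothetical three-level hierarchy: binding -> ion_binding -> zinc_binding.
parents = {"binding": [], "ion_binding": ["binding"], "zinc_binding": ["ion_binding"]}
# Hypothetical outputs of three independent ("flat") per-term classifiers.
flat_scores = {"binding": 0.9, "ion_binding": 0.4, "zinc_binding": 0.7}
consistent = enforce_hierarchy(flat_scores, parents)
```

    In this toy case the flat classifier for `zinc_binding` is more confident than the one for its parent `ion_binding`, which is inconsistent with the hierarchy; the corrected score is capped at the parent's value.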

  5. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average. PMID:25343279

  6. Protein function prediction based on data fusion and functional interrelationship.

    PubMed

    Meng, Jun; Wekesa, Jael-Sanyanda; Shi, Guan-Li; Luan, Yu-Shi

    2016-04-01

    One of the challenging tasks of bioinformatics is to predict more accurate and confident protein functions from genomics and proteomics datasets. Computational approaches use a variety of high throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses transductive multi-label learning algorithm by integrating multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about functions of unknown proteins and use a directed bi-relational graph to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF) uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel result. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-sources datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods. PMID:26869536

  7. Functional prediction of hypothetical proteins in human adenoviruses.

    PubMed

    Dorden, Shane; Mahadevan, Padmanabhan

    2015-01-01

    Assigning functional information to hypothetical proteins in virus genomes is crucial for gaining insight into their proteomes. Human adenoviruses are medium sized viruses that cause a range of diseases. Their genomes possess proteins with uncharacterized function known as hypothetical proteins. Using a wide range of protein function prediction servers, functional information was obtained about these hypothetical proteins. A comparison of functional information obtained from these servers revealed that some of them produced functional information, while others provided little functional information about these human adenovirus hypothetical proteins. The PFP, ESG, PSIPRED, 3d2GO, and ProtFun servers produced the most functional information regarding these hypothetical proteins. PMID:26664031

  8. Network-based prediction of protein function

    PubMed Central

    Sharan, Roded; Ulitsky, Igor; Shamir, Ron

    2007-01-01

    Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community. PMID:17353930
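
    As a rough illustration of a direct method, the sketch below transfers annotations to an uncharacterized protein by counting the functions of its immediate interaction partners. This neighbor-counting scheme is a deliberately simple stand-in for the propagation approaches the review surveys; the toy network, the annotations, and the function `neighbor_vote` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def neighbor_vote(
    network: Dict[str, Set[str]],
    annotations: Dict[str, Set[str]],
    protein: str,
    top_k: int = 1,
) -> List[Tuple[str, int]]:
    """Rank candidate functions by how many direct neighbors carry them."""
    counts: Counter = Counter()
    for neighbor in network.get(protein, set()):
        # Unannotated neighbors contribute nothing to the vote.
        counts.update(annotations.get(neighbor, set()))
    return counts.most_common(top_k)


# Hypothetical interaction network: P1 is unannotated and has three partners.
network = {"P1": {"P2", "P3", "P4"}}
annotations = {
    "P2": {"kinase"},
    "P3": {"kinase", "transport"},
    "P4": {"kinase"},
}
prediction = neighbor_vote(network, annotations, "P1")
```

    Here all three partners of `P1` are annotated as `kinase`, so that term wins the vote with a count of 3.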

  9. Using search engine technology for protein function prediction.

    PubMed

    Chen, Ziyang; Cai, Zhao; Li, Min; Liu, Binbin

    2011-01-01

    Prediction of protein function is one of the most challenging problems in the post-genomic era. In this paper, we propose a novel algorithm, Improved ProteinRank (IPR), for protein function prediction, which is based on search engine technology and the preferential attachment criteria. In addition, an improved algorithm, IPRW, is developed from IPR for use in the weighted protein-protein interaction (PPI) network. The proposed algorithms IPR and IPRW are applied to the PPI network of S. cerevisiae. The experimental results show that both IPR and IPRW outperform previous methods for the prediction of protein functions. PMID:21441099

  10. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  11. Probabilistic Protein Function Prediction from Heterogeneous Genome-Wide Data

    PubMed Central

    Nariai, Naoki; Kolaczyk, Eric D.; Kasif, Simon

    2007-01-01

    Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function. PMID:17396164

  12. INTEGRATING COMPUTATIONAL PROTEIN FUNCTION PREDICTION INTO DRUG DISCOVERY INITIATIVES

    PubMed Central

    Grant, Marianne A.

    2014-01-01

    Pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate innovative strategies for identifying valid targets and discovering leads against them as a way of accelerating drug discovery. The ever increasing number and diversity of novel protein sequences identified by genomic sequencing projects and the success of worldwide structural genomics initiatives have spurred great interest and impetus in the development of methods for accurate, computationally empowered protein function prediction and active site identification. Previously, in the absence of direct experimental evidence, homology-based protein function annotation remained the gold-standard for in silico analysis and prediction of protein function. However, with the continued exponential expansion of sequence databases, this approach is not always applicable, as fewer query protein sequences demonstrate significant homology to protein gene products of known function. As a result, several non-homology based methods for protein function prediction that are based on sequence features, structure, evolution, biochemical and genetic knowledge have emerged. Herein, we review current bioinformatic programs and approaches for protein function prediction/annotation and discuss their integration into drug discovery initiatives. The development of such methods to annotate protein functional sites and their application to large protein functional families is crucial to successfully utilizing the vast amounts of genomic sequence information available to drug discovery and development processes. PMID:25530654

  13. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    With the advancement of high-throughput microarray technologies, knowledge of previously unknown proteins has grown massively. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein differed from previously characterized ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive, state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data, applied in wide areas such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who have addressed these problems using computational intelligence techniques with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  14. A large-scale evaluation of computational protein function prediction

    PubMed Central

    Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650

  15. A large-scale evaluation of computational protein function prediction.

    PubMed

    Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Boehm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas A; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-03-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools. PMID:23353650

  16. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  17. Collective prediction of protein functions from protein-protein interaction networks

    PubMed Central

    2014-01-01

    Background: Automated assignment of functions to unknown proteins is one of the most important tasks in computational biology. The development of experimental methods for genome-scale analysis of molecular interaction networks offers new ways to infer protein function from protein-protein interaction (PPI) network data. Existing techniques for collective classification (CC) usually increase accuracy for network data, wherein instances are interlinked with each other, using a large amount of labeled data for training. However, labeled data are time-consuming and expensive to obtain. On the other hand, one can easily obtain a large amount of unlabeled data. Thus, more sophisticated methods are needed to exploit the unlabeled data to increase accuracy in protein function prediction. Results: In this paper, we propose an effective Markov chain based CC algorithm (ICAM) to tackle the label deficiency problem in CC for interrelated proteins from PPI networks. Our idea is to model the problem using two distinct Markov chain classifiers to make separate predictions with regard to attribute features from protein data and relational features from relational information. The ICAM learning algorithm combines the results of the two classifiers to compute the ranks of labels to indicate the importance of a set of labels to an instance, and uses an ICA framework to iteratively refine the learning models for improving the performance of protein function prediction from PPI networks when labeled data are scarce. Conclusion: Experimental results on real-world Yeast protein-protein interaction datasets show that our proposed ICAM method is better than the other ICA-type methods given limited labeled training data. This approach can serve as a valuable tool for the study of protein function prediction from PPI networks. PMID:24564855

  18. Predicting Protein Function via Semantic Integration of Multiple Networks.

    PubMed

    Yu, Guoxian; Fu, Guangyuan; Wang, Jun; Zhu, Hailong

    2016-01-01

    Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapidly accumulating volumes of proteomic and genomic data drive the development of computational models for automatically predicting protein function at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often get better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet. PMID:26800544
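
    The weighted summation of networks described in this abstract can be sketched as follows. The abstract does not state how SimNet solves for the weights, so this example substitutes a simple heuristic: each network's weight is proportional to its normalized Frobenius alignment with the kernel. The matrices and the functions `alignment` and `composite_network` are hypothetical and are not taken from the SimNet Matlab code.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product: sum of element-wise products."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(net: Matrix, kernel: Matrix) -> float:
    """Cosine-style similarity between two non-zero matrices."""
    return frobenius(net, kernel) / sqrt(frobenius(net, net) * frobenius(kernel, kernel))


def composite_network(networks: List[Matrix], kernel: Matrix) -> Tuple[List[float], Matrix]:
    """Weight each network by its alignment with the kernel, then sum them."""
    raw = [max(alignment(net, kernel), 0.0) for net in networks]
    total = sum(raw)
    # Assumption: fall back to uniform weights if no network aligns with the kernel.
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(networks)] * len(networks)
    size = len(kernel)
    composite = [
        [sum(w * net[i][j] for w, net in zip(weights, networks)) for j in range(size)]
        for i in range(size)
    ]
    return weights, composite


# Hypothetical semantic kernel over three proteins and two association networks.
kernel = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
net_a = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # agrees with the kernel
net_b = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # partly disagrees
weights, composite = composite_network([net_a, net_b], kernel)
```

    In this toy setting the network that matches the kernel receives the larger weight, so its edges dominate the composite network passed on to a downstream classifier.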

  19. Protein side chain conformation predictions with an MMGBSA energy function.

    PubMed

    Gaillard, Thomas; Panel, Nicolas; Simonson, Thomas

    2016-06-01

    The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc. PMID:26948696

  20. Revisiting the prediction of protein function at CASP6.

    PubMed

    Pellegrini-Calace, Marialuisa; Soro, Simonetta; Tramontano, Anna

    2006-07-01

    The ability to predict the function of a protein, given its sequence and/or 3D structure, is an essential requirement for exploiting the wealth of data made available by genomics and structural genomics projects and is therefore raising increasing interest in the computational biology community. To foster developments in the area as well as to establish the state of the art of present methods, a function prediction category was tentatively introduced in the 6th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) worldwide experiment. The assessment of the performance of the methods was made difficult by at least two factors: (a) the experimentally determined function of the targets was not available at the time of assessment; (b) the experiment is run blindly, preventing verification of whether the convergence of different predictions towards the same functional annotation was due to the similarity of the methods or to a genuine signal detectable by different methodologies. In this work, we collected information about the methods used by the various predictors and revisited the results of the experiment by verifying how often and in which cases a convergent prediction was obtained by methods based on different rationale. We propose a method for classifying the type and redundancy of the methods. We also analyzed the cases in which a function for the target protein has become available. Our results show that predictions derived from a consensus of different methods can reach an accuracy as high as 80%. It follows that some of the predictions submitted to CASP6, once reanalyzed taking into account the type of converging methods, can provide very useful information to researchers interested in the function of the target proteins. PMID:16759228

  1. Pattern recognition methods for protein functional site prediction.

    PubMed

    Yang, Zheng Rong; Wang, Lipo; Young, Natasha; Trudgian, Dave; Chou, Kuo-Chen

    2005-10-01

    Protein functional site prediction is closely related to drug design, and hence to public health. To save the cost and time spent on identifying functional sites in sequenced proteins in the biology laboratory, computer programs have been widely used for decades. Many of them are implemented using state-of-the-art pattern recognition algorithms, including decision trees, neural networks and support vector machines. Although the success of this effort has been obvious, advanced and new algorithms are still under development for addressing some difficult issues. This review will go through the major stages in developing pattern recognition algorithms for protein functional site prediction and outline the future research directions in this important area. PMID:16248799

  2. Exploring Function Prediction in Protein Interaction Networks via Clustering Methods

    PubMed Central

    Trivodaliev, Kire; Bogojeska, Aleksandra; Kocarev, Ljupco

    2014-01-01

    Complex networks have recently become the focus of research in many fields. Their structure reveals crucial information about the nodes and how they connect and share information. In our work we analyze protein interaction networks as complex networks for their functional modular structure and later use that information in the functional annotation of proteins within the network. We propose several graph representations for the protein interaction network, each having a different level of complexity and inclusion of the annotation information within the graph. We aim to explore what the benefits and the drawbacks of these proposed graphs are when they are used in the function prediction process via clustering methods. For making this cluster-based prediction, we adopt well-established approaches for cluster detection in complex networks using the most recent representative algorithms that have been proven efficient in the task at hand. The experiments are performed using a purified and reliable Saccharomyces cerevisiae protein interaction network, which is then used to generate the different graph representations. Each of the graph representations is later analysed in combination with each of the clustering algorithms, which have been modified where necessary and implemented to fit the specific graph. We evaluate results with regard to biological validity and function prediction performance. Our results indicate that the novel ways of presenting the complex graph improve the prediction process, although the computational complexity should be taken into account when deciding on a particular approach. PMID:24972109

  3. Scoring functions for prediction of protein-ligand interactions.

    PubMed

    Wang, Jui-Chih; Lin, Jung-Hsin

    2013-01-01

    Scoring functions for protein-ligand interactions play central roles in computational drug design, virtual screening of chemical libraries for new lead identification, and prediction of possible binding targets of small chemical molecules. An ideal scoring function for protein-ligand interactions is expected to be able to recognize the native binding pose of a ligand on the protein surface among decoy poses, and to accurately predict the binding affinity (or binding free energy) so that the active molecules can be discriminated from the non-active ones. Due to the empirical nature of most, if not all, scoring functions for protein-ligand interactions, the general applicability of empirical scoring functions, especially to domains far outside training sets, is a major concern. In this review article, we will explore the foundations of different classes of scoring functions, their possible limitations, and their suitable application domains. We also provide assessments of several scoring functions on weakly-interacting protein-ligand complexes, which will be useful information in computational fragment-based drug design or virtual screening. PMID:23016847

  4. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  5. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  6. Prediction of functional residues in water channels and related proteins.

    PubMed Central

    Froger, A.; Tallur, B.; Thomas, D.; Delamarche, C.

    1998-01-01

    In this paper, we present an updated classification of the ubiquitous MIP (Major Intrinsic Protein) family proteins, including 153 fully or partially sequenced members available in public databases. Presently, about 30 of these proteins have been functionally characterized, exhibiting essentially two distinct types of channel properties: (1) specific water transport by the aquaporins, and (2) transport of small neutral solutes, such as glycerol, by the glycerol facilitators. Sequence alignments were used to predict amino acids and motifs that discriminate channel specificity. The protein sequences were also analyzed using statistical tools (comparisons of means and correspondence analysis). Five key positions were clearly identified where the residues are specific for each functional subgroup and exhibit highly dissimilar physico-chemical properties. Moreover, we have found that the putative channels for small neutral solutes clearly differ from the aquaporins in the amino acid content and the length of predicted loop regions, suggesting a substrate filter function for these loops. From these results, we propose a signature pattern for water transport. PMID:9655351

  7. High Precision Prediction of Functional Sites in Protein Structures

    PubMed Central

    Buturovic, Ljubomir; Wong, Mike; Tang, Grace W.; Altman, Russ B.; Petkovic, Dragutin

    2014-01-01

    We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta. PMID:24632601
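
    The two evaluation measures named in this abstract, precision (positive predictive value) and recall, follow their standard definitions and can be stated compactly. The sketch below is illustrative only; the function name and the residue labels are hypothetical and are not taken from FEATURE.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of predicted sites against the true sites.

    precision = true positives / all predicted sites
    recall    = true positives / all true sites
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical example: four predicted functional residues, five true ones.
predicted_sites = {"A12", "A45", "A78", "A90"}
true_sites = {"A12", "A45", "A78", "A101", "A130"}
print(precision_recall(predicted_sites, true_sites))  # (0.75, 0.6)
```

    A high-precision predictor keeps the first number close to 1, so few laboratory follow-ups are spent on false positives.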

  8. Integrated protein function prediction by mining function associations, sequences, and protein–protein and gene–gene interaction networks

    PubMed Central

    Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivations Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein–protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene–gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. Results In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein–protein interaction and spatial gene–gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein–protein interaction and spatial gene–gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile–sequence comparison, profile–profile comparison, and domain co-occurrence networks according to the maximum F-measure. PMID:26370280
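
    The maximum F-measure used for the comparison above is commonly computed in CAFA-style evaluations by sweeping a score threshold and taking the best harmonic mean of protein-averaged precision and recall. The sketch below follows that common protein-centric definition; it is not taken from the SMISS paper, and the function name, data layout, and toy scores are assumptions. Every protein in `truth` is assumed to have at least one true term.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    Precision is averaged over proteins with at least one term at or above the
    threshold; recall is averaged over all proteins in `truth`.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            called = {term for term, score in scores.items() if score >= t}
            hits = len(called & true_terms)
            if called:
                precisions.append(hits / len(called))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # nothing predicted at this threshold
        p = sum(precisions) / len(precisions)
        r = recall_sum / len(truth)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Hypothetical toy data with placeholder term names.
truth = {"P1": {"GO:a", "GO:b"}, "P2": {"GO:c"}}
predictions = {
    "P1": {"GO:a": 0.9, "GO:b": 0.4, "GO:x": 0.6},
    "P2": {"GO:c": 0.8, "GO:y": 0.3},
}
best_f = f_max(predictions, truth)
```

    Because the measure is maximized over thresholds, it rewards methods whose scores rank true terms above false ones, independent of how the scores are calibrated.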

  9. Biochemical functional predictions for protein structures of unknown or uncertain function

    PubMed Central

    Mills, Caitlyn L.; Beuning, Penny J.; Ondrechen, Mary Jo

    2015-01-01

    With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations. PMID:25848497

  10. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  11. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis.

    PubMed

    Saha, Sovan; Chatterjee, Piyali; Basu, Subhadip; Kundu, Mahantapas; Nasipuri, Mita

    2014-12-01

    Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins are still not annotated in most cases. The unknown function of such non-annotated proteins may be inferred or deduced from their neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 uses a combination of three simple-yet-effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach using the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. The overall accuracy achieved in FunPred 1.2 over 8 functional groups involving hetero-interactions in 650 yeast proteins is around 87%, which is higher than the accuracy with FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are now freely available at http://code.google.com/p/cmaterbioinfo/ . PMID:25424913
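
    The general principle of inferring function from network neighbors can be sketched as a simple neighbor vote: each candidate function is scored by the fraction of annotated neighbors that carry it. This is a generic illustration, not the FunPred neighborhood ratio, path connectivity, or functional similarity scores, whose exact definitions are not given in the abstract. All names and the toy network are hypothetical.

```python
from collections import Counter

def predict_by_neighbors(protein, network, annotations):
    """Score each function by the fraction of annotated neighbors that have it.

    network:     {protein: iterable of neighboring proteins}
    annotations: {protein: set of function labels}; unannotated proteins are absent
    """
    counts = Counter()
    annotated_neighbors = 0
    for neighbor in network.get(protein, ()):
        functions = annotations.get(neighbor)
        if functions:
            annotated_neighbors += 1
            counts.update(set(functions))
    if annotated_neighbors == 0:
        return {}  # no evidence available
    return {function: n / annotated_neighbors for function, n in counts.items()}

# Hypothetical example: an unannotated protein with four neighbors, one unannotated.
network = {"YFG1": ["A", "B", "C", "D"]}
annotations = {
    "A": {"transport"},
    "B": {"transport", "signaling"},
    "C": {"transport"},
}
scores = predict_by_neighbors("YFG1", network, annotations)
```

    Here "transport" is supported by every annotated neighbor and "signaling" by one in three, so a threshold on the score would assign only the former.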

  12. Ensemble learning prediction of protein-protein interactions using proteins functional annotations.

    PubMed

    Saha, Indrajit; Zubek, Julian; Klingström, Tomas; Forsberg, Simon; Wikander, Johan; Kierczak, Marcin; Maulik, Ujjwal; Plewczynski, Dariusz

    2014-04-01

    Protein-protein interactions are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast experimental data is publicly available on the Internet, but it is scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from the DIP, MINT, BioGRID and IntAct databases. Then we constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were used on these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, sensitivity exceeded 80% and classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a bigger and more realistic dataset, maintaining sensitivity over 70%. These results confirmed that our datasets are suitable for performing PPI prediction and that the Ensemble Learning method is well suited for this task. Both the processed PPI datasets and the software are available at . PMID:24469380
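
    Majority voting over several base classifiers, as described in this abstract, can be sketched in a few lines. The per-classifier outputs below are made-up placeholders rather than results from the paper, and the tie rule (a 2-2 split among four voters counts as "no interaction") is an assumption, since the abstract does not say how ties are resolved.

```python
def majority_vote(predictions):
    """Combine binary predictions (1 = interacting, 0 = not) from several classifiers.

    predictions: list of equally long label lists, one list per classifier.
    A protein pair is called interacting only if strictly more than half of the
    classifiers vote 1; ties fall to 0 (assumed tie rule).
    """
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0 for votes in zip(*predictions)]

# Placeholder outputs of four classifiers on four protein pairs.
svm_votes = [1, 0, 1, 1]
random_forest_votes = [1, 0, 0, 1]
decision_tree_votes = [0, 0, 1, 1]
naive_bayes_votes = [1, 1, 0, 0]

ensemble = majority_vote(
    [svm_votes, random_forest_votes, decision_tree_votes, naive_bayes_votes]
)
print(ensemble)  # [1, 0, 0, 1]
```

    With an even number of voters a tie rule is unavoidable; choosing the negative class on ties trades some sensitivity for fewer false positives.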

  13. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    PubMed

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-01

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. PMID:26590254

  14. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures

    PubMed Central

    Lua, Rhonald C.; Wilson, Stephen J.; Konecki, Daniel M.; Wilkins, Angela D.; Venner, Eric; Morgan, Daniel H.; Lichtarge, Olivier

    2016-01-01

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. PMID:26590254

  15. Prediction of functional sites in proteins using conserved functional group analysis.

    PubMed

    Innis, C Axel; Anand, A Prem; Sowdhamini, R

    2004-04-01

    A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects. PMID:15033369

  16. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    PubMed

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-01

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. PMID:26635392

  17. Phagonaute: A web-based interface for phage synteny browsing and protein function prediction.

    PubMed

    Delattre, Hadrien; Souiai, Oussema; Fagoonee, Khema; Guerois, Raphaël; Petit, Marie-Agnès

    2016-09-01

    Distant homology search tools are of great help in predicting viral protein functions. However, due to the lack of profile databases dedicated to viruses, they can lack sensitivity. We constructed HMM profiles for more than 80,000 proteins from both phages and archaeal viruses, and performed all pairwise comparisons with the HHsearch program. The whole resulting database can be explored through a user-friendly "Phagonaute" interface to help predict functions. Results are displayed together with their genetic context, to strengthen inferences based on remote homology. Beyond function prediction, this tool permits detection of co-occurrences, often indicative of proteins completing a task together, and observation of conserved patterns across large evolutionary distances. As a test, Herpes simplex virus I was added to Phagonaute, and 25% of its proteome matched bacterial or archaeal viral protein counterparts. Phagonaute should therefore help virologists in their quest for protein functions and evolutionary relationships. PMID:27254594

  18. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases.

    PubMed

    Parasuram, Ramya; Mills, Caitlyn L; Wang, Zhouxi; Somasundaram, Saroja; Beuning, Penny J; Ondrechen, Mary Jo

    2016-01-15

    Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural

  19. Structural and functional protein network analyses predict novel signaling functions for rhodopsin

    PubMed Central

    Kiel, Christina; Vogt, Andreas; Campagna, Anne; Chatr-aryamontri, Andrew; Swiatek-de Lange, Magdalena; Beer, Monika; Bolz, Sylvia; Mack, Andreas F; Kinkl, Norbert; Cesareni, Gianni; Serrano, Luis; Ueffing, Marius

    2011-01-01

    Orchestration of signaling, photoreceptor structural integrity, and maintenance needed for mammalian vision remain enigmatic. By integrating three proteomic data sets, literature mining, computational analyses, and structural information, we have generated a multiscale signal transduction network linked to the visual G protein-coupled receptor (GPCR) rhodopsin, the major protein component of rod outer segments. This network was complemented by domain decomposition of protein–protein interactions and then qualified for mutually exclusive or mutually compatible interactions and ternary complex formation using structural data. The resulting information not only offers a comprehensive view of signal transduction induced by this GPCR but also suggests novel signaling routes to cytoskeleton dynamics and vesicular trafficking, predicting an important level of regulation through small GTPases. Further, it demonstrates a specific disease susceptibility of the core visual pathway due to the uniqueness of its components present mainly in the eye. As a comprehensive multiscale network, it can serve as a basis to elucidate the physiological principles of photoreceptor function, identify potential disease-associated genes and proteins, and guide the development of therapies that target specific branches of the signaling pathway. PMID:22108793

  20. Integrative approaches for predicting protein function and prioritizing genes for complex phenotypes using protein interaction networks

    PubMed Central

    Ma, Xiaotu; Chen, Ting

    2014-01-01

    With the rapid development of biotechnologies, many types of biological data including molecular networks are now available. However, to obtain a more complete understanding of a biological system, the integration of molecular networks with other data, such as molecular sequences, protein domains and gene expression profiles, is needed. A key to the use of networks in biological studies is the definition of similarity among proteins over the networks. Here, we review applications of similarity measures over networks with a special focus on the following four problems: (i) predicting protein functions, (ii) prioritizing genes related to a phenotype given a set of seed genes that have been shown to be related to the phenotype, (iii) prioritizing genes related to a phenotype by integrating gene expression profiles and networks and (iv) identification of false positives and false negatives from RNAi experiments. Diffusion kernels are demonstrated to give superior performance in all these tasks, leading to the suggestion that diffusion kernels should be the primary choice for a network similarity metric over other similarity measures such as direct neighbors and shortest path distance. PMID:23788799
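
    As a rough illustration of the network similarity measure discussed above, the sketch below computes a diffusion kernel in its standard form, K = exp(-beta * L), where L is the graph Laplacian. The toy path graph, the value of beta, and the truncated Taylor-series evaluation are assumptions made for this example and are not taken from the review.

```python
# Sketch: diffusion kernel K = exp(-beta * L) on a toy interaction network.
# The graph, beta, and the Taylor-series evaluation are illustrative assumptions.

def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(n, edges, beta=0.5, terms=30):
    """Approximate exp(-beta * L) with a truncated Taylor series.

    Adequate for tiny graphs; a real analysis would use a numerical library.
    """
    M = [[-beta * x for x in row] for row in laplacian(n, edges)]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]  # holds M^k / k!, starting from the identity
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Path graph 0 - 1 - 2 - 3: similarity to node 0 decays with distance.
K = diffusion_kernel(4, [(0, 1), (1, 2), (2, 3)])
```

    Unlike a direct-neighbor or shortest-path measure, every pair of connected nodes receives a nonzero similarity that accounts for all paths between them.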

  1. Network-based auto-probit modeling for protein function prediction.

    PubMed

    Jiang, Xiaoyu; Gold, David; Kolaczyk, Eric D

    2011-09-01

    Predicting the functional roles of proteins based on various genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows for the easy incorporation of protein-protein association network topologies, either binary or weighted, in modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. Our method performance is evaluated and compared with standard algorithms on weighted yeast protein-protein association networks, extracted from a recently developed integrative database called Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, incorporating the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy. PMID:21133881

  2. Composite motifs integrating multiple protein structures increase sensitivity for function prediction.

    PubMed

    Chen, Brian Y; Bryant, Drew H; Cruess, Amanda E; Bylund, Joseph H; Fofanov, Viacheslav Y; Kristensen, David M; Kimmel, Marek; Lichtarge, Olivier; Kavraki, Lydia E

    2007-01-01

    The study of disease often hinges on the biological function of proteins, but determining protein function is a difficult experimental process. To minimize duplicated effort, algorithms for function prediction seek characteristics indicative of possible protein function. One approach is to identify substructural matches of geometric and chemical similarity between motifs representing known active sites and target protein structures with unknown function. In earlier work, statistically significant matches of certain effective motifs have identified functionally related active sites. Effective motifs must be carefully designed to maintain similarity to functionally related sites (sensitivity) and avoid incidental similarities to functionally unrelated protein geometry (specificity). Existing motif design techniques use the geometry of a single protein structure. Poor selection of this structure can limit motif effectiveness if the selected functional site lacks similarity to functionally related sites. To address this problem, this paper presents composite motifs, which combine structures of functionally related active sites to potentially increase sensitivity. Our experimentation compares the effectiveness of composite motifs with simple motifs designed from single protein structures. On six distinct families of functionally related proteins, leave-one-out testing showed that composite motifs had sensitivity comparable to the most sensitive of all simple motifs and specificity comparable to the average simple motif. On our data set, we observed that composite motifs simultaneously capture variations in active site conformation, diminish the problem of selecting motif structures, and enable the fusion of protein structures from diverse data sources. PMID:17951837

  3. Functional prediction: identification of protein orthologs and paralogs.

    PubMed Central

    Chen, R.; Jeong, S. S.

    2000-01-01

    Orthologs typically retain the same function in the course of evolution. Using beta-decarboxylating dehydrogenase family as a model, we demonstrate that orthologs can be confidently identified. The strategy is based on our recent findings that substitutions of only a few amino acid residues in these enzymes are sufficient to exchange substrate and coenzyme specificities. Hence, the few major specificity determinants can serve as reliable markers for determining orthologous or paralogous relationships. The power of this approach has been demonstrated by correcting similarity-based functional misassignment and discovering new genes and related pathways, and should be broadly applicable to other enzyme families. PMID:11206056

  4. A multi-label classifier for prediction membrane protein functional types in animal.

    PubMed

    Zou, Hong-Liang

    2014-11-01

    Membrane proteins are an important component of the cell membrane. Given a membrane protein sequence, identifying its type(s) is very important because the type correlates closely with its functions. According to previous studies, membrane proteins can be divided into the following eight types: single-pass type I, single-pass type II, single-pass type III, single-pass type IV, multipass, lipid-anchor, GPI-anchor, and peripheral membrane protein. With the avalanche of newly found protein sequences in the post-genomic age, it is urgent to develop an automatic and effective computational method for rapid and reliable prediction of the types of membrane proteins. At present, most existing methods are based on the assumption that one membrane protein belongs to only one type. In fact, a membrane protein may simultaneously belong to two or more different functional types. In this study, a new method that hybridizes the pseudo amino acid composition with a multi-label algorithm called LIFT (multi-label learning with label-specific features) was proposed to predict the functional types of both singleplex and multiplex animal membrane proteins. Experimental results on a stringent benchmark dataset of membrane proteins by jackknife test show that the absolute-true rate obtained was 0.6342, indicating that our approach is quite promising. It may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins. PMID:25107302
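
    The absolute-true figure quoted above is a strict multi-label metric. A minimal sketch, assuming the usual exact-match definition (a protein counts as correct only when its predicted set of types equals its true set exactly); the function name and the toy labels are hypothetical:

```python
# Sketch: "absolute-true" (exact-match) rate for multi-label predictions.
# Assumes the common definition; names and example labels are illustrative.

def absolute_true(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    if len(true_sets) != len(predicted_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return exact / len(true_sets)


truth = [{"multipass"}, {"lipid-anchor", "GPI-anchor"}, {"peripheral"}]
preds = [{"multipass"}, {"lipid-anchor"}, {"peripheral"}]
rate = absolute_true(truth, preds)  # the second protein is only partly right
```

    Because partial matches earn no credit, this measure is much harsher than per-label accuracy, which is worth keeping in mind when reading a value such as 0.6342.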

  5. Predicting protein functions from redundancies in large-scale protein interaction networks

    NASA Technical Reports Server (NTRS)

    Samanta, Manoj Pratim; Liang, Shoudan

    2003-01-01

    Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (approximately 89%) of the original associations.
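
    One way to make "more shared partners than random" concrete is a hypergeometric tail probability. The sketch below is an assumption about how such a test could look, not the authors' exact statistic: given N proteins in total, two proteins with n1 and n2 partners, and m partners in common, it returns the chance of seeing m or more shared partners if partners were drawn at random.

```python
# Sketch: significance of shared interaction partners via a hypergeometric tail.
# This is an illustrative null model; the paper's exact formula may differ.
from math import comb


def shared_partner_pvalue(total, n1, n2, shared):
    """P(X >= shared) when n2 partners are drawn at random from `total`
    proteins, of which n1 are partners of the other protein."""
    upper = min(n1, n2)
    tail = sum(comb(n1, k) * comb(total - n1, n2 - k)
               for k in range(shared, upper + 1))
    return tail / comb(total, n2)


# Toy numbers: 10 proteins, both proteins have 3 partners, all 3 are shared.
p = shared_partner_pvalue(10, 3, 3, 3)
```

    Pairs with a very small tail probability would be kept as candidate functional associations; the threshold is a design choice left to the analyst.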

  6. iPFPi: A System for Improving Protein Function Prediction through Cumulative Iterations.

    PubMed

    Taha, Kamal; Yoo, Paul D; Alzaabi, Mohammed

    2015-01-01

    We propose a classifier system called iPFPi that predicts the functions of un-annotated proteins. iPFPi assigns an un-annotated protein P the functions of GO annotation terms that are semantically similar to P. An un-annotated protein P and a GO annotation term T are represented by their characteristics. The characteristics of P are GO terms found within the abstracts of biomedical literature associated with P. The characteristics of T are GO terms found within the abstracts of biomedical literature associated with the proteins annotated with the function of T. Let F and F′ be the important (dominant) sets of characteristic terms representing T and P, respectively. iPFPi would annotate P with the function of T, if F and F′ are semantically similar. We constructed a novel semantic similarity measure that takes into consideration several factors, such as the dominance degree of each characteristic term t in set F based on its score, which is a value that reflects the dominance status of t relative to other characteristic terms, using a pairwise beats and loses procedure. Every time a protein P is annotated with the function of T, iPFPi updates and optimizes the current scores of the characteristic terms for T based on the weights of the characteristic terms for P. Set F will be updated accordingly. Thus, the accuracy of predicting the function of T as the function of subsequent proteins improves. This prediction accuracy keeps improving over time iteratively through the cumulative weights of the characteristic terms representing proteins that are successively annotated with the function of T. We evaluated the quality of iPFPi by comparing it experimentally with two recent protein function prediction systems. Results showed marked improvement. PMID:26357323

  7. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed.
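
    The Cα RMSD values quoted above measure the average distance between matching Cα atoms of a model and the experimental structure. A minimal sketch of that calculation, assuming the two coordinate sets have already been optimally superposed (the superposition step itself is omitted here, and the coordinates are made up):

```python
# Sketch: C-alpha RMSD between two structures that are ALREADY superposed.
# Real comparisons first fit one structure onto the other; that step is omitted.
import math


def ca_rmsd(model, native):
    """Root mean square deviation over paired (x, y, z) C-alpha coordinates."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model, native)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model))


# Toy example: every atom of the model is displaced by 2 units along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = ca_rmsd(model, native)
```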

  8. LRR Conservation Mapping to Predict Functional Sites within Protein Leucine-Rich Repeat Domains

    PubMed Central

    Helft, Laura; Reddy, Vignyan; Chen, Xiyang; Koller, Teresa; Federici, Luca; Fernández-Recio, Juan; Gupta, Rishabh; Bent, Andrew

    2011-01-01

    Computational prediction of protein functional sites can be a critical first step for analysis of large or complex proteins. Contemporary methods often require several homologous sequences and/or a known protein structure, but these resources are not available for many proteins. Leucine-rich repeats (LRRs) are ligand interaction domains found in numerous proteins across all taxonomic kingdoms, including immune system receptors in plants and animals. We devised Repeat Conservation Mapping (RCM), a computational method that predicts functional sites of LRR domains. RCM utilizes two or more homologous sequences and a generic representation of the LRR structure to identify conserved or diversified patches of amino acids on the predicted surface of the LRR. RCM was validated using solved LRR+ligand structures from multiple taxa, identifying ligand interaction sites. RCM was then used for de novo dissection of two plant microbe-associated molecular pattern (MAMP) receptors, EF-TU RECEPTOR (EFR) and FLAGELLIN-SENSING 2 (FLS2). In vivo testing of Arabidopsis thaliana EFR and FLS2 receptors mutagenized at sites identified by RCM demonstrated previously unknown functional sites. The RCM predictions for EFR, FLS2 and a third plant LRR protein, PGIP, compared favorably to predictions from ODA (optimal docking area), Consurf, and PAML (positive selection) analyses, but RCM also made valid functional site predictions not available from these other bioinformatic approaches. RCM analyses can be conducted with any LRR-containing proteins at www.plantpath.wisc.edu/RCM, and the approach should be modifiable for use with other types of repeat protein domains. PMID:21789174

  9. Automated protein motif generation in the structure-based protein function prediction tool ProMOL.

    PubMed

    Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron; Madha, Shariq; Mills, Jeffrey L; Craig, Paul A; Bernstein, Herbert J

    2015-12-01

    ProMOL, a plugin for the PyMOL molecular graphics system, is a structure-based protein function prediction tool. ProMOL includes a set of routines for building motif templates that are used for screening query structures for enzyme active sites. Previously, each motif template was generated manually and required supervision in the optimization of parameters for sensitivity and selectivity. We developed an algorithm and workflow for the automation of motif building and testing routines in ProMOL. The algorithm uses a set of empirically derived parameters for optimization and requires little user intervention. The automated motif generation algorithm was first tested in a performance comparison with a set of manually generated motifs based on identical active sites from the same 112 PDB entries. The two sets of motifs were equally effective in identifying alignments with homologs and in rejecting alignments with unrelated structures. A second set of 296 active site motifs were generated automatically, based on Catalytic Site Atlas entries with literature citations, as an expansion of the library of existing manually generated motif templates. The new motif templates exhibited comparable performance to the existing ones in terms of hit rates against native structures, homologs with the same EC and Pfam designations, and randomly selected unrelated structures with a different EC designation at the first EC digit, as well as in terms of RMSD values obtained from local structural alignments of motifs and query structures. This research is supported by NIH grant GM078077. PMID:26573864

  10. Application of Gap-Constraints Given Sequential Frequent Pattern Mining for Protein Function Prediction

    PubMed Central

    Park, Hyeon Ah; Kim, Taewook; Li, Meijing; Shon, Ho Sun; Park, Jeong Seok; Ryu, Keun Ho

    2015-01-01

    Objectives: Predicting protein function from the protein–protein interaction network is challenging due to its complexity and the huge scale of the protein interaction process, along with inconsistent patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probability of patterns inside the network. Although these methods have shown good prediction, difficulty still exists in searching several functions that are exceptional from simple rules and patterns as a result of not considering the inconsistent aspect of the interaction network. Methods: In this article, we propose a novel approach using the sequential pattern mining method with gap-constraints. To overcome the inconsistency problem, we suggest frequent functional patterns to include every possible functional sequence, including patterns for which search is limited by the structure of connection or level of neighborhood layer. We also constructed a tree-graph with the most crucial interaction information of the target protein, and generated candidate sets to assign by sequential pattern mining allowing gaps. Results: The parameters of pattern length, maximum gaps, and minimum support were given to find the best setting for the most accurate prediction. The highest accuracy rate was 0.972, which was better than the simple neighbor counting approach and link-based approach. Conclusion: Comparison of the results with other approaches has confirmed that the proposed approach could reach more function candidates that previous methods could not obtain. PMID:25938021
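
    The core operation behind gap-constrained sequential pattern mining can be sketched briefly. The example below checks whether a pattern occurs in a sequence with at most max_gap skipped items between consecutive matches, and computes the pattern's support over a set of sequences. The function names and the toy function labels are hypothetical, and the paper's tree-graph construction and candidate generation are not reproduced.

```python
# Sketch: matching a sequential pattern under a maximum-gap constraint.
# Names and toy data are illustrative; this is not the authors' implementation.

def occurs(sequence, pattern, max_gap):
    """True if `pattern` (non-empty) appears in order within `sequence`,
    skipping at most `max_gap` items between consecutive matched elements."""
    ends = {i for i, item in enumerate(sequence) if item == pattern[0]}
    for wanted in pattern[1:]:
        ends = {j
                for i in ends
                for j in range(i + 1, min(len(sequence), i + 2 + max_gap))
                if sequence[j] == wanted}
        if not ends:
            return False
    return bool(ends)


def support(sequences, pattern, max_gap):
    """Fraction of sequences that contain the pattern under the gap limit."""
    return sum(occurs(s, pattern, max_gap) for s in sequences) / len(sequences)


paths = [
    ["kinase", "transport", "binding", "kinase"],
    ["kinase", "binding"],
    ["transport", "kinase"],
]
freq = support(paths, ["kinase", "binding"], max_gap=1)
```

    A pattern would be reported as frequent when its support reaches the chosen minimum support; tracking all possible end positions, rather than matching greedily, avoids missing occurrences that only a later start position satisfies.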

  11. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. PMID:25979264

  12. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    SciTech Connect

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  13. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGESBeta

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  14. Recent improvements in prediction of protein structure by global optimization of a potential energy function

    PubMed Central

    Pillardy, Jarosław; Czaplewski, Cezary; Liwo, Adam; Lee, Jooyoung; Ripoll, Daniel R.; Kaźmierkiewicz, Rajmund; Ołdziej, Stanisław; Wedemeyer, William J.; Gibson, Kenneth D.; Arnautova, Yelena A.; Saunders, Jeff; Ye, Yuan-Jie; Scheraga, Harold A.

    2001-01-01

    Recent improvements of a hierarchical ab initio or de novo approach for predicting both α and β structures of proteins are described. The united-residue energy function used in this procedure includes multibody interactions from a cumulant expansion of the free energy of polypeptide chains, with their relative weights determined by Z-score optimization. The critical initial stage of the hierarchical procedure involves a search of conformational space by the conformational space annealing (CSA) method, followed by optimization of an all-atom model. The procedure was assessed in a recent blind test of protein structure prediction (CASP4). The resulting lowest-energy structures of the target proteins (ranging in size from 70 to 244 residues) agreed with the experimental structures in many respects. The entire experimental structure of a cyclic α-helical protein of 70 residues was predicted to within 4.3 Å α-carbon (Cα) rms deviation (rmsd) whereas, for other α-helical proteins, fragments of roughly 60 residues were predicted to within 6.0 Å Cα rmsd. Whereas β structures can now be predicted with the new procedure, the success rate for α/β- and β-proteins is lower than that for α-proteins at present. For the β portions of α/β structures, the Cα rmsd's are less than 6.0 Å for contiguous fragments of 30–40 residues; for one target, three fragments (of length 10, 23, and 28 residues, respectively) formed a compact part of the tertiary structure with a Cα rmsd less than 6.0 Å. Overall, these results constitute an important step toward the ab initio prediction of protein structure solely from the amino acid sequence. PMID:11226239

  15. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

    PubMed

    Funk, Christopher S; Kahanda, Indika; Ben-Hur, Asa; Verspoor, Karin M

    2015-01-01

    Most computational methods that predict protein function do not take advantage of the large amount of information contained in the biomedical literature. In this work we evaluate both ontology term co-mention and bag-of-words features mined from the biomedical literature and analyze their impact in the context of a structured output support vector machine model, GOstruct. We find that even simple literature based features are useful for predicting human protein function (F-max: Molecular Function =0.408, Biological Process =0.461, Cellular Component =0.608). One advantage of using literature features is their ability to offer easy verification of automated predictions. We find through manual inspection of misclassifications that some false positive predictions could be biologically valid predictions based upon support extracted from the literature. Additionally, we present a "medium-throughput" pipeline that was used to annotate a large subset of co-mentions; we suggest that this strategy could help to speed up the rate at which proteins are curated. PMID:26005564
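
    The F-max values above are not defined in the abstract. The sketch below assumes the protein-centric definition commonly used in function-prediction benchmarks: for each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all proteins, and F-max is the best harmonic mean over thresholds. The data layout and names are hypothetical.

```python
# Sketch: protein-centric F-max, assuming the usual benchmark definition.
# `predictions` maps protein -> {term: score}; `truth` maps protein -> set of
# true terms (each assumed non-empty). Names and data are illustrative.

def f_max(predictions, truth, steps=100):
    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= threshold}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


truth = {"P1": {"binding", "transport"}}
predictions = {"P1": {"binding": 0.9, "catalysis": 0.4}}
score = f_max(predictions, truth)
```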

  16. Structure based function prediction of proteins using fragment library frequency vectors

    PubMed Central

    Yadav, Akshay; Jayaraman, Valadi Krishnamoorthy

    2012-01-01

    The function of the protein is primarily dictated by its structure. Therefore it is far more logical to find the functional clues of the protein in its overall 3-dimensional fold or its global structure. In this paper, we have developed a novel Support Vector Machines (SVM) based prediction model for functional classification and prediction of proteins using features extracted from its global structure based on fragment libraries. Fragment libraries have been previously used for ab initio modelling of proteins and protein structure comparisons. The query protein structure is broken down into a collection of short contiguous backbone fragments and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the number of each library fragment in the collection of fragments by all-to-all fragment comparisons. SVM models were trained and optimised for obtaining the best 10-fold cross-validation accuracy for classification. As an example, this method was applied for prediction and classification of Cell Adhesion molecules (CAMs). Thirty-four different fragment libraries with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12 were used for obtaining the best prediction model. The best 10-fold CV accuracy of 95.25% was obtained for library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10. PMID:23144557
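
    The feature construction described above can be sketched in a simplified form. Here each residue is reduced to a single number and fragments are compared by Euclidean distance; both are assumptions for illustration, since the paper compares three-dimensional backbone fragments. Each overlapping fragment is assigned to its nearest library fragment, and the counts form the frequency vector.

```python
# Sketch: fragment-library frequency vector with a toy 1-D residue descriptor.
# The descriptor, distance measure, and data are simplifying assumptions.
import math


def fragments(chain, length):
    """All overlapping contiguous fragments of the given length."""
    return [tuple(chain[i:i + length]) for i in range(len(chain) - length + 1)]


def distance(a, b):
    """Euclidean distance between two equal-length fragments."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frequency_vector(chain, library):
    """Count how many chain fragments fall closest to each library fragment."""
    counts = [0] * len(library)
    for frag in fragments(chain, len(library[0])):
        nearest = min(range(len(library)), key=lambda k: distance(frag, library[k]))
        counts[nearest] += 1
    return counts


library = [(0, 0, 0), (1, 1, 1)]   # two hypothetical library fragments
chain = [0, 0, 0, 1, 1, 1]         # toy per-residue descriptor values
features = frequency_vector(chain, library)
```

    The resulting vector has one entry per library fragment, so its length is fixed regardless of protein size, which is what makes it usable as SVM input.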

  17. Membrane Protein Prediction Methods

    PubMed Central

    Punta, Marco; Forrest, Lucy R.; Bigelow, Henry; Kernytsky, Andrew; Liu, Jinfeng; Rost, Burkhard

    2007-01-01

    We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and highlight the challenges that remain. The methods covered include: sequence alignment, motif search, functional residue identification, transmembrane segment and protein topology predictions, homology and ab initio modeling. Overall, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experimental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and effort will be required in the future to ameliorate database search, homology and ab initio modeling. PMID:17367718

  18. Prediction of the orientations of adsorbed protein using an empirical energy function with implicit solvation.

    PubMed

    Sun, Yu; Welsh, William J; Latour, Robert A

    2005-06-01

    When simulating protein adsorption behavior, decisions must first be made regarding how the protein should be oriented on the surface. To address this problem, we have developed a molecular simulation program that combines an empirical adsorption free energy function with an efficient configurational search method to calculate orientation-dependent adsorption free energies between proteins and functionalized surfaces. The configuration space is searched systematically using a quaternion rotation technique, and the adsorption free energy is evaluated using an empirical energy function with an efficient grid-based calculational method. In this paper, the developed method is applied to analyze the preferred orientations of a model protein, lysozyme, on various functionalized alkanethiol self-assembled monolayer (SAM) surfaces by the generation of contour graphs that relate adsorption free energy to adsorbed orientation, and the results are compared with experimental observations. As anticipated, the adsorbed orientation of lysozyme is predicted to be dependent on the discrete organization of the functional groups presented by the surface. Lysozyme, which is a positively charged protein, is predicted to adsorb on its 'side' on both hydrophobic and negatively charged surfaces. On surfaces with discrete positively charged sites, attractive interaction energies can also be obtained due to the presence of discrete local negative charges present on the lysozyme surface. In this case, 'end-on' orientations are preferred. Additionally, SAM surface models with mixed functionality suggest that the interactions between lysozyme and surfaces could be greatly enhanced if individual surface functional groups are able to access the catalytic cleft region of lysozyme, similar to ligand-receptor interactions. The contour graphs generated by this method can be used to identify low-energy orientations that can then be used as starting points for further simulations to investigate

  19. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    NASA Astrophysics Data System (ADS)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and

  20. How and when should interactome-derived clusters be used to predict functional modules and protein function?

    PubMed Central

    Song, Jimin; Singh, Mona

    2009-01-01

    Motivation: Clustering of protein–protein interaction networks is one of the most common approaches for predicting functional modules, protein complexes and protein functions. But, how well does clustering perform at these tasks? Results: We develop a general framework to assess how well computationally derived clusters in physical interactomes overlap functional modules derived via the Gene Ontology (GO). Using this framework, we evaluate six diverse network clustering algorithms using Saccharomyces cerevisiae and show that (i) the performances of these algorithms can differ substantially when run on the same network and (ii) their relative performances change depending upon the topological characteristics of the network under consideration. For the specific task of function prediction in S.cerevisiae, we demonstrate that, surprisingly, a simple non-clustering guilt-by-association approach outperforms widely used clustering-based approaches that annotate a protein with the overrepresented biological process and cellular component terms in its cluster; this is true over the range of clustering algorithms considered. Further analysis parameterizes performance based on the number of annotated proteins, and suggests when clustering approaches should be used for interactome functional analyses. Overall our results suggest a re-examination of when and how clustering approaches should be applied to physical interactomes, and establishes guidelines by which novel clustering approaches for biological networks should be justified and evaluated with respect to functional analysis. Contact: msingh@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19770263
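
    The guilt-by-association idea mentioned in this abstract can be illustrated with a minimal sketch: a protein is assigned the GO terms that occur most often among its direct interaction partners. The function name, the toy network and the annotations below are assumptions made for illustration; this is not the authors' evaluation framework.

```python
from collections import Counter


def guilt_by_association(network, annotations, protein, top_k=1):
    """Rank GO terms for `protein` by how many direct partners carry them."""
    votes = Counter()
    for neighbor in network.get(protein, ()):
        # Each annotated partner casts one vote per GO term it carries.
        votes.update(annotations.get(neighbor, ()))
    return [term for term, _ in votes.most_common(top_k)]


# Hypothetical toy interactome: P1 is unannotated, its partners are annotated.
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"GO:0006412"},
    "P3": {"GO:0006412", "GO:0005840"},
    "P4": {"GO:0006810"},
}

predicted = guilt_by_association(network, annotations, "P1")
print(predicted)
```

    Two of the three partners share one term, so that term wins the vote. A real analysis would also need to handle ties and weight the evidence behind each interaction.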

  1. Multi-instance multi-label distance metric learning for genome-wide protein function prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao

    2016-08-01

    Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms. PMID:26923212
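
    The difference between a Euclidean and a Mahalanobis distance can be shown with a short sketch. It only evaluates the distance for a given positive semi-definite matrix M; how MIMLDML learns M is not described in the abstract and is not reproduced here. The matrix values are hypothetical.

```python
import math


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M d, written out to avoid external dependencies.
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the Euclidean distance
learned = [[4.0, 0.0], [0.0, 1.0]]    # hypothetical metric stressing feature 0

x, y = [1.0, 2.0], [2.0, 4.0]
euclid = mahalanobis(x, y, identity)
weighted = mahalanobis(x, y, learned)
print(euclid, weighted)
```

    With the identity matrix the function returns the ordinary Euclidean distance; a different M stretches or shrinks individual feature directions, which is what a learned metric exploits.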

  2. Negative example selection for protein function prediction: the NoGO database.

    PubMed

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-06-01

    Negative examples - genes that are known not to carry out a given protein function - are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  3. Negative Example Selection for Protein Function Prediction: The NoGO Database

    PubMed Central

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-01-01

    Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  4. Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

    PubMed Central

    Xiong, Jianghui; Rayner, Simon; Luo, Kunyi; Li, Yinghui; Chen, Shanguang

    2006-01-01

    Background The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. Results We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. Conclusion This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are

  5. PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins

    PubMed Central

    Minguez, Pablo; Letunic, Ivica; Parca, Luca; Bork, Peer

    2013-01-01

    Post-translational modifications (PTMs) are involved in the regulation and structural stabilization of eukaryotic proteins. The combination of individual PTM states is a key to modulate cellular functions as became evident in a few well-studied proteins. This combinatorial setting, dubbed the PTM code, has been proposed to be extended to whole proteomes in eukaryotes. Although we are still far from deciphering such a complex language, thousands of protein PTM sites are being mapped by high-throughput technologies, thus providing sufficient data for comparative analysis. PTMcode (http://ptmcode.embl.de) aims to compile known and predicted PTM associations to provide a framework that would enable hypothesis-driven experimental or computational analysis of various scales. In its first release, PTMcode provides PTM functional associations of 13 different PTM types within proteins in 8 eukaryotes. They are based on five evidence channels: a literature survey, residue co-evolution, structural proximity, PTMs at the same residue and location within PTM highly enriched protein regions (hotspots). PTMcode is presented as a protein-based searchable database with an interactive web interface providing the context of the co-regulation of nearly 75 000 residues in >10 000 proteins. PMID:23193284

  6. It’s the machine that matters: Predicting gene function and phenotype from protein networks

    PubMed Central

    Wang, Peggy I.; Marcotte, Edward M.

    2010-01-01

    Increasing knowledge about the organization of proteins into complexes, systems, and pathways has led to a flowering of theoretical approaches for exploiting this knowledge in order to better learn the functions of proteins and their roles underlying phenotypic traits and diseases. Much of this body of theory has been developed and tested in model organisms, relying on their relative simplicity and genetic and biochemical tractability to accelerate the research. In this review, we discuss several of the major approaches for computationally integrating proteomics and genomics observations into integrated protein networks, then applying guilt-by-association in these networks in order to identify genes underlying traits. Recent trends in this field include a rising appreciation of the modular network organization of proteins underlying traits or mutational phenotypes, and how to exploit such protein modularity using computational approaches related to the internet search algorithm PageRank. Many protein network-based predictions have recently been experimentally confirmed in yeast, worms, plants, and mice, and several successful approaches in model organisms have been directly translated to analyze human disease, with notable recent applications to glioma and breast cancer prognosis. PMID:20637909
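
    As a rough illustration of PageRank-style guilt-by-association, the sketch below runs a personalized PageRank by power iteration from a set of seed genes already linked to a trait. The function name, damping value and toy network are assumptions; the methods covered in the review differ in their details.

```python
def personalized_pagerank(adj, seeds, damping=0.85, iterations=100):
    """Score every node by a random walk that restarts at the seed nodes."""
    nodes = sorted(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                # Spread this node's score evenly over its neighbors.
                share = damping * score[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Isolated node: hand its score back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[n] / len(seeds)
        score = nxt
    return score


# Hypothetical linear network G1 - G2 - G3 - G4 with G1 as the known trait gene.
adj = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
scores = personalized_pagerank(adj, seeds={"G1"})
ranking = sorted((g for g in adj if g != "G1"), key=scores.get, reverse=True)
print(ranking)
```

    Candidates closer to the seed in the network receive higher scores, which is the ranking used to prioritize genes for follow-up.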

  7. sDFIRE: Sequence-specific statistical energy function for protein structure prediction by decoy selections.

    PubMed

    Hoque, Md Tamjidul; Yang, Yuedong; Mishra, Avdesh; Zhou, Yaoqi

    2016-05-01

    An important unsolved problem in molecular and structural biology is the protein folding and structure prediction problem. One major bottleneck for solving this is the lack of an accurate energy to discriminate near-native conformations against other possible conformations. Here we have developed sDFIRE energy function, which is an optimized linear combination of DFIRE (the Distance-scaled Finite Ideal gas Reference state based Energy), the orientation dependent (polar-polar and polar-nonpolar) statistical potentials, and the matching scores between predicted and model structural properties including predicted main-chain torsion angles and solvent accessible surface area. The weights for these scoring terms are optimized by three widely used decoy sets consisting of a total of 134 proteins. Independent tests on CASP8 and CASP9 decoy sets indicate that sDFIRE outperforms other state-of-the-art energy functions in selecting near native structures and in the Pearson's correlation coefficient between the energy score and structural accuracy of the model (measured by TM-score). © 2016 Wiley Periodicals, Inc. PMID:26849026

  8. Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

    NASA Astrophysics Data System (ADS)

    Weigt, Martin

    Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C

  9. Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins.

    PubMed

    Hao, Yanqi; Colak, Recep; Teyra, Joan; Corbi-Verge, Carles; Ignatchenko, Alexander; Hahne, Hannes; Wilhelm, Mathias; Kuster, Bernhard; Braun, Pascal; Kaida, Daisuke; Kislinger, Thomas; Kim, Philip M

    2015-07-14

    Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing. PMID:26146086

  10. Prediction of mitochondrial protein function by comparative physiology and phylogenetic profiling.

    PubMed

    Cheng, Yiming; Perocchi, Fabiana

    2015-01-01

    According to the endosymbiotic theory, mitochondria originate from a free-living alpha-proteobacterium that established an intracellular symbiosis with the ancestor of present-day eukaryotic cells. During the bacterium-to-organelle transformation, the proto-mitochondrial proteome has undergone a massive turnover, whereby less than 20% of modern mitochondrial proteomes can be traced back to the bacterial ancestor. Moreover, mitochondrial proteomes from several eukaryotic organisms, for example, yeast and human, show a rather modest overlap, reflecting differences in mitochondrial physiology. Those differences may result from the combination of differential gain and loss of genes and retargeting processes among lineages. Therefore, an evolutionary signature, also called "phylogenetic profile", could be generated for every mitochondrial protein. Here, we present two evolutionary biology approaches to study mitochondrial physiology: the first strategy, which we refer to as "comparative physiology," allows the de novo identification of mitochondrial proteins involved in a physiological function; the second, known as "phylogenetic profiling," allows the prediction of protein functions and functional interactions by comparing phylogenetic profiles of uncharacterized and known components. PMID:25631025
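
    Phylogenetic profiling can be sketched as a comparison of presence/absence vectors across genomes. The example below ranks characterized proteins by the Jaccard similarity of their profile to that of a query protein. The similarity measure, names and profiles are assumptions for illustration; the chapter may use other scores.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def most_similar(query, known_profiles):
    """Return the name of the characterized protein with the closest profile."""
    return max(known_profiles, key=lambda name: jaccard(query, known_profiles[name]))


# Hypothetical profiles: one entry per genome, 1 = ortholog present, 0 = absent.
known_profiles = {
    "known_A": (1, 1, 0, 1, 0, 1),
    "known_B": (0, 0, 1, 0, 1, 0),
}
query = (1, 1, 0, 1, 0, 0)

best = most_similar(query, known_profiles)
print(best, jaccard(query, known_profiles[best]))
```

    The query shares three of four occupied genomes with known_A and none with known_B, so known_A is the suggested functional partner.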

  11. SIFT Indel: Predictions for the Functional Effects of Amino Acid Insertions/Deletions in Proteins

    PubMed Central

    Hu, Jing; Ng, Pauline C.

    2013-01-01

    Indels in the coding regions of a gene can either cause frameshifts or amino acid insertions/deletions. Frameshifting indels are indels that have a length that is not divisible by 3 and subsequently cause frameshifts. Indels that have a length divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The new amino acid changes resulting from 3n indels could potentially affect protein function. Therefore, we construct a SIFT Indel prediction algorithm for 3n indels which achieves 82% accuracy, 81% sensitivity, 82% specificity, 82% precision, 0.63 MCC, and 0.87 AUC by 10-fold cross-validation. We have previously published a prediction algorithm for frameshifting indels. The rules for the prediction of 3n indels are different from the rules for the prediction of frameshifting indels and reflect the biological differences of these two different types of variations. SIFT Indel was applied to human 3n indels from the 1000 Genomes Project and the Exome Sequencing Project. We found that common variants are less likely to be deleterious than rare variants. The SIFT indel prediction algorithm for 3n indels is available at http://sift-dna.org/ PMID:24194902
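
    The distinction drawn in this abstract between frameshifting and 3n indels depends only on whether the indel length is divisible by 3. A minimal sketch, assuming VCF-style reference and alternate allele strings (the function name and input convention are not from the paper):

```python
def classify_indel(ref, alt):
    """Label a coding indel by whether its length keeps the reading frame."""
    length = abs(len(ref) - len(alt))
    if length == 0:
        raise ValueError("alleles have equal length: not an indel")
    # A length divisible by 3 inserts or deletes whole codons.
    return "3n indel" if length % 3 == 0 else "frameshifting indel"


print(classify_indel("ATGC", "A"))   # 3-base deletion
print(classify_indel("A", "ATG"))    # 2-base insertion
```

    This covers only the sorting step; the SIFT Indel prediction of whether a 3n indel is deleterious relies on features that are not listed in the abstract.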

  12. A Mixed QM/MM Scoring Function to Predict Protein-Ligand Binding Affinity.

    PubMed

    Hayik, Seth A; Dunbrack, Roland; Merz, Kenneth M

    2010-09-01

    Computational methods for predicting protein-ligand binding free energy continue to be popular as a potential cost-cutting method in the drug discovery process. However, accurate predictions are often difficult to make as estimates must be made for certain electronic and entropic terms in conventional force field based scoring functions. Mixed quantum mechanics/molecular mechanics (QM/MM) methods allow electronic effects for a small region of the protein to be calculated, treating the remaining atoms as a fixed charge background for the active site. Such a semi-empirical QM/MM scoring function has been implemented in AMBER using DivCon and tested on a set of 23 metalloprotein-ligand complexes, where QM/MM methods provide a particular advantage in the modeling of the metal ion. The binding affinity of this set of proteins can be calculated with an R² of 0.64 and a standard deviation of 1.88 kcal/mol without fitting and 0.71 and a standard deviation of 1.69 kcal/mol with fitted weighting of the individual scoring terms. In this study we explore using various methods to calculate terms in the binding free energy equation, including entropy estimates and minimization standards. From these studies we found that using the rotational bond estimate to ligand entropy results in a reasonable R² of 0.63 without fitting. We also found that using the ESCF energy of the proteins without minimization resulted in an R² of 0.57, when using the rotatable bond entropy estimate. PMID:21221417

  13. The involvement of proline-rich protein Mus musculus predicted gene 4736 in ocular surface functions

    PubMed Central

    Qi, Xia; Ren, Sheng-Wei; Zhang, Feng; Wang, Yi-Qiang

    2016-01-01

    AIM To research the two homologous predicted proline-rich protein genes, Mus musculus predicted gene 4736 (MP4) and proline-rich protein BstNI subfamily 1 (Prb1), which were significantly upregulated in cultured corneal organs when encountering fungal pathogen preparations. This study aimed to confirm the expression and potential functions of these two genes in the ocular surface. METHODS A Pseudomonas aeruginosa keratitis model was established in Balb/c mice. One day post infection, the mRNA level of MP4 was measured using real-time polymerase chain reaction (PCR), and MP4 protein was detected by immunohistochemistry (IHC) or Western blot using a customized polyclonal anti-MP4 antibody preparation. Lacrimal glands from normal mice were also subjected to IHC staining for MP4. An online bioinformatics program, BioGPS, was utilized to screen public data to determine other potential locations of MP4. RESULTS One day after keratitis induction, MP4 was upregulated in the corneas at both the mRNA level as measured using real-time PCR and the protein level as measured using Western blot and IHC. BioGPS analysis of public data suggested that the MP4 gene was most abundantly expressed in the lacrimal glands, and IHC revealed that normal murine lacrimal glands were positive for MP4 staining. CONCLUSION MP4 and Prb1 are closely related to the physiology and pathological processes of the ocular surface. Considering the significance of ocular surface abnormalities like dry eye, we propose that MP4 and Prb1 contribute to homeostasis of the ocular surface, and deserve more extensive functional and disease correlation studies. PMID:27588265

  14. Enhancing protein function prediction with taxonomic constraints--The Argot2.5 web server.

    PubMed

    Lavezzo, Enrico; Falda, Marco; Fontana, Paolo; Bianco, Luca; Toppo, Stefano

    2016-01-15

    Argot2.5 (Annotation Retrieval of Gene Ontology Terms) is a web server designed to predict protein function. It is an updated version of the previous Argot2 enriched with new features in order to enhance its usability and its overall performance. The algorithmic strategy exploits the grouping of Gene Ontology terms by means of semantic similarity to infer protein function. The tool has been challenged over two independent benchmarks and compared to Argot2, PANNZER, and a baseline method relying on BLAST, proving to obtain a better performance thanks to the contribution of some key interventions in critical steps of the working pipeline. The most effective changes regard: (a) the selection of the input data from sequence similarity searches performed against a clustered version of UniProt databank and a remodeling of the weights given to Pfam hits, (b) the application of taxonomic constraints to filter out annotations that cannot be applied to proteins belonging to the species under investigation. The taxonomic rules are derived from our in-house developed tool, FunTaxIS, that extends those provided by the Gene Ontology consortium. The web server is free for academic users and is available online at http://www.medcomp.medicina.unipd.it/Argot2-5/. PMID:26318087

  15. Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction

    PubMed Central

    Handl, Julia; Knowles, Joshua; Lovell, Simon C.

    2009-01-01

    Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods. Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets. Contact: simon.lovell@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19297350
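
    The two evaluation measures named in this abstract, the z-score of the native structure and the correlation between score and RMSD, can be computed as in the sketch below. The scores and RMSD values are invented, and the use of the population standard deviation of the decoys is an assumption, since conventions vary between studies.

```python
import math
import statistics


def native_z_score(native_score, decoy_scores):
    """Z-score of the native structure's score relative to the decoy scores."""
    mean = statistics.mean(decoy_scores)
    spread = statistics.pstdev(decoy_scores)  # population standard deviation
    return (native_score - mean) / spread


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical decoy set: lower score means a more favorable structure.
decoy_scores = [-80.0, -90.0, -100.0, -110.0, -120.0]
decoy_rmsd = [5.0, 4.0, 3.0, 2.0, 1.0]   # RMSD to the native, in angstroms
native_score = -150.0

z = native_z_score(native_score, decoy_scores)
r = pearson(decoy_scores, decoy_rmsd)
print(round(z, 3), round(r, 3))
```

    A strongly negative z-score shows only that the native is easy to separate from this particular decoy set, which is the weakness of the measure that the abstract points out.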

  16. Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage

    PubMed Central

    Bogachev, Mikhail I.; Kayumov, Airat R.; Markelov, Oleg A.; Bunde, Armin

    2016-01-01

    Structural, localization and functional properties of unknown proteins are often being predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and consequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3–4 different cleavages. Additionally, in some cases the protein cellular localization (cytosolic or membrane-associated) and its host organism (Firmicute or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could be also predicted with high accuracy and particular robustness against missing cleavages. PMID:26924271
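
    The link between amino-acid-specific cleavage and inter-residue intervals can be made concrete with a small in-silico digest. The sketch below cuts a sequence after every occurrence of the chosen residues and reports fragment lengths as a stand-in for fragment masses. The cleavage rule, function name and sequence are simplifying assumptions; real proteases such as thermolysin have their own specificities, and the paper works with masses, not lengths.

```python
def cleave_after(sequence, residues):
    """Split `sequence` after every residue contained in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that follows the last cleavage site.
        fragments.append(sequence[start:])
    return fragments


# Hypothetical peptide, cleaved after lysine (K) and arginine (R).
fragments = cleave_after("MAKTLLKRGSW", "KR")
lengths = [len(f) for f in fragments]
print(fragments, lengths)
```

    The list of fragment lengths is the distribution of intervals between cleavage sites; converting each fragment to a mass would require a table of residue masses, which is omitted here.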

  17. Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage

    NASA Astrophysics Data System (ADS)

    Bogachev, Mikhail I.; Kayumov, Airat R.; Markelov, Oleg A.; Bunde, Armin

    2016-02-01

    Structural, localization and functional properties of unknown proteins are often being predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and consequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3-4 different cleavages. Additionally, in some cases the protein cellular localization (cytosolic or membrane-associated) and its host organism (Firmicute or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could be also predicted with high accuracy and particular robustness against missing cleavages.

  18. Protein functional properties prediction in sparsely-label PPI networks through regularized non-negative matrix factorization

    PubMed Central

    2015-01-01

    Background Predicting functional properties of proteins in protein-protein interaction (PPI) networks presents a challenging problem and has important implications in computational biology. Collective classification (CC) that utilizes both attribute features and relational information to jointly classify related proteins in PPI networks has been shown to be a powerful computational method for this problem setting. Enabling CC usually increases accuracy when given a fully-labeled PPI network with a large amount of labeled data. However, such labels can be difficult to obtain in many real-world PPI networks, in which there are usually only a limited number of labeled proteins and a large number of unlabeled proteins. In this case, most of the unlabeled proteins may not be connected to the labeled ones, so the supervision knowledge cannot be obtained effectively from local network connections. As a consequence, learning a CC model in sparsely-labeled PPI networks can lead to poor performance. Results We investigate a latent graph approach for finding an integration latent graph by exploiting various latent linkages and judiciously integrating the investigated linkages to link (separate) the proteins with similar (different) functions. We develop a regularized non-negative matrix factorization (RNMF) algorithm for CC to make protein functional properties prediction by utilizing various data sources that are available in this problem setting, including attribute features, latent graph, and unlabeled data information. In RNMF, a label matrix factorization term and a network regularization term are incorporated into the non-negative matrix factorization (NMF) objective function to seek a matrix factorization that respects the network structure and label information for classification prediction. Conclusion Experimental results on KDD Cup tasks predicting the localization and functions of proteins to yeast genes demonstrate the effectiveness of the proposed RNMF method for

  19. PINALOG: a novel approach to align protein interaction networks—implications for complex detection and function prediction

    PubMed Central

    Phan, Hang T. T.; Sternberg, Michael J. E.

    2012-01-01

    Motivation: Analysis of protein–protein interaction networks (PPINs) at the system level has become increasingly important in understanding biological processes. Comparison of the interactomes of different species not only provides a better understanding of species evolution but also helps with detecting conserved functional components and in function prediction. Method and Results: Here we report a PPIN alignment method, called PINALOG, which combines information from protein sequence, function and network topology. Alignment of human and yeast PPINs reveals several conserved subnetworks between them that participate in similar biological processes, notably the proteasome and transcription related processes. PINALOG has been tested for its power in protein complex prediction as well as function prediction. Comparison with PSI-BLAST in predicting protein function in the twilight zone also shows that PINALOG is valuable in predicting protein function. Availability and implementation: The PINALOG web-server is freely available from http://www.sbg.bio.ic.ac.uk/~pinalog. The PINALOG program and associated data are available from the Download section of the web-server. Contact: m.sternberg@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22419782

  20. Predicting functional divergence in protein evolution by site-specific rate shifts

    NASA Technical Reports Server (NTRS)

    Gaucher, Eric A.; Gu, Xun; Miyamoto, Michael M.; Benner, Steven A.

    2002-01-01

    Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the "non-homogeneous gamma model", which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.

  1. Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach

    PubMed Central

    Lin, HH; Han, LY; Zhang, HL; Zheng, CJ; Xie, B; Cao, ZW; Chen, YZ

    2006-01-01

    Metal-binding proteins play important roles in structural stability, signaling, regulation, transport, immune response, metabolism control, and metal homeostasis. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting metal-binding proteins irrespective of sequence similarity. This work explores support vector machines (SVM) as such a method. SVM prediction systems were developed by using 53,333 metal-binding and 147,347 non-metal-binding proteins, and evaluated by an independent set of 31,448 metal-binding and 79,051 non-metal-binding proteins. The computed prediction accuracy is 86.3%, 81.6%, 83.5%, 94.0%, 81.2%, 85.4%, 77.6%, 90.4%, 90.9%, 74.9% and 78.1% for calcium-binding, cobalt-binding, copper-binding, iron-binding, magnesium-binding, manganese-binding, nickel-binding, potassium-binding, sodium-binding, zinc-binding, and all metal-binding proteins respectively. The accuracy for the non-member proteins of each class is 88.2%, 99.9%, 98.1%, 91.4%, 87.9%, 94.5%, 99.2%, 99.9%, 99.9%, 98.0%, and 88.0% respectively. Comparable accuracies were obtained by using a different SVM kernel function. Our method predicts 67% of the 87 metal-binding proteins non-homologous to any protein in the Swissprot database and 85.3% of the 333 proteins of known metal-binding domains as metal-binding. These results suggest the usefulness of SVM for facilitating the prediction of metal-binding proteins. Our software can be accessed at the SVMProt server. PMID:17254297

  2. The PredictProtein server

    PubMed Central

    Rost, Burkhard; Yachdav, Guy; Liu, Jinfeng

    2004-01-01

    PredictProtein (http://www.predictprotein.org) is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localization signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide-bonds, sub-cellular localization and functional annotations. Upon request fold recognition by prediction-based threading, CHOP domain assignments, predictions of transmembrane strands and inter-residue contacts are also available. For all services, users can submit their query either by electronic mail or interactively via the World Wide Web. PMID:15215403

  3. The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment.

    PubMed

    Eisenhaber, Birgit; Kuchibhatla, Durga; Sherman, Westley; Sirota, Fernanda L; Berezovsky, Igor N; Wong, Wing-Cheong; Eisenhaber, Frank

    2016-01-01

    As biomolecular sequencing is becoming the main technique in life sciences, functional interpretation of sequences in terms of biomolecular mechanisms with in silico approaches is getting increasingly significant. Function prediction tools are most powerful for protein-coding sequences; yet, the concepts and technologies used for this purpose are not well reflected in bioinformatics textbooks. Notably, protein sequences typically consist of globular domains and non-globular segments. The two types of regions require cardinally different approaches for function prediction. Whereas the former are classic targets for homology-inspired function transfer based on remnant, yet statistically significant sequence similarity to other, characterized sequences, the latter type of regions are characterized by compositional bias or simple, repetitive patterns and require lexical analysis and/or empirical sequence pattern-function correlations. The recipe for function prediction recommends first to find all types of non-globular segments and, then, to subject the remaining query sequence to sequence similarity searches. We provide an updated description of the ANNOTATOR software environment as an advanced example of a software platform that facilitates protein sequence-based function prediction. PMID:27115649

  4. A comparison of different functions for predicted protein model quality assessment.

    PubMed

    Li, Juan; Fang, Huisheng

    2016-07-01

    In protein structure prediction, a considerable number of models are usually produced by either the Template-Based Method (TBM) or the ab initio prediction. The purpose of this study is to find the critical parameter in assessing the quality of the predicted models. A non-redundant template library was developed and 138 target sequences were modeled. The target sequences were all distant from the proteins in the template library and were aligned with template library proteins on the basis of the transformation matrix. The quality of each model was first assessed with QMEAN and its six parameters, which are C_β interaction energy (C_beta), all-atom pairwise energy (PE), solvation energy (SE), torsion angle energy (TAE), secondary structure agreement (SSA), and solvent accessibility agreement (SAE). Finally, the alignment score (score) was also used to assess the quality of model. Hence, a total of eight parameters (i.e., QMEAN, C_beta, PE, SE, TAE, SSA, SAE, score) were independently used to assess the quality of each model. The results indicate that SSA is the best parameter to estimate the quality of the model. PMID:27488386

  5. Towards New Drug Targets? Function Prediction of Putative Proteins of Neisseria meningitidis MC58 and Their Virulence Characterization

    PubMed Central

    Shahbaaz, Mohd.; Bisetty, Krishna; Ahmad, Faizan

    2015-01-01

    Abstract Neisseria meningitidis is a Gram-negative aerobic diplococcus, responsible for a variety of meningococcal diseases. The genome of N. meningitidis MC58 is comprised of 2114 genes that are translated into 1953 proteins. Of these, 698 genes (∼35%) encode hypothetical proteins (HPs), because no experimental evidence of their biological functions is available. Analyses of these proteins are important to understand their functions in the metabolic networks and may lead to the discovery of novel drug targets against the infections caused by N. meningitidis. This study aimed at the identification and categorization of each HP present in the genome of N. meningitidis MC58 using computational tools. Functions of 363 proteins were predicted with high accuracy among the annotated set of HPs investigated. The reliably predicted 363 HPs were further grouped into 41 different classes of proteins, based on their possible roles in cellular processes such as metabolism, transport, and replication. Our studies revealed that 22 HPs may be involved in the pathogenesis caused by this microorganism. The top two HPs with highest virulence scores were subjected to molecular dynamics (MD) simulations to better understand their conformational behavior in a water environment. We also compared the MD simulation results with other virulent proteins present in N. meningitidis. This study broadens our understanding of the mechanistic pathways of pathogenesis, drug resistance, tolerance, and adaptability for host immune responses to N. meningitidis. PMID:26076386

  6. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity

    PubMed Central

    Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi. PMID:27525735

  7. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    PubMed

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but many proteins are still unknown in function. There is a need for more improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented those similarity-based and other methods in predicting diverse classes of proteins including the distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors protein representation, (3) improved predictive performances due to the use of more enriched training datasets and more variety of protein descriptors, (4) newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi. PMID:27525735

  8. Construction of polycythemia vera protein interaction network and prediction of related biological functions.

    PubMed

    Liu, L-J; Cao, X-J; Zhou, C; Sun, Y; Lv, Q-L; Feng, F-B; Zhang, Y-Y; Sun, C-G

    2016-01-01

    Here, polycythemia vera (PV)-related genes were screened by the Online Mendelian Inheritance in Man (OMIM), and literature pertaining to the identified genes was extracted and a protein-protein interaction network was constructed using various Cytoscape plugins. Various molecular complexes were detected using the Clustervize plugin and a gene ontology-enrichment analysis of the biological pathways, molecular functions, and cellular components of the selected molecular complexes were identified using the BiNGo plugin. Fifty-four PV-related genes were identified in OMIM. The protein-protein interaction network contains 5 molecular complexes with correlation integral values >4. These complexes regulated various biological processes (peptide tyrosinase acidification, cell metabolism, and macromolecular biosynthesis), molecular functions (kinase activity, receptor binding, and cytokine activity), and the cellular components were mainly concentrated in the nucleus, intracellular membrane-bounded organelles, and extracellular region. These complexes were associated with the JAK-STAT signal transduction pathway, neurotrophic factor signaling pathway, and Wnt signaling pathway, which were correlated with chronic myeloid leukemia and acute myeloid leukemia. PMID:26909922

  9. PREFACE: Protein protein interactions: principles and predictions

    NASA Astrophysics Data System (ADS)

    Nussinov, Ruth; Tsai, Chung-Jung

    2005-06-01

    Proteins are the `workhorses' of the cell. Their roles span functions as diverse as being molecular machines and signalling. They carry out catalytic reactions, transport, form viral capsids, traverse membranes and form regulated channels, transmit information from DNA to RNA, making possible the synthesis of new proteins, and they are responsible for the degradation of unnecessary proteins and nucleic acids. They are the vehicles of the immune response and are responsible for viral entry into the cell. Given their importance, considerable effort has been centered on the prediction of protein function. A prime way to do this is through identification of binding partners. If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s) in which it plays a role. This holds since the vast majority of their chores in the living cell involve protein-protein interactions. Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation. Their identification is at the heart of functional genomics; their prediction is crucial for drug discovery. Knowledge of the pathway, its topology, length, and dynamics may provide useful information for forecasting side effects. The goal of predicting protein-protein interactions is daunting. Some associations are obligatory, others are continuously forming and dissociating. In principle, from the physical standpoint, any two proteins can interact, but under what conditions and at which strength? The principles of protein-protein interactions are general: the non-covalent interactions of two proteins are largely the outcome of the hydrophobic effect, which drives the interactions. In addition, hydrogen bonds and electrostatic interactions play important roles. Thus, many of the interactions observed in vitro are the outcome of experimental overexpression. 
Protein disorder

  10. Freezability prediction of boar ejaculates assessed by functional sperm parameters and sperm proteins.

    PubMed

    Casas, I; Sancho, S; Briz, M; Pinart, E; Bussalleu, E; Yeste, M; Bonet, S

    2009-10-15

    The objective of this work was to look for useful predictive indicators of the potentially "good" or "poor" ability of a boar ejaculate to sustain cryopreservation by assessing both the conventional sperm quality parameters (Study 1) and the immunolabeling of three proteins involved in the physiology of the sperm cell: GLUT3, HSP90AA1 and Cu/ZnSOD (Study 2). Study 1 was carried out in three different steps during the cryopreservation process of the sperm-rich fraction of 29 Piétrain boar ejaculates (17 °C, 5 °C, and 240 min postthaw). These ejaculates were clustered based on sperm quality parameters analyzed at 240 min postthaw, obtaining 16 good freezability ejaculates (GFEs) and 13 poor freezability ejaculates (PFEs). The sperm linearity (LIN) and the straightforward (STR) indexes at 5 °C showed higher hyperactivated movement in the PFEs than in the GFEs, which suggests that analyzing these sperm kinematic parameters could be a useful tool for predicting the potential freezability of an ejaculate. This statement was demonstrated by grouping the 29 ejaculates into two clusters (A and B) based on LIN and STR values assessed after 30 min at 5 °C, which resulted in around 72% of coincidence with the GFE and PFE groups. Study 2, performed at 17 °C and 240 min postthaw, revealed no differences between GFEs and PFEs in the immunolabeling of the three proteins within the same step, in terms of location and reactivity, although reactivity was generally weaker at 240 min postthaw in both groups. Additional studies on Western blot are currently being carried out with the objective to quantify the expression of the three proteins in GFEs and PFEs in the three steps of the cryopreservation process. PMID:19651432

  11. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for ~62% of the variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
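
    The two headline percentages in this abstract follow directly from the reported counts, and the short check below reproduces them. It is only an arithmetic sketch: the helper name `percent_flagged` is hypothetical, and the reading that the differing denominators (508 and 489) reflect the number of variants each tool could score is an assumption, since the abstract does not say so explicitly.

```python
def percent_flagged(flagged, scored):
    """Share of scored variants that a predictor flagged, in percent."""
    if scored <= 0:
        raise ValueError("scored must be positive")
    return 100.0 * flagged / scored


if __name__ == "__main__":
    # Counts as reported in the abstract.
    sift = percent_flagged(226, 508)      # SIFT "Intolerant"
    polyphen = percent_flagged(165, 489)  # PolyPhen "Probably or Possibly Damaging"
    print(f"SIFT: {sift:.1f}%")
    print(f"PolyPhen: {polyphen:.1f}%")
```

    Rounded to whole percent, the two values agree with the 44% and 34% quoted in the abstract.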

  12. Toolbox for Protein Structure Prediction.

    PubMed

    Roche, Daniel Barry; McGuffin, Liam James

    2016-01-01

    Protein tertiary structure prediction algorithms aim to predict, from amino acid sequence, the tertiary structure of a protein. In silico protein structure prediction methods have become extremely important, as in vitro-based structural elucidation is unable to keep pace with the current growth of sequence databases due to high-throughput next-generation sequencing, which has exacerbated the gaps in our knowledge between sequences and structures. Here we briefly discuss protein tertiary structure prediction, the biennial competition for the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and its role in shaping the field. We also discuss, in detail, our cutting-edge web-server method IntFOLD2-TS for tertiary structure prediction. Furthermore, we provide a step-by-step guide on using the IntFOLD2-TS web server, along with some real world examples, where the IntFOLD server can and has been used to improve protein tertiary structure prediction and aid in functional elucidation. PMID:26519323

  13. AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation

    PubMed Central

    Masso, Majid; Vaisman, Iosif I.

    2014-01-01

    The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run “big data” batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models. PMID:25197272

  14. AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation.

    PubMed

    Masso, Majid; Vaisman, Iosif I

    2014-01-01

    The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models. PMID:25197272

  15. The PredictProtein server

    PubMed Central

    Rost, Burkhard; Liu, Jinfeng

    2003-01-01

    PredictProtein (PP, http://cubic.bioc.columbia.edu/pp/) is an internet service for sequence analysis and the prediction of aspects of protein structure and function. Users submit protein sequence or alignments; the server returns a multiple sequence alignment, PROSITE sequence motifs, low-complexity regions (SEG), ProDom domain assignments, nuclear localisation signals, regions lacking regular structure and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions and disulfide-bonds. Upon request, fold recognition by prediction-based threading is available. For all services, users can submit their query either by electronic mail or interactively from World Wide Web. PMID:12824312

  16. Final report for LDRD project "A new approach to protein function and structure prediction"

    SciTech Connect

    Phillips, C.A.

    1997-03-01

    This report describes the research performed under the Laboratory-Directed Research and Development (LDRD) grant "A new approach to protein function and structure prediction", funded FY94-6. We describe the goals of the research, motivate and list our improvements to the state of the art in multiple sequence alignment and phylogeny (evolutionary tree) construction, but leave technical details to the six publications resulting from this work. At least three algorithms for phylogeny construction or tree consensus have been implemented and used by researchers outside of Sandia.

  17. Methods for protein complex prediction and their contributions towards understanding the organisation, function and dynamics of complexes.

    PubMed

    Srihari, Sriganesh; Yong, Chern Han; Patil, Ashwini; Wong, Limsoon

    2015-09-14

    Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organisation of cells. In this review, we discuss the key contributions of computational methods developed till date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight their limitations and challenges, in particular at detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. PMID:25913176

  18. Novel Urinary Protein Biomarkers Predicting the Development of Microalbuminuria and Renal Function Decline in Type 1 Diabetes

    PubMed Central

    Schlatzer, Daniela; Maahs, David M.; Chance, Mark R.; Dazard, Jean-Eudes; Li, Xiaolin; Hazlett, Fred; Rewers, Marian; Snell-Bergeon, Janet K.

    2012-01-01

    OBJECTIVE To define a panel of novel protein biomarkers of renal disease. RESEARCH DESIGN AND METHODS Adults with type 1 diabetes in the Coronary Artery Calcification in Type 1 Diabetes study who were initially free of renal complications (n = 465) were followed for development of micro- or macroalbuminuria (MA) and early renal function decline (ERFD, annual decline in estimated glomerular filtration rate of ≥3.3%). The label-free proteomic discovery phase was conducted in 13 patients who progressed to MA by the 6-year visit and 11 control subjects, and four proteins (Tamm-Horsfall glycoprotein, α-1 acid glycoprotein, clusterin, and progranulin) identified in the discovery phase were measured by enzyme-linked immunosorbent assay in 74 subjects: group A, normal renal function (n = 35); group B, ERFD without MA (n = 15); group C, MA without ERFD (n = 16); and group D, both ERFD and MA (n = 8). RESULTS In the label-free analysis, a model of progression to MA was built using 252 peptides, yielding an area under the curve (AUC) of 84.7 ± 5.3%. In the validation study, ordinal logistic regression was used to predict development of ERFD, MA, or both. A panel including Tamm-Horsfall glycoprotein (odds ratio 2.9, 95% CI 1.3–6.2, P = 0.008), progranulin (1.9, 0.8–4.5, P = 0.16), clusterin (0.6, 0.3–1.1, P = 0.09), and α-1 acid glycoprotein (1.6, 0.7–3.7, P = 0.27) improved the AUC from 0.841 to 0.889. CONCLUSIONS A panel of four novel protein biomarkers predicted early renal damage in type 1 diabetes. These findings require further validation in other populations for prediction of renal complications and treatment monitoring. PMID:22238279

  19. A novel approach for a functional group to predict protein in undigested residue and protein digestibility by mid-infrared spectroscopy.

    PubMed

    Wang, Li Fang; Swift, Mary Lou; Zijlstra, Ruurd T

    2013-11-01

    To evaluate nutrient digestibility, we propose the novel approach of functional group digestibility (FGD). The FGD was based on the absorbance of specific Fourier transform infrared (FT-IR) peaks and the ratio of an inorganic indigestible marker in diet and digesta, without calibration. For application, samples of diet and digesta of wheat with predetermined crude protein (CP) digestibility were scanned on an FT-IR spectrometer equipped with a single-reflection attenuated total reflection (ATR) attachment. The FGD in the amide I region (1689-1631 cm⁻¹) of digesta spectra was strongly related (R² = 0.99) with CP digestibility. The measured diet CP digestibility ranged from 60.4 to 87.8% with a standard error of prediction of 1.09%. In conclusion, instead of predictions based on calibrations, FGD can be calculated directly from spectra, provided the ratio of marker in diet and undigested residue is known, and then accurately predicts nutrient digestibility. PMID:24160888
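
    The abstract states that FGD is computed from peak absorbance and the marker ratio in diet and digesta, but it does not give the formula. The sketch below therefore assumes the conventional indicator (marker-ratio) method for digestibility and substitutes peak absorbance for nutrient concentration. The function name, argument names, and example values are hypothetical, and the authors' actual calculation may differ.

```python
def functional_group_digestibility(abs_diet, abs_digesta,
                                   marker_diet, marker_digesta):
    """Digestibility in percent by the indicator (marker-ratio) method.

    abs_diet, abs_digesta       : absorbance of the chosen FT-IR peak
                                  (e.g. amide I) in diet and digesta
    marker_diet, marker_digesta : concentration of the indigestible marker
                                  in diet and digesta (same units for both)
    """
    values = (abs_diet, abs_digesta, marker_diet, marker_digesta)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")
    # The marker is not absorbed, so its enrichment in digesta corrects
    # for the dry matter that disappeared during digestion.
    remaining = (marker_diet / marker_digesta) * (abs_digesta / abs_diet)
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    fgd = functional_group_digestibility(
        abs_diet=1.0, abs_digesta=0.8,
        marker_diet=0.5, marker_digesta=2.0,
    )
    print(f"FGD: {fgd:.1f}%")
```

    Under this assumed formula no calibration set is needed: only the two spectra and the two marker concentrations enter the calculation.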

  20. Prediction of Certain Well-Characterized Domains of Known Functions within the PE and PPE Proteins of Mycobacteria

    PubMed Central

    Sultana, Rafiya; Tanneeru, Karunakar; Kumar, Ashwin B. R.; Guruprasad, Lalitha

    2016-01-01

    The PE and PPE protein families are unique to mycobacteria. Though the complete genome sequences for over 500 M. tuberculosis strains and mycobacterial species are available, few PE and PPE proteins have been structurally and functionally characterized. We have therefore used bioinformatics tools to characterize the structure and function of these proteins. We selected representative members of the PE and PPE protein families by phylogenetic analysis and, using structure-based sequence annotation, identified ten well-characterized protein domains of known function. Some of these domains were observed to be common to all mycobacterial species and some were species specific. PMID:26891364

  1. Biofragments: An Approach towards Predicting Protein Function Using Biologically Related Fragments and its Application to Mycobacterium tuberculosis CYP126

    PubMed Central

    Hudson, Sean A; Mashalidis, Ellene H; Bender, Andreas; McLean, Kirsty J; Munro, Andrew W; Abell, Chris

    2014-01-01

    We present a novel fragment-based approach that tackles some of the challenges for chemical biology of predicting protein function. The general approach, which we have termed biofragments, comprises two key stages. First, a biologically relevant fragment library (biofragment library) can be designed and constructed from known sets of substrate-like ligands for a protein class of interest. Second, the library can be screened for binding to a novel putative ligand-binding protein from the same or similar class, and the characterization of hits provides insight into the basis of ligand recognition, selectivity, and function at the substrate level. As a proof-of-concept, we applied the biofragments approach to the functionally uncharacterized Mycobacterium tuberculosis (Mtb) cytochrome P450 isoform, CYP126. This led to the development of a tailored CYP biofragment library with notable 3D characteristics and a significantly higher screening hit rate (14 %) than standard drug-like fragment libraries screened previously against Mtb CYP121 and 125 (4 % and 1 %, respectively). Biofragment hits were identified that make both substrate-like type-I and inhibitor-like type-II interactions with CYP126. A chemical-fingerprint-based substrate model was built from the hits and used to search a virtual TB metabolome, which led to the discovery that CYP126 has a strong preference for the recognition of aromatics and substrate-like type-I binding of chlorophenol moieties within the active site near the heme. Future catalytic analyses will be focused on assessing CYP126 for potential substrate oxidative dehalogenation. PMID:24677424

  2. Protein structural domains: definition and prediction.

    PubMed

    Ezkurdia, Iakes; Tress, Michael L

    2011-11-01

    Recognition and prediction of structural domains in proteins is an important part of structure and function prediction. This unit lists the range of tools available for domain prediction, and describes sequence and structural analysis tools that complement domain prediction methods. Also detailed are the basic domain prediction steps, along with suggested strategies for different protein sequences and potential pitfalls in domain boundary prediction. The difficult problem of domain orientation prediction is also discussed. All the resources necessary for domain boundary prediction are accessible via publicly available Web servers and databases and do not require computational expertise. PMID:22045561

  3. Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: An in silico approach for prioritizing the targets.

    PubMed

    Gazi, Md Amran; Kibria, Mohammad Golam; Mahfuz, Mustafa; Islam, Md Rezaul; Ghosh, Prakash; Afsar, Md Nure Alam; Khan, Md Arif; Ahmed, Tahmeed

    2016-10-15

    The global control of tuberculosis (TB) remains a great challenge from the standpoint of diagnosis, detection of drug resistance, and treatment. Major serodiagnostic limitations include low sensitivity and high cost in detecting TB. On the other hand, treatment measures are often hindered by low efficacies of commonly used drugs and resistance developed by the bacteria. Hence, there is a need to look into newer diagnostic and therapeutic targets. The proteome information available suggests that among the 3906 proteins in Mycobacterium tuberculosis H37Rv, about a quarter remain classified as hypothetical and uncharacterized. This study involves a combination of a number of bioinformatics tools to analyze those hypothetical proteins (HPs). An entire set of 999 proteins was primarily screened for protein sequences having conserved domains with high confidence using a combination of the latest versions of protein family databases. Subsequently, 98 of such potential target proteins were extensively analyzed by means of physicochemical characteristics, protein-protein interaction, sub-cellular localization, structural similarity and functional classification. Next, we predicted antigenic proteins from the entire set and identified B and T cell epitopes of these proteins in M. tuberculosis H37Rv. We predicted that these HPs belong to various classes of proteins such as enzymes, transporters, receptors, structural proteins, transcription regulators and other proteins. However, the structural similarity prediction of the annotated proteins substantiated the functional classification of those proteins. Consequently, based on higher antigenicity score and sub-cellular localization, we chose two (NP_216420.1, NP_216903.1) of the antigenic proteins to exemplify the B and T cell epitope prediction approach. Finally, we found 15 epitopes that are located partially or fully in the linear epitope region. We also found 21 conformational epitopes using the ElliPro server. In

  4. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  5. Computational approaches for predicting mutant protein stability.

    PubMed

    Kulshreshtha, Shweta; Chaudhary, Vigi; Goswami, Girish K; Mathur, Nidhi

    2016-05-01

    Mutations in the protein affect not only the structure of protein, but also its function and stability. Prediction of mutant protein stability with accuracy is desired for uncovering the molecular aspects of diseases and design of novel proteins. Many advanced computational approaches have been developed over the years, to predict the stability and function of a mutated protein. These approaches based on structure, sequence features and combined features (both structure and sequence features) provide reasonably accurate estimation of the impact of amino acid substitution on stability and function of protein. Recently, consensus tools have been developed by incorporating many tools together, which provide single window results for comparison purpose. In this review, a useful guide for the selection of tools that can be employed in predicting mutated proteins' stability and disease causing capability is provided. PMID:27160393

  6. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Among most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

  7. GECluster: a novel protein complex prediction method

    PubMed Central

    Su, Lingtao; Liu, Guixia; Wang, Han; Tian, Yuan; Zhou, Zhihui; Han, Liang; Yan, Lun

    2014-01-01

    Identification of protein complexes is of great importance in the understanding of cellular organization and functions. Traditional computational protein complex prediction methods mainly rely on the topology of protein–protein interaction (PPI) networks but seldom take biological information of proteins (such as Gene Ontology (GO)) into consideration. Meanwhile, the environment relevant analysis of protein complex evolution has been poorly studied, partly due to the lack of high-precision protein complex datasets. In this paper, a combined PPI network is introduced to predict protein complexes which integrate both GO and expression value of relevant protein-coding genes. A novel protein complex prediction method GECluster (Gene Expression Cluster) was proposed based on a seed node expansion strategy, in which a combined PPI network was utilized. GECluster was applied to a training combined PPI network and it predicted more credible complexes than peer methods. The results indicate that using a combined PPI network can efficiently improve protein complex prediction accuracy. In order to study protein complex evolution within cells due to changes in the living environment surrounding cells, GECluster was applied to seven combined PPI networks constructed using the data of a test set including yeast response to stress throughout a wine fermentation process. Our results showed that with the rise of alcohol concentration, protein complexes within yeast cells gradually evolve from one state to another. Besides this, the number of core and attachment proteins within a protein complex both changed significantly. PMID:26019559

  8. Structure Prediction of Protein Complexes

    NASA Astrophysics Data System (ADS)

    Pierce, Brian; Weng, Zhiping

    Protein-protein interactions are critical for biological function. They directly and indirectly influence the biological systems of which they are a part. Antibodies bind with antigens to detect and stop viruses and other infectious agents. Cell signaling is performed in many cases through the interactions between proteins. Many diseases involve protein-protein interactions on some level, including cancer and prion diseases.

  9. Modeling Protein Domain Function

    ERIC Educational Resources Information Center

    Baker, William P.; Jones, Carleton "Buck"; Hull, Elizabeth

    2007-01-01

    This simple but effective laboratory exercise helps students understand the concept of protein domain function. They use foam beads, Styrofoam craft balls, and pipe cleaners to explore how domains within protein active sites interact to form a functional protein. The activity allows students to gain content mastery and an understanding of the…

  10. Dietary modulation and structure prediction of rat mucosal pentraxin (Mptx) protein and loss of function in humans

    PubMed Central

    van der Meer-van Kraaij, Cindy; Siezen, Roland; Kramer, Evelien; Reinders, Marjolein; Blokzijl, Hans; van der Meer, Roelof

    2007-01-01

    Mucosal pentraxin (Mptx), identified in rats, is a short pentraxin of unknown function. Other subfamily members are Serum amyloid P component (SAP), C-reactive protein (CRP) and Jeltraxin. Rat Mptx mRNA is predominantly expressed in colon and in vivo is strongly (30-fold) regulated by dietary heme and calcium, modulators of colon cancer risk. This renders Mptx a potential nutrient sensitive biomarker of gut health. To support a role as biomarker, we examined whether the pentraxin protein structure is conserved, whether Mptx protein is nutrient-sensitively expressed and whether Mptx is expressed in mouse and human. Sequence comparison and 3D modelling showed that rat Mptx is highly homologous to the other pentraxins. The calcium-binding site and subunit interaction sites are highly conserved, while a loop deletion and charged residues contribute to a distinctive “top” face of the pentamer. In accordance with mRNA expression, Mptx protein is strongly down-regulated in rat colon mucosa in response to high dietary heme intake. Mptx mRNA is expressed in rat and mouse colon, but not in human colon. A stop codon at the beginning of human exon two indicates loss of function, which may be related to differences in intestinal cell turnover between man and rodents. PMID:18850182

  11. Prediction of CTL epitope, in silico modeling and functional analysis of cytolethal distending toxin (CDT) protein of Campylobacter jejuni

    PubMed Central

    2014-01-01

    Background Campylobacter jejuni is a potent bacterial pathogen responsible for the diarrheal disease campylobacteriosis. It is recognized as a major health issue owing to the unavailability of appropriate vaccines and clinical treatment options. Like other pathogens, C. jejuni exploits host cellular components of an infected individual to disseminate this disease. These host–pathogen interfaces during C. jejuni infection are complex, vibrant and involved in the nicking of host cell environment, enzymes and pathways. Existing therapies rely on only a small number of drugs, most of which are insufficient because of their severe host toxicity or drug-resistance phenomena. To find remedial alternatives, the identification of new biotargets is highly anticipated. Understanding the molecules involved in pathogenesis has the potential to yield new and exciting strategies for therapeutic intervention. In this direction, advances in bioinformatics have opened up new possibilities for the rapid measurement of global changes during infection and this could be exploited to understand the molecular interactions involved in campylobacteriosis. Methods In this study, homology modeling, epitope prediction and identification of ligand binding sites have been explored. Further, an attempt to generate a robust 3D model of the cytolethal distending toxin protein from C. jejuni is described for the first time. Results CDT protein isolated from C. jejuni was analyzed using various bioinformatics and immuno-informatics tools including sequence and structure tools. A total of fifty-five antigenic determinants were predicted, and prediction results for CTL epitopes revealed that five MHC ligands are found in CDT. Three potential binding pockets were found in the sequence that can be useful for drug design. Conclusions This model, we hope, will be of help in designing and predicting novel CDT inhibitors and vaccine candidates. PMID:24552167

  12. An improved PMF scoring function for universally predicting the interactions of a ligand with protein, DNA, and RNA.

    PubMed

    Zhao, Xiaoyu; Liu, Xiaofeng; Wang, Yuanyuan; Chen, Zhi; Kang, Ling; Zhang, Hailei; Luo, Xiaomin; Zhu, Weiliang; Chen, Kaixian; Li, Honglin; Wang, Xicheng; Jiang, Hualiang

    2008-07-01

    An improved potential of mean force (PMF) scoring function, named KScore, has been developed by using 23 redefined ligand atom types and 17 protein atom types, as well as 28 newly introduced atom types for nucleic acids (DNA and RNA). Metal ions and water molecules embedded in the binding sites of receptors are considered explicitly by two newly defined atom types. The individual potential terms were devised on the basis of the high-resolution crystal and NMR structures of 2,422 protein-ligand complexes, 300 DNA-ligand complexes, and 97 RNA-ligand complexes. The optimized atom pairwise distances and minima of the potentials overcome some of the disadvantages and ambiguities of current PMF potentials; thus, they more reasonably explain the atomic interaction between receptors and ligands. KScore was validated against five test sets of protein-ligand complexes and two sets of nucleic-acid-ligand complexes. The results showed acceptable correlations between KScore scores and experimentally determined binding affinities (log Ki values or binding free energies). In particular, KScore can be used to rank the binding of ligands with metalloproteins; the linear correlation coefficient (R) for the test set is 0.65. In addition to reasonably ranking protein-ligand interactions, KScore also yielded good results for scoring DNA/RNA-ligand interactions; the linear correlation coefficients for DNA-ligand and RNA-ligand complexes are 0.68 and 0.81, respectively. Moreover, KScore can appropriately reproduce the experimental structures of ligand-receptor complexes. Thus, KScore is an appropriate scoring function for universally ranking the interactions of ligands with protein, DNA, and RNA. PMID:18553962
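
Knowledge-based PMF potentials of this kind are generally derived by inverse-Boltzmann conversion of observed atom-pair distance distributions. The sketch below shows that generic conversion for a single atom-type pair. It does not reproduce KScore's atom types, reference state, or corrections, and the function name, the histogram values, and the kT default are illustrative assumptions.

```python
import math


def pmf_from_histogram(counts, bin_edges, kT=0.593):
    """Inverse-Boltzmann potential per distance bin for one atom-type pair.

    counts[i] is the number of observed pairs with distance in
    [bin_edges[i], bin_edges[i + 1]). kT defaults to roughly 0.593 kcal/mol
    (about 298 K). Bins with no observations yield None.
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than counts")
    # Volume of each spherical shell, used to turn counts into densities.
    volumes = [4.0 / 3.0 * math.pi * (hi ** 3 - lo ** 3)
               for lo, hi in zip(bin_edges, bin_edges[1:])]
    bulk_density = sum(counts) / sum(volumes)   # reference state: uniform density
    if bulk_density == 0:
        raise ValueError("histogram contains no observations")
    potential = []
    for count, volume in zip(counts, volumes):
        density = count / volume
        if density > 0:
            potential.append(-kT * math.log(density / bulk_density))
        else:
            potential.append(None)
    return potential


# Hypothetical histogram: the inner shell is enriched, so its potential is negative.
print(pmf_from_histogram([2, 7], [0.0, 1.0, 2.0]))
```

A complex would then be scored by summing the potential over all ligand-receptor atom pairs, using the bin that contains each pair's distance.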

  13. Protein function annotation using protein domain family resources.

    PubMed

    Das, Sayoni; Orengo, Christine A

    2016-01-15

    As a result of the genome sequencing and structural genomics initiatives, we have a wealth of protein sequence and structural data. However, only about 1% of these proteins have experimental functional annotations. As a result, computational approaches that can predict protein functions are essential in bridging this widening annotation gap. This article reviews the current approaches of protein function prediction using structure and sequence based classification of protein domain family resources with a special focus on functional families in the CATH-Gene3D resource. PMID:26434392

  14. Predicting protein-protein interactions based only on sequences information.

    PubMed

    Shen, Juwen; Zhang, Jian; Luo, Xiaomin; Zhu, Weiliang; Yu, Kunqian; Chen, Kaixian; Li, Yixue; Jiang, Hualiang

    2007-03-13

    Protein-protein interactions (PPIs) are central to most biological processes. Although efforts have been devoted to the development of methodology for predicting PPIs and protein interaction networks, the application of most existing methods is limited because they need information about protein homology or the interaction marks of the protein partners. In the present work, we propose a method for PPI prediction using only the information of protein sequences. This method was developed based on a learning algorithm, the support vector machine, combined with a kernel function and a conjoint triad feature for describing amino acids. More than 16,000 diverse PPI pairs were used to construct the universal model. The prediction ability of our approach is better than that of other sequence-based PPI prediction methods because it is able to predict PPI networks. Different types of PPI networks have been effectively mapped with our method, suggesting that, even with only sequence information, this method could be applied to the exploration of networks for any newly discovered protein with unknown biological relativity. In addition, such supplementary experimental information can enhance the prediction ability of the method. PMID:17360525
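
A conjoint triad descriptor groups the 20 amino acids into a small number of classes and counts every run of three consecutive residues by class. The sketch below assumes the seven-class grouping and the (f - min) / max normalization commonly cited for this method; neither is stated in the abstract, so both should be treated as assumptions. All function names are hypothetical.

```python
# Assumed seven-class grouping of amino acids (not given in the abstract).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: index for index, group in enumerate(CLASSES) for aa in group}
N_CLASSES = len(CLASSES)


def conjoint_triad(sequence):
    """Return a 343-dimensional normalized triad-frequency vector."""
    counts = [0] * (N_CLASSES ** 3)
    residues = sequence.upper()
    for start in range(len(residues) - 2):
        try:
            a, b, c = (AA_TO_CLASS[aa] for aa in residues[start:start + 3])
        except KeyError:
            continue  # skip windows containing a non-standard residue
        counts[a * N_CLASSES ** 2 + b * N_CLASSES + c] += 1
    low, high = min(counts), max(counts)
    if high == 0:
        return [0.0] * len(counts)
    # Assumed normalization: reduces the effect of sequence length.
    return [(frequency - low) / high for frequency in counts]


def pair_features(sequence_a, sequence_b):
    """Concatenate the two descriptors into one 686-dimensional pair vector."""
    return conjoint_triad(sequence_a) + conjoint_triad(sequence_b)


features = pair_features("MKVLAAGIVGRKDE", "AGVAGVCCHW")
print(len(features))
```

The resulting pair vector is what a classifier such as a support vector machine would take as input; the kernel used in the paper is not reproduced here.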

  15. Predicting protein-peptide interactions from scratch

    NASA Astrophysics Data System (ADS)

    Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin; Zou lab Team

    Protein-peptide interactions play an important role in many cellular processes. The ability to predict protein-peptide complex structures is valuable for mechanistic investigation and therapeutic development. Due to the high flexibility of peptides and lack of templates for homology modeling, predicting protein-peptide complex structures is extremely challenging. Recently, we have developed a novel docking framework for protein-peptide structure prediction. Specifically, given the sequence of a peptide and a 3D structure of the protein, initial conformations of the peptide are built through protein threading. Then, the peptide is globally and flexibly docked onto the protein using a novel iterative approach. Finally, the sampled modes are scored and ranked by a statistical potential-based energy scoring function that was derived for protein-peptide interactions from statistical mechanics principles. Our docking methodology has been tested on the Peptidb database and compared with other protein-peptide docking methods. Systematic analysis shows significantly improved results compared to the performances of the existing methods. Our method is computationally efficient and suitable for large-scale applications. Supported by NSF CAREER Award 0953839 (XZ) and NIH R01GM109980 (XZ).

  16. Multipass Membrane Protein Structure Prediction Using Rosetta

    PubMed Central

    Yarov-Yarovoy, Vladimir; Schonbrun, Jack; Baker, David

    2006-01-01

    We describe the adaptation of the Rosetta de novo structure prediction method for prediction of helical transmembrane protein structures. The membrane environment is modeled by embedding the protein chain into a model membrane represented by parallel planes defining hydrophobic, interface, and polar membrane layers for each energy evaluation. The optimal embedding is determined by maximizing the exposure of surface hydrophobic residues within the membrane and minimizing hydrophobic exposure outside of the membrane. Protein conformations are built up using the Rosetta fragment assembly method and evaluated using a new membrane-specific version of the Rosetta low-resolution energy function in which residue–residue and residue–environment interactions are functions of the membrane layer in addition to amino acid identity, distance, and density. We find that lower energy and more native-like structures are achieved by sequential addition of helices to a growing chain, which may mimic some aspects of helical protein biogenesis after translocation, rather than folding the whole chain simultaneously as in the Rosetta soluble protein prediction method. In tests on 12 membrane proteins for which the structure is known, between 51 and 145 residues were predicted with root-mean-square deviation <4Å from the native structure. PMID:16372357
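
The accuracy figure above is a root-mean-square deviation (RMSD) between predicted and native coordinates. A minimal sketch of the metric follows. It assumes the two structures are already optimally superimposed and that atoms are listed in matching order; the superposition step itself is omitted, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and atoms are paired
    by position in the lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha coordinates in angstroms.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(model, native) < 4.0)
```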

  17. BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes

    PubMed Central

    2015-01-01

    Background Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity. Results We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has been traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests. Conclusions Ensemble neural networks SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher
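
Bagging, the idea behind BgN-Score, trains each ensemble member on a bootstrap resample of the training set and averages the members' predictions. The sketch below illustrates only that mechanism. A one-variable least-squares line stands in for each neural network, which is a deliberate simplification and not the authors' model, and all names and data are hypothetical.

```python
import random
from statistics import mean


def fit_line(points):
    """Least-squares line through (x, y) points; stand-in for a neural network."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = mean(xs), mean(ys)
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        return lambda x: mean_y   # degenerate resample: predict the mean
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / spread
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def bagging_fit(points, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in points]   # sample with replacement
        models.append(fit_line(resample))
    return models


def bagging_predict(models, x):
    """Average the predictions of all ensemble members."""
    return mean(model(x) for model in models)


# Hypothetical (feature, affinity) pairs lying on y = 2x + 1.
training = [(float(i), 2.0 * i + 1.0) for i in range(10)]
ensemble = bagging_fit(training)
print(bagging_predict(ensemble, 4.0))
```

Boosting differs in that members are trained sequentially, each one focusing on the errors of its predecessors, rather than independently on resamples.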

  18. Phenomenological simulation and density functional theory prediction of 57Fe Mössbauer parameters: application to magnetically coupled diiron proteins

    NASA Astrophysics Data System (ADS)

    Rodriguez, Jorge H.

    2013-04-01

    The use of phenomenological spin Hamiltonians and of spin density functional theory for the analysis and interpretation of Mössbauer spectra of antiferromagnetic or ferromagnetic diiron centers is briefly discussed. The spectroscopic parameters of the hydroxylase component of methane monooxygenase (MMOH), an enzyme that catalyzes the conversion of methane to methanol, have been studied. In its reduced diferrous state (MMOH_Red) the enzyme displays 57Fe Mössbauer and EPR parameters characteristic of two ferromagnetically coupled high spin ferrous ions. However, Mössbauer spectra recorded for MMOH_Red from two different bacteria, Methylococcus capsulatus (Bath) and Methylosinus trichosporium OB3b, display slightly different electric quadrupole splittings (ΔE_Q) in apparent contradiction to their essentially identical active site crystallographic structures and biochemical functions. Herein, the Mössbauer spectral parameters of MMOH_Red have been predicted and studied via spin density functional theory. The somewhat different ΔE_Q recorded for the two bacteria have been traced to the relative position of an essentially unbound water molecule within their diiron active sites. It is shown that the presence or absence of the unbound water molecule mainly affects the electric field gradient at only one iron ion of the binuclear active sites.

  19. Predicting Disease-Related Proteins Based on Clique Backbone in Protein-Protein Interaction Network

    PubMed Central

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique is possibly associated with complex diseases. Fully identifying disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and negative interactions in protein networks, clique extension and scoring of predicted disease proteins with gene ontology terms are introduced into the clique-based method. The precision of the predicted disease proteins is verified by disease phenotypes and remains consistently above 95%. The predicted disease proteins associated with cliques can partly complement mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases. PMID:25013377
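
The clique-based idea can be sketched with the basic Bron-Kerbosch algorithm, which enumerates maximal cliques, followed by a simple rule that proposes unannotated members of any clique containing a known disease protein. The candidate rule is a simplification chosen for illustration. The paper's clique extension and gene ontology scoring are not reproduced, and the network, protein names, and function names are hypothetical.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adjacency maps each protein to the set of its interaction partners.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))   # cannot be extended: maximal
            return
        for protein in list(candidates):
            neighbours = adjacency[protein]
            expand(current | {protein}, candidates & neighbours, excluded & neighbours)
            candidates = candidates - {protein}
            excluded = excluded | {protein}

    expand(set(), set(adjacency), set())
    return cliques


def candidate_disease_proteins(cliques, known_disease_proteins):
    """Unannotated members of cliques that contain a known disease protein."""
    candidates = set()
    for clique in cliques:
        if clique & known_disease_proteins:
            candidates |= clique - known_disease_proteins
    return candidates


# Hypothetical interaction network: A-B-C form a triangle, D hangs off C.
network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
found = maximal_cliques(network)
print(sorted(candidate_disease_proteins(found, {"A"})))
```

Basic Bron-Kerbosch is exponential in the worst case; for genome-scale interaction networks a pivoting variant is the usual choice.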

  20. Structure Prediction of Membrane Proteins

    NASA Astrophysics Data System (ADS)

    Hu, Xiche

    Membrane proteins play a central role in many cellular and physiological processes. It is estimated that integral membrane proteins make up about 20-30% of the proteome (Krogh et al., 2001b; Stevens and Arkin, 2000; von Heijne, 1999). They are essential mediators of material and information transfer across cell membranes. Their functions include active and passive transport of molecules into and out of cells and organelles; transduction of energy among various forms (light, electrical, and chemical energy); as well as reception and transduction of chemical and electrical signals across membranes (Avdonin, 2005; Bockaert et al., 2002; Pahl, 1999; Rehling et al., 2004; Stack et al., 1995). Identifying these transmembrane (TM) proteins and deciphering their molecular mechanisms, then, is of great importance, particularly as applied to biomedicine. Membrane proteins are the targets of a large number of pharmacologically and toxicologically active substances, and are directly involved in their uptake, metabolism, and clearance (Bettler et al., 1998; Cohen, 2002; Heusser and Jardieu, 1997; Tibes et al., 2005; Xu et al., 2005). Despite the importance of membrane proteins, the knowledge of their high-resolution structures and mechanisms of action has lagged far behind in comparison to that of water-soluble proteins: less than 1% of all three-dimensional structures deposited in the Protein Data Bank are of membrane proteins. This unfortunate disparity stems from difficulties in overexpression and the crystallization of membrane proteins (Grisshammer and Tate, 1995; Michel, 1991).

  1. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements

    PubMed Central

    Makarova, Kira S; Wolf, Yuri I; van der Oost, John; Koonin, Eugene V

    2009-01-01

    Background In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown. Results We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. Given these observations, the apparent extensive horizontal transfer of

  2. Protein-protein interactions and prediction: a comprehensive overview.

    PubMed

    Sowmya, Gopichandran; Ranganathan, Shoba

    2014-01-01

    Molecular function in cellular processes is governed by protein-protein interactions (PPIs) within biological networks. Selective yet specific association of these protein partners contributes to diverse functionality such as catalysis, regulation, assembly, immunity, and inhibition in a cell. Therefore, understanding the principles of protein-protein association has been of immense interest for several decades. We provide an overview of the experimental methods used to determine PPIs and the key databases archiving this information. Structural and functional information of existing protein complexes confers knowledge on the principles of PPI, based on which a classification scheme for PPIs is then introduced. Obtaining high-quality non-redundant datasets of protein complexes for interaction characterisation is an essential step towards deciphering their underlying binding principles. Analysis of physicochemical features and their documentation has enhanced our understanding of the molecular basis of protein-protein association. We describe the diverse datasets created/collected by various groups and their key findings inferring distinguishing features. The currently available interface databases and prediction servers have also been compiled. PMID:23855658

  3. Predicting Thermodynamic Behaviors of Non-Protein Amino Acids as a Function of Temperature and pH

    NASA Astrophysics Data System (ADS)

    Kitadai, Norio

    2016-03-01

    Why does life use α-amino acids exclusively as building blocks of proteins? To address that fundamental question from an energetic perspective, this study estimated the standard molal thermodynamic data for three non-α-amino acids (β-alanine, γ-aminobutyric acid, and ε-aminocaproic acid) and α-amino-n-butyric acid in their zwitterionic, negative, and positive ionization states based on the corresponding experimental measurements reported in the literature. Temperature dependences of their heat capacities were described based on the revised Helgeson-Kirkham-Flowers (HKF) equations of state. The obtained dataset was then used to calculate the standard molal Gibbs energies (ΔG°) of the non-α-amino acids as a function of temperature and pH. Comparison of their ΔG° values with those of α-amino acids having the same molecular formula showed that the non-α-amino acids have similar ΔG° values to the corresponding α-amino acids in physiologically relevant conditions (neutral pH, <100 °C). In acidic and alkaline pH, the non-α-amino acids are thermodynamically more stable than the corresponding α-ones over a broad temperature range. These results suggest that the energetic cost of synthesis is not an important selection pressure to incorporate α-amino acids into biological systems.
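
    The abstract refers to zwitterionic, negative, and positive ionization states as a function of pH. As a rough illustration only, the sketch below computes the equilibrium fraction of each state for a generic amino acid with one carboxyl and one amino group. The function name and the pKa values are placeholders invented for this example; they are not taken from the study, and temperature dependence (which the study handles through the HKF equations) is ignored here.

```python
def ionization_fractions(ph, pka_acid, pka_amine):
    """Fractions of the positive, zwitterionic and negative forms at a given pH.

    Assumes a simple two-step acid-base equilibrium at fixed temperature:
    positive <-> zwitterion (pka_acid) and zwitterion <-> negative (pka_amine).
    """
    h = 10.0 ** (-ph)           # proton activity
    k1 = 10.0 ** (-pka_acid)    # first dissociation constant
    k2 = 10.0 ** (-pka_amine)   # second dissociation constant
    denom = h * h + h * k1 + k1 * k2
    return {
        "positive": h * h / denom,
        "zwitterion": h * k1 / denom,
        "negative": k1 * k2 / denom,
    }


# Placeholder pKa values chosen only for illustration; not from the study.
EXAMPLE_PKA_ACID = 4.0
EXAMPLE_PKA_AMINE = 10.0

if __name__ == "__main__":
    for ph in (2.0, 7.0, 12.0):
        print(ph, ionization_fractions(ph, EXAMPLE_PKA_ACID, EXAMPLE_PKA_AMINE))
```

    With these placeholder constants the zwitterion dominates near neutral pH, while the charged forms take over at the acidic and alkaline extremes, which is the pH dependence the abstract alludes to.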

  4. Predicting Physical Interactions between Protein Complexes

    PubMed Central

    Clancy, Trevor; Rødland, Einar Andreas; Nygard, Ståle; Hovig, Eivind

    2013-01-01

    Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships. PMID:23438732
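
    The abstract does not give the statistic the authors used. As one plausible way to ask whether two complexes share more protein-protein interactions than chance would suggest, the sketch below applies a plain hypergeometric enrichment test. The function names, the null model (every protein pair equally likely to interact), and the toy data are all assumptions made for illustration.

```python
from itertools import product
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (exact, integer arithmetic)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def complex_pair_pvalue(complex_a, complex_b, edges, n_proteins):
    """Count interactions between two complexes and score their enrichment.

    edges: iterable of (protein, protein) physical interactions.
    n_proteins: number of proteins in the whole network.
    Returns (observed_cross_edges, p_value).
    """
    edge_set = {frozenset(e) for e in edges}
    cross_pairs = {
        frozenset((a, b)) for a, b in product(complex_a, complex_b) if a != b
    }
    observed = len(cross_pairs & edge_set)
    all_pairs = comb(n_proteins, 2)
    p_value = hypergeom_tail(observed, all_pairs, len(edge_set), len(cross_pairs))
    return observed, p_value


if __name__ == "__main__":
    toy_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("x", "y")]
    print(complex_pair_pvalue({"a1", "a2"}, {"b1", "b2"}, toy_edges, n_proteins=10))
```

    A real analysis would also need multiple-testing correction across all complex pairs and a null model that respects the degree distribution of the network; neither is attempted here.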

  5. Including Ligand Induced Protein Flexibility into Protein Tunnel Prediction

    PubMed Central

    Kingsley, Laura J.; Lill, Markus A.

    2014-01-01

    In proteins with buried active sites, understanding how ligands migrate through the tunnels that connect the exterior of the protein to the active site can shed light on substrate specificity and enzyme function. A growing body of evidence highlights the importance of protein flexibility in the binding site upon ligand binding; however, the influence of protein flexibility throughout the body of the protein during ligand entry and egress is much less characterized. We have developed a novel tunnel prediction and evaluation method named IterTunnel, which includes the influence of ligand-induced protein flexibility, guarantees ligand egress, and provides detailed free energy information as the ligand proceeds along the egress route. IterTunnel combines geometric tunnel prediction with steered MD in an iterative process to identify tunnels that open as a result of ligand migration and calculates the potential of mean force (PMF) of ligand egress through a given tunnel. Applying this new method to cytochrome P450 2B6 (CYP2B6), we demonstrate the influence of protein flexibility on the shape and accessibility of tunnels. More importantly, we demonstrate that the ligand itself, while traversing through a tunnel, can reshape tunnels due to its interaction with the protein. This process results in the exposure of new tunnels and the closure of pre-existing tunnels as the ligand migrates from the active site. PMID:25043499

  6. Actin-interacting and flagellar proteins in Leishmania spp.: Bioinformatics predictions to functional assignments in phagosome formation

    PubMed Central

    2009-01-01

    Several motile processes are responsible for the movement of proteins into and within the flagellar membrane, but little is known about the process by which specific proteins (either actin-associated or not) are targeted to protozoan flagellar membranes. Actin is a major cytoskeleton protein, while polymerization and depolymerization of parasite actin and actin-interacting proteins (AIPs) during both processes of motility and host cell entry might be key events for successful infection. To better understand eukaryotic flagellar dynamics, we have surveyed genomes, transcriptomes and proteomes of pathogenic Leishmania spp. to identify pertinent genes/proteins and to build in silico models to properly address their putative roles in trypanosomatid virulence. In a search for AIPs involved in flagellar activities, we applied computational biology and proteomic tools to infer the biological meaning of coronins and Arp2/3, two important elements in phagosome formation after parasite phagocytosis by macrophages. Results presented here provide the first report of Leishmania coronin and Arp2/3 as flagellar proteins that also might be involved in phagosome formation through actin polymerization within the flagellar environment. This is an issue worthy of further in vitro examination that remains now as a direct, positive bioinformatics-derived inference to be presented. PMID:21637533

  7. Actin-interacting and flagellar proteins in Leishmania spp.: Bioinformatics predictions to functional assignments in phagosome formation.

    PubMed

    Diniz, Michely C; Costa, Marcília P; Pacheco, Ana C L; Kamimura, Michel T; Silva, Samara C; Carneiro, Laura D G; Sousa, Ana P L; Soares, Carlos E A; Souza, Celeste S F; de Oliveira, Diana Magalhães

    2009-07-01

    Several motile processes are responsible for the movement of proteins into and within the flagellar membrane, but little is known about the process by which specific proteins (either actin-associated or not) are targeted to protozoan flagellar membranes. Actin is a major cytoskeleton protein, while polymerization and depolymerization of parasite actin and actin-interacting proteins (AIPs) during both processes of motility and host cell entry might be key events for successful infection. To better understand eukaryotic flagellar dynamics, we have surveyed genomes, transcriptomes and proteomes of pathogenic Leishmania spp. to identify pertinent genes/proteins and to build in silico models to properly address their putative roles in trypanosomatid virulence. In a search for AIPs involved in flagellar activities, we applied computational biology and proteomic tools to infer the biological meaning of coronins and Arp2/3, two important elements in phagosome formation after parasite phagocytosis by macrophages. Results presented here provide the first report of Leishmania coronin and Arp2/3 as flagellar proteins that also might be involved in phagosome formation through actin polymerization within the flagellar environment. This is an issue worthy of further in vitro examination that remains now as a direct, positive bioinformatics-derived inference to be presented. PMID:21637533

  8. Predicting communities from functional traits.

    PubMed

    Cadotte, Marc W; Arnillas, Carlos A; Livingstone, Stuart W; Yasui, Simone-Louise E

    2015-09-01

    Species traits influence where species live and how they interact. While there have been many advances in describing the functional composition and diversity of communities, only recently do researchers have the ability to predict community composition and diversity. This predictive ability can offer fundamental insights into ecosystem resilience and restoration. PMID:26190136

  9. Predictive and comparative analysis of Ebolavirus proteins.

    PubMed

    Cong, Qian; Pei, Jimin; Grishin, Nick V

    2015-01-01

    Ebolavirus is the pathogen for Ebola Hemorrhagic Fever (EHF). This disease exhibits a high fatality rate and has recently reached a historically epidemic proportion in West Africa. Out of the 5 known Ebolavirus species, only Reston ebolavirus has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaque. Significant efforts have been spent to determine the three-dimensional (3D) structures of Ebolavirus proteins, to study their interaction with host proteins, and to identify the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395

  10. Bioinformatics Approaches for Predicting Disordered Protein Motifs.

    PubMed

    Bhowmick, Pallab; Guharoy, Mainak; Tompa, Peter

    2015-01-01

    Short, linear motifs (SLiMs) in proteins are functional microdomains consisting of contiguous residue segments along the protein sequence, typically not more than 10 consecutive amino acids in length with fewer than 5 defined positions. Many positions are 'degenerate', thus offering flexibility in terms of the amino acid types allowed at those positions. Their short length and degenerate nature confer evolutionary plasticity, meaning that SLiMs often evolve convergently. Further, SLiMs have a propensity to occur within intrinsically unstructured protein segments and this confers versatile functionality to unstructured regions of the proteome. SLiMs mediate multiple types of protein interactions based on domain-peptide recognition and guide functions including posttranslational modifications, subcellular localization of proteins, and ligand binding. SLiMs thus behave as modular interaction units that confer versatility to protein function, and SLiM-mediated interactions are increasingly being recognized as therapeutic targets. In this chapter we start with a brief description of the properties of SLiMs and their interactions and then move on to discuss algorithms and tools, including several web-based methods, that enable the discovery of novel SLiMs (de novo motif discovery) as well as the prediction of novel occurrences of known SLiMs. Both individual amino acid sequences and sets of protein sequences can be scanned using these methods to obtain statistically overrepresented sequence patterns. Lists of putatively functional SLiMs are then assembled based on parameters such as evolutionary sequence conservation, disorder scores, structural data, gene ontology terms and other contextual information that helps to assess the functional credibility or significance of these motifs. These bioinformatics methods should certainly guide experiments aimed at motif discovery. PMID:26387106
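
    Because a SLiM combines a few defined positions with degenerate ones, a known motif is often written as a regular expression and scanned against a sequence. The sketch below shows that idea only; the function name, the toy pattern, and the toy sequence are invented for illustration and do not come from the chapter or from any motif database.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every occurrence of a motif.

    A lookahead is used so that overlapping occurrences are also reported.
    `pattern` is a regular expression over one-letter amino acid codes,
    where '.' stands for a fully degenerate position.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


if __name__ == "__main__":
    # Toy pattern: S or T, then P, then any residue, then K or R.
    toy_pattern = "[ST]P.[KR]"
    toy_sequence = "MASPAKTPLRSPGK"
    for start, hit in find_motif(toy_sequence, toy_pattern):
        print(start, hit)
```

    A match found this way is only a candidate. As the abstract notes, conservation, disorder scores, and other context are needed before a hit is treated as a putatively functional motif.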

  11. Predicting protein-ligand and protein-peptide interfaces

    NASA Astrophysics Data System (ADS)

    Bertolazzi, Paola; Guerra, Concettina; Liuzzi, Giampaolo

    2014-06-01

    The paper deals with the identification of binding sites and concentrates on interactions involving small interfaces. In particular we focus our attention on two major interface types, namely protein-ligand and protein-peptide interfaces. As concerns protein-ligand binding site prediction, we classify the most interesting methods and approaches into four main categories: (a) shape-based methods, (b) alignment-based methods, (c) graph-theoretic approaches and (d) machine learning methods. Class (a) encompasses those methods which employ, in some way, geometric information about the protein surface. Methods falling into class (b) address the prediction problem as an alignment problem, i.e. finding protein-ligand atom pairs that occupy spatially equivalent positions. Graph theoretic approaches, class (c), are mainly based on the definition of a particular graph, known as the protein contact graph, and then apply some sophisticated methods from graph theory to discover subgraphs or score similarities for uncovering functional sites. The last class (d) contains those methods that are based on the learn-from-examples paradigm and that are able to take advantage of the large amount of data available on known protein-ligand pairs. As for protein-peptide interfaces, due to the often disordered nature of the regions involved in binding, shape similarity is no longer a determining factor. Then, in geometry-based methods, geometry is accounted for by providing the relative position of the atoms surrounding the peptide residues in known structures. Finally, also for protein-peptide interfaces, we present a classification of some successful machine learning methods. Indeed, they can be categorized in the way adopted to construct the learning examples. In particular, we envisage three main methods: distance functions, structure and potentials and structure alignment.

  12. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    SciTech Connect

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to improve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  13. Structure to function prediction of hypothetical protein KPN_00953 (Ycbk) from Klebsiella pneumoniae MGH 78578 highlights possible role in cell wall metabolism

    PubMed Central

    2014-01-01

    Background Klebsiella pneumoniae plays a major role in causing nosocomial infection in immunocompromised patients. Infections caused by the pathogen range from respiratory and urinary tract infections to septicemia and, primarily, pneumonia. As more K. pneumoniae strains are becoming highly resistant to various antibiotics, treatment of this bacterium has been rendered more difficult. This situation, as a consequence, poses a threat to public health. Hence, identification of possible novel drug targets against this opportunistic pathogen needs to be undertaken. In the complete genome sequence of K. pneumoniae MGH 78578, approximately one-fourth of the genome encodes for hypothetical proteins (HPs). Due to their low homology and relatedness to other known proteins, HPs may serve as potential, new drug targets. Results Sequence analysis on the HPs of K. pneumoniae MGH 78578 revealed that a particular HP termed KPN_00953 (YcbK) contains an M15_3 peptidases superfamily conserved domain. Some members of this superfamily are metalloproteases which are involved in cell wall metabolism. BLASTP similarity search on KPN_00953 (YcbK) revealed that the majority of the hits were hypothetical proteins, although two of the hits suggested that it may be a lipoprotein or related to the twin-arginine translocation (Tat) pathway important for transport of proteins to the cell membrane and periplasmic space. As lipoproteins and other components of the cell wall are important pathogenic factors, homology modeling of KPN_00953 was attempted to predict the structure and function of this protein. The three-dimensional model of the protein showed that its secondary structure topology and active site are similar to those found among metalloproteases, where two His residues, namely His169 and His209, and an Asp residue, Asp176, in KPN_00953 were found to be Zn-chelating residues. Interestingly, induced expression of the cloned KPN_00953 gene in lipoprotein-deficient E. coli JE5505 resulted in smoother

  14. Predicting Resistance Mutations Using Protein Design Algorithms

    SciTech Connect

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.
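
    The reported 18-, 9-, and 13-fold losses in affinity can be put on an energy scale with the standard relation ΔΔG = RT ln(fold change). The sketch below does that conversion under two assumptions that the abstract does not state: that each fold change is a ratio of inhibition (or dissociation) constants, and that the temperature is 298.15 K. The function name is hypothetical.

```python
from math import log

GAS_CONSTANT = 8.314462618  # J / (mol K)
JOULES_PER_KCAL = 4184.0


def fold_change_to_ddg_kcal(fold_change, temperature_k=298.15):
    """Binding free energy penalty (kcal/mol) for a fold loss in affinity.

    Uses ddG = R * T * ln(fold_change); the temperature default is an
    assumption, not a value from the abstract.
    """
    return GAS_CONSTANT * temperature_k * log(fold_change) / JOULES_PER_KCAL


if __name__ == "__main__":
    for fold in (18, 9, 13):  # fold losses quoted in the abstract
        print(f"{fold}-fold -> {fold_change_to_ddg_kcal(fold):.2f} kcal/mol")
```

    On these assumptions each mutant costs the inhibitor between roughly 1 and 2 kcal/mol of binding free energy, which gives a sense of how small an energetic change is enough to produce the reported loss of potency.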

  15. Scoring docking conformations using predicted protein interfaces

    PubMed Central

    2014-01-01

    Background Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations among the 3D models produced by docking software by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit from template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing quality of complex conformations. PMID:24906633

  16. A novel method for protein-protein interaction site prediction using phylogenetic substitution models

    PubMed Central

    La, David; Kihara, Daisuke

    2011-01-01

    Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to the protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution as compared with other surface residues. Based on this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of protein with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interface and non-binding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is very versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins. PMID:21989996

  17. A physical approach to protein structure prediction: CASP4 results

    SciTech Connect

    Crivelli, Silvia; Eskow, Elizabeth; Bader, Brett; Lamberti, Vincent; Byrd, Richard; Schnabel, Robert; Head-Gordon, Teresa

    2001-02-27

    We describe our global optimization method called Stochastic Perturbation with Soft Constraints (SPSC), which uses information from known proteins to predict secondary structure, but not in the tertiary structure predictions or in generating the terms of the physics-based energy function. Our approach is also characterized by the use of an all atom energy function that includes a novel hydrophobic solvation function derived from experiments that shows promising ability for energy discrimination against misfolded structures. We present the results obtained using our SPSC method and energy function for blind prediction in the 4th Critical Assessment of Techniques for Protein Structure Prediction (CASP4) competition, and show that our approach is more effective on targets for which less information from known proteins is available. In fact our SPSC method produced the best prediction for one of the most difficult targets of the competition, a new fold protein of 240 amino acids.

  18. Functional significance of protein assemblies predicted by the crystal structure of the restriction endonuclease BsaWI

    PubMed Central

    Tamulaitis, Gintautas; Rutkauskas, Marius; Zaremba, Mindaugas; Grazulis, Saulius; Tamulaitiene, Giedre; Siksnys, Virginijus

    2015-01-01

    Type II restriction endonuclease BsaWI recognizes a degenerate sequence 5′-W/CCGGW-3′ (W stands for A or T, ‘/’ denotes the cleavage site). It belongs to a large family of restriction enzymes that contain a conserved CCGG tetranucleotide in their target sites. These enzymes are arranged as dimers or tetramers, and require binding of one, two or three DNA targets for their optimal catalytic activity. Here, we present a crystal structure and biochemical characterization of the restriction endonuclease BsaWI. BsaWI is arranged as an ‘open’ configuration dimer and binds a single DNA copy through minor groove contacts. In the crystal, primary BsaWI dimers form an indefinite linear chain via the C-terminal domain contacts, implying possible higher order aggregates. We show that in solution BsaWI protein exists in a dimer-tetramer-oligomer equilibrium, but in the presence of specific DNA forms a tetramer bound to two target sites. Site-directed mutagenesis and kinetic experiments show that BsaWI is active as a tetramer and requires two target sites for optimal activity. We propose a BsaWI mechanism that shares common features both with dimeric Ecl18kI/SgrAI and bona fide tetrameric NgoMIV/SfiI enzymes. PMID:26240380
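
    The recognition sequence quoted above, 5′-W/CCGGW-3′ with W meaning A or T, translates directly into a pattern search. The sketch below locates candidate sites in a DNA string and reports where the quoted strand would be cut. The function name and the test sequence are made up for illustration, and only the cut position given in the abstract (after the first W) is modelled.

```python
import re

# W/CCGGW with W = A or T; the lookahead lets overlapping sites be found.
BSAWI_SITE = re.compile(r"(?=([AT]CCGG[AT]))")


def bsawi_sites(dna):
    """Return (site_start, cut_index) pairs, both 0-based.

    site_start is the index of the first W of the site; cut_index is the
    index of the first base after the cut, i.e. the C that follows W.
    """
    dna = dna.upper()
    return [(m.start(), m.start() + 1) for m in BSAWI_SITE.finditer(dna)]


if __name__ == "__main__":
    toy_dna = "GGACCGGTTTTCCGGAGG"
    print(bsawi_sites(toy_dna))
```

    Since W pairs with W and CCGG is palindromic, the reverse complement of any WCCGGW site is again a WCCGGW site, so scanning one strand is sufficient to find every site.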

  19. Functional significance of protein assemblies predicted by the crystal structure of the restriction endonuclease BsaWI.

    PubMed

    Tamulaitis, Gintautas; Rutkauskas, Marius; Zaremba, Mindaugas; Grazulis, Saulius; Tamulaitiene, Giedre; Siksnys, Virginijus

    2015-09-18

    Type II restriction endonuclease BsaWI recognizes a degenerate sequence 5'-W/CCGGW-3' (W stands for A or T, '/' denotes the cleavage site). It belongs to a large family of restriction enzymes that contain a conserved CCGG tetranucleotide in their target sites. These enzymes are arranged as dimers or tetramers, and require binding of one, two or three DNA targets for their optimal catalytic activity. Here, we present a crystal structure and biochemical characterization of the restriction endonuclease BsaWI. BsaWI is arranged as an 'open' configuration dimer and binds a single DNA copy through minor groove contacts. In the crystal, primary BsaWI dimers form an indefinite linear chain via the C-terminal domain contacts, implying possible higher order aggregates. We show that in solution BsaWI protein exists in a dimer-tetramer-oligomer equilibrium, but in the presence of specific DNA forms a tetramer bound to two target sites. Site-directed mutagenesis and kinetic experiments show that BsaWI is active as a tetramer and requires two target sites for optimal activity. We propose a BsaWI mechanism that shares common features both with dimeric Ecl18kI/SgrAI and bona fide tetrameric NgoMIV/SfiI enzymes. PMID:26240380

  20. Functional characterization of pediocin PA-1 binding to liposomes in the absence of a protein receptor and its relationship to a predicted tertiary structure.

    PubMed Central

    Chen, Y; Shapira, R; Eisenstein, M; Montville, T J

    1997-01-01

    The physicochemical interaction of pediocin PA-1 with target membranes was characterized using lipid vesicles made from the total lipids extracted from Listeria monocytogenes. Pediocin PA-1 caused the time- and concentration-dependent release of entrapped carboxyfluorescein (CF) from the vesicles. The pediocin-induced CF efflux rates were higher under acidic conditions than under neutral and alkaline conditions and were dependent on both pediocin and lipid concentrations. A binding isotherm constructed on the basis of the Langmuir isotherm gave an apparent binding constant of 1.4 × 10^7 M^-1 at pH 6.0. The imposition of a transmembrane potential (inside negative) increased the CF efflux rate by 88%. Pediocin PA-1 also permeabilized synthetic vesicles composed only of phosphatidylcholine. Sequence alignments and secondary-structure predictions for the N terminus of pediocin PA-1 and other class IIa bacteriocins predicted that pediocin PA-1 contained two beta-sheets maintained in a hairpin conformation stabilized by a disulfide bridge. The structural model also revealed patches of positively charged residues, consistent with the argument that electrostatic interactions play an important role in the binding of pediocin PA-1 to the lipid vesicles. This study demonstrates that pediocin PA-1 can function in the absence of a protein receptor and provides a structural model consistent with these results. PMID:9023932
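
    To give a feel for the apparent binding constant quoted above, the sketch below evaluates the textbook single-site Langmuir expression, fraction bound = Kc / (1 + Kc). The abstract does not say which exact form of the isotherm the authors fitted, so this form and the function name are assumptions; only the value of K comes from the abstract.

```python
K_APPARENT = 1.4e7  # M^-1, apparent binding constant at pH 6.0 (from the abstract)


def fraction_bound(conc_molar, k_assoc=K_APPARENT):
    """Fractional occupancy for a simple single-site Langmuir isotherm."""
    x = k_assoc * conc_molar
    return x / (1.0 + x)


# Free concentration at which half of the binding sites are occupied.
HALF_SATURATION_MOLAR = 1.0 / K_APPARENT

if __name__ == "__main__":
    print(f"half saturation at {HALF_SATURATION_MOLAR * 1e9:.1f} nM")
    for conc_nm in (10, 100, 1000):
        print(conc_nm, "nM ->", round(fraction_bound(conc_nm * 1e-9), 3))
```

    In this simple model a binding constant of 1.4 × 10^7 M^-1 corresponds to half saturation at about 71 nM free peptide.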

  1. Phospholipid liposomes functionalized by protein

    NASA Astrophysics Data System (ADS)

    Glukhova, O. E.; Savostyanov, G. V.; Grishina, O. A.

    2015-03-01

    Finding new ways to deliver neurotrophic drugs to the brain in newborns is one of the contemporary problems of medicine and the pharmaceutical industry. Recent research in this field points to the promise of supramolecular transport systems for targeted drug delivery to the brain, which can overcome the blood-brain barrier (BBB). Solving this problem therefore matters not only for medicine but also for society as a whole, because it determines the health of future generations. Owing to their combination of lipophilic and hydrophilic properties, phospholipid liposomes are considered the main future vehicles in medicine for drug delivery through the BBB as well as increasing their bioavailability and toxicity. Liposomes functionalized by various proteins were used as transport systems for ease of use. Oligosaccharide modification of the liposome surface has shown promise over the last decade because it enables the delivery of liposomes to a specific receptor of human cells through the choice of ligand, and it is widely used in pharmacology for the treatment of several diseases. The purpose of this work is to create a coarse-grained model of the bilayer of phospholipid liposomes functionalized by proteins specific to the structural elements of the BBB, and to predict, by molecular docking, the most favorable orientation and position of the molecules in the resulting complex. The activity of the ligand molecule toward the protein receptor of human cells was investigated by molecular dynamics.

  2. Protein Residue Contacts and Prediction Methods

    PubMed Central

    Adhikari, Badri

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue–residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  3. Protein Residue Contacts and Prediction Methods.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  4. Practical lessons from protein structure prediction

    PubMed Central

    Ginalski, Krzysztof; Grishin, Nick V.; Godzik, Adam; Rychlewski, Leszek

    2005-01-01

    Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed. PMID:15805122

  5. PQuad: Visualization of Predicted Peptides and Proteins

    SciTech Connect

    Havre, Susan L.; Singhal, Mudita; Payne, Deborah A.; Webb-Robertson, Bobbie-Jo M.

    2004-10-10

    New high-throughput proteomic techniques generate data faster than biologists and bioinformaticists can analyze it. Yet, hidden within this massive and complex data are answers to basic questions about how cells function to support life or respond to disease. Now biologists can take a global or systems approach, studying not one or two proteins at a time but whole proteomes comprising all the proteins in a cell. However, the tremendous size and complexity of the high-throughput experiment data make it difficult to process and interpret. Visualization provides powerful analysis capabilities for such enormous and complex data. In this paper, we introduce a novel interactive visualization, PQuad (Peptide Permutation and Protein Prediction), designed for the visual analysis of peptides (protein fragments) identified from high-throughput data. PQuad depicts the experiment peptides in the context of their parent protein and DNA, thereby integrating proteomic and genomic information. A wrapped line metaphor is applied across key resolutions of the data, from a compressed view of an entire chromosome to the actual nucleotide sequence. PQuad provides a difference visualization for comparing peptides from different experimental conditions. We describe the requirements for such a visual analysis tool, the design decisions, and the novel aspects of PQuad.

  6. A Method for Predicting Protein-Protein Interaction Types

    PubMed Central

    Silberberg, Yael

    2014-01-01

    Protein-protein interactions (PPIs) govern basic cellular processes through signal transduction and complex formation. The diversity of those processes gives rise to a remarkable diversity of interaction types, ranging from transient phosphorylation interactions to stable covalent bonding. Despite our increasing knowledge of PPIs in humans and other species, their types remain relatively unexplored and few annotations of types exist in public databases. Here, we propose the first method for systematic prediction of PPI type based solely on the techniques by which the interaction was detected. We show that different detection methods are better suited for detecting specific types. We apply our method to ten interaction types on a large-scale human PPI dataset. We evaluate the performance of the method using both internal cross validation and external data sources. In cross validation, we obtain an area under the receiver operating characteristic (ROC) curve ranging from 0.65 to 0.97 with an average of 0.84 across the predicted types. Comparing the predicted interaction types to external data sources, we obtained significant agreements for phosphorylation and ubiquitination interactions, with hypergeometric p-value = 2.3e−54 and 5.6e−28 respectively. We examine the biological relevance of our predictions using known signaling pathways and chart the abundance of interaction types in cell processes. Finally, we investigate the cross-relations between different interaction types within the network and characterize the discovered patterns, or motifs. We expect the resulting annotated network to facilitate the reconstruction of process-specific subnetworks and assist in predicting protein function or interaction. PMID:24625764

  7. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions.

    PubMed

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  8. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    PubMed Central

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  9. Prediction of Protein-Protein Interaction Sites Based on Naive Bayes Classifier

    PubMed Central

    Geng, Haijiang; Lu, Tao; Lin, Xiao; Liu, Yu; Yan, Fangrong

    2015-01-01

    Proteins function through interactions with other proteins and biomolecules, and these interactions occur at the so-called interface residues of the protein sequences. Identifying interface residues helps us better understand the biological mechanism of protein interaction. Information about the interface residues also contributes to the understanding of metabolic and signal transduction networks and indicates directions in drug design. In recent years, researchers have focused on developing new computational methods for predicting protein interface residues. Here we used a 181-dimension protein sequence feature vector as input to a Naive Bayes Classifier- (NBC-) based method to predict interaction sites in protein-protein complexes. The prediction of interaction sites in protein interactions is treated as a binary classification problem over amino acid residues by applying NBC with protein sequence features. Independent test results suggested that the Naive Bayes Classifier-based method with the protein sequence features as input vectors performed well. PMID:26697220

  10. Structural model of ρ1 GABAC receptor based on evolutionary analysis: Testing of predicted protein–protein interactions involved in receptor assembly and function

    PubMed Central

    Adamian, Larisa; Gussin, Hélène A; Tseng, Yan Yuan; Muni, Niraj J; Feng, Feng; Qian, Haohua; Pepperberg, David R; Liang, Jie

    2009-01-01

    The homopentameric ρ1 GABAC receptor is a ligand-gated ion channel with a binding pocket for γ-aminobutyric acid (GABA) at the interfaces of N-terminal extracellular domains. We combined evolutionary analysis, structural modeling, and experimental testing to study determinants of GABAC receptor assembly and channel gating. We estimated the posterior probability of selection pressure at amino acid residue sites measured as ω-values and built a comparative structural model, which identified several polar residues under strong selection pressure at the subunit interfaces that may form intersubunit hydrogen bonds or salt bridges. At three selected sites (R111, T151, and E55), mutations disrupting intersubunit interactions had strong effects on receptor folding, assembly, and function. We next examined the role of a predicted intersubunit salt bridge for residue pair R158–D204. The mutant R158D, where the positively charged residue is replaced by a negatively charged aspartate, yielded a partially degraded receptor and lacked membrane surface expression. The membrane surface expression was rescued by the double mutant R158D–D204R, where positive and negative charges are switched, although the mutant receptor was inactive. The single mutants R158A, D204R, and D204A exhibited diminished activities and altered kinetic profiles with fast recovery kinetics, suggesting that R158–D204 salt bridge perhaps stabilizes the open state of the GABAC receptor. Our results emphasize the functional importance of highly conserved polar residues at the protein–protein interfaces in GABAC ρ1 receptors and demonstrate how the integration of computational and experimental approaches can aid discovery of functionally important interactions. PMID:19768800

  11. Predicting Ca(2+)-binding sites in proteins.

    PubMed Central

    Nayal, M; Di Cera, E

    1994-01-01

    The coordination shell of Ca2+ ions in proteins contains almost exclusively oxygen atoms supported by an outer shell of carbon atoms. The bond-strength contribution of each ligating oxygen in the inner shell can be evaluated by using an empirical expression successfully applied in the analysis of crystals of metal oxides. The sum of such contributions closely approximates the valence of the bound cation. When a protein is embedded in a very fine grid of points and an algorithm is used to calculate the valence of each point representing a potential Ca2+-binding site, a typical distribution of valence values peaked around 0.4 is obtained. In 32 documented Ca2+-binding proteins, containing a total of 62 Ca2+-binding sites, a very small fraction of points in the distribution has a valence close to that of Ca2+. Only 0.06% of the points have a valence ≥ 1.4. These points share the remarkable tendency to cluster around documented Ca2+ ions. A high enough value of the valence is both necessary (58 out of 62 Ca2+-binding sites have a valence ≥ 1.4) and sufficient (87% of the grid points with a valence ≥ 1.4 are within 1.0 Å from a documented Ca2+ ion) to predict the location of bound Ca2+ ions. The algorithm can also be used for the analysis of other cations and predicts the location of Mg2+- and Na+-binding sites in a number of proteins. The valence is, therefore, a tool of pinpoint accuracy for locating cation-binding sites, which can also be exploited in engineering high-affinity binding sites and characterizing the linkage between structural components and functional energetics for molecular recognition of metal ions by proteins. PMID:8290605
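
    The grid-based valence scan described above can be sketched as follows. The abstract does not state which empirical expression was used, so this sketch assumes an exponential bond-valence form, s = exp((R0 - r) / b); the parameter values R0 and b, the 4 Å cutoff, and all function names are illustrative assumptions. Only the 1.4 valence threshold is taken from the abstract.

```python
import math

# Assumed bond-valence parameters for a Ca-O contact (illustrative values,
# not taken from the abstract).
R0_CA_O = 1.967  # Angstrom
B_PARAM = 0.37   # Angstrom


def bond_valence(distance):
    """Valence contribution of one oxygen at the given distance (Angstrom)."""
    return math.exp((R0_CA_O - distance) / B_PARAM)


def valence_at(point, oxygens, cutoff=4.0):
    """Sum the contributions of all oxygens within the cutoff of a grid point."""
    total = 0.0
    for oxygen in oxygens:
        distance = math.dist(point, oxygen)
        if distance <= cutoff:
            total += bond_valence(distance)
    return total


def candidate_sites(grid_points, oxygens, threshold=1.4):
    """Keep grid points whose valence reaches the threshold from the abstract."""
    return [p for p in grid_points if valence_at(p, oxygens) >= threshold]


# Hypothetical test geometry: six oxygens arranged octahedrally, 2.4 Angstrom
# from the origin, plus one grid point far away from any oxygen.
oxygens = [(2.4, 0, 0), (-2.4, 0, 0), (0, 2.4, 0),
           (0, -2.4, 0), (0, 0, 2.4), (0, 0, -2.4)]
grid = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
print(candidate_sites(grid, oxygens))
```

    In a real scan the grid would cover the whole protein at fine spacing and the oxygen coordinates would come from a structure file; both are outside the scope of this sketch.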

  12. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be. PMID:24728994

  13. Predicting the dynamics of protein abundance.

    PubMed

    Mehdi, Ahmed M; Patrick, Ralph; Bailey, Timothy L; Bodén, Mikael

    2014-05-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA-protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. 
    We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation efficiency.

  14. Predicting the Dynamics of Protein Abundance

    PubMed Central

    Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael

    2014-01-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation

  15. Predicting protein dynamics from structural ensembles

    NASA Astrophysics Data System (ADS)

    Copperman, J.; Guenza, M. G.

    2015-12-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

  16. The Intrinsic Geometric Structure of Protein-Protein Interaction Networks for Protein Interaction Prediction.

    PubMed

    Fang, Yi; Sun, Mengtian; Dai, Guoxian; Ramain, Karthik

    2016-01-01

    Recent developments in high-throughput technologies for measuring protein-protein interaction (PPI) have profoundly advanced our ability to systematically infer protein function and regulation. However, inherently high false positive and false negative rates in measurement have posed great challenges for computational approaches to the prediction of PPI. A good PPI predictor should be 1) resistant to a high rate of missing and spurious PPIs, and 2) robust against incompleteness of observed PPI networks. To predict PPI in a network, we developed an intrinsic geometry structure (IGS) for networks, which exploits the intrinsic and hidden relationships among proteins in the network through a heat diffusion process. In this process, all explicit PPIs participate simultaneously to glue together local, infinitesimal and noisy experimental interaction data and generate a global macroscopic description of the relationships among proteins. The revealed implicit relationship can be interpreted as the probability of two proteins interacting with each other. The revealed relationship is intrinsic and robust against individual, local and explicit protein interactions in the original network. We apply our approach to publicly available PPI network data to evaluate the performance of PPI prediction. Experimental results indicate that, under different levels of missing and spurious PPIs, IGS is able to robustly exploit the intrinsic and hidden relationships for PPI prediction with a higher sensitivity and specificity than recently proposed methods. PMID:26886733
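
    As a rough illustration of how a heat diffusion process turns explicit edges into graded relationships between all protein pairs, the sketch below integrates the generic graph heat equation dH/dt = -LH with simple Euler steps, where L is the graph Laplacian. This is a textbook heat kernel and not the authors' IGS construction, whose details are not given in the abstract; the function name, the diffusion time, and the toy network are assumptions.

```python
def heat_diffusion(adjacency, t=1.0, steps=1000):
    """Approximate the heat kernel exp(-t * L) of an undirected graph."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Graph Laplacian L = D - A.
    laplacian = [[(degree[i] if i == j else 0.0) - adjacency[i][j]
                  for j in range(n)] for i in range(n)]
    # Start with one unit of heat on every node (identity matrix).
    heat = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    dt = t / steps
    for _ in range(steps):
        # Explicit Euler step: H <- H - dt * L @ H.
        heat = [[heat[i][j] - dt * sum(laplacian[i][k] * heat[k][j]
                                       for k in range(n))
                 for j in range(n)] for i in range(n)]
    return heat


# Hypothetical toy network: four proteins connected in a chain 0-1-2-3.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
kernel = heat_diffusion(chain)
# Proteins 0 and 2 share no edge but receive a nonzero score via protein 1.
print(round(kernel[0][2], 3), round(kernel[0][3], 3))
```

    Because every edge contributes to every entry of the result, removing or adding a single edge changes the scores only gradually, which is the kind of robustness the abstract attributes to diffusion-based relationships.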

  17. Local backbone structure prediction of proteins.

    PubMed

    de Brevern, Alexandre G; Benros, Cristina; Gautier, Romain; Valadié, Héléne; Hazout, Serge; Etchebest, Catherine

    2004-01-01

    A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict, by a Bayesian approach, the local 3D structure of proteins from their sequences alone. LocPred is a software tool that lets users submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288

  18. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
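
    The phylogenetic-profile idea can be sketched in a few lines: each protein family gets a presence/absence vector across genomes, and families with matching vectors are proposed as functionally linked. The data, the function names, and the use of Hamming distance with a mismatch threshold are illustrative assumptions rather than the patented procedure.

```python
from itertools import combinations


def build_profiles(genomes, proteins):
    """Map each protein family to a presence/absence tuple across genomes."""
    return {p: tuple(int(p in genome) for genome in genomes) for p in proteins}


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_mismatches=0):
    """Propose links between families whose profiles differ in few genomes."""
    pairs = []
    for p, q in combinations(sorted(profiles), 2):
        if hamming(profiles[p], profiles[q]) <= max_mismatches:
            pairs.append((p, q))
    return pairs


# Hypothetical input: each genome is the set of protein families with a
# detectable homolog in that organism.
genomes = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C"}]
profiles = build_profiles(genomes, ["A", "B", "C"])
print(profiles)
print(linked_pairs(profiles))
```

    Families present in every genome, or in none, share a profile trivially and carry no signal, so a practical version would filter them out before pairing.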

  19. Human genome protein function database.

    PubMed Central

    Sorenson, D. K.

    1991-01-01

    A database which focuses on the normal functions of the currently-known protein products of the Human Genome was constructed. Information is stored as text, figures, tables, and diagrams. The program contains built-in functions to modify, update, categorize, hypertext, search, create reports, and establish links to other databases. The semi-automated categorization feature of the database program was used to classify these proteins in terms of biomedical functions. PMID:1807638

  20. Accurate Prediction of Docked Protein Structure Similarity.

    PubMed

    Akbal-Delibas, Bahar; Pomplun, Marc; Haspel, Nurit

    2015-09-01

    One of the major challenges for protein-protein docking methods is to accurately discriminate native-like structures. The protein docking community agrees on the existence of a relationship between various favorable intermolecular interactions (e.g. van der Waals, electrostatic, desolvation forces, etc.) and the similarity of a conformation to its native structure. Different docking algorithms often formulate this relationship as a weighted sum of selected terms and calibrate their weights against specific training data to evaluate and rank candidate structures. However, the exact form of this relationship is unknown and the accuracy of such methods is impaired by the pervasiveness of false positives. Unlike the conventional scoring functions, we propose a novel machine learning approach that not only ranks the candidate structures relative to each other but also indicates how similar each candidate is to the native conformation. We trained the AccuRMSD neural network with an extensive dataset using the back-propagation learning algorithm. Our method predicted the RMSDs of unbound docked complexes within a 0.4 Å error margin. PMID:26335807

  1. Predicting Protein-Protein Interactions from the Molecular to the Proteome Level.

    PubMed

    Keskin, Ozlem; Tuncbag, Nurcan; Gursoy, Attila

    2016-04-27

    Identification of protein-protein interactions (PPIs) is at the center of molecular biology considering the unquestionable role of proteins in cells. Combinatorial interactions result in a repertoire of multiple functions; hence, knowledge of PPIs and binding regions naturally serves functional proteomics and drug discovery. Given the experimental limitations on finding all interactions in a proteome, computational prediction and modeling of protein interactions is a prerequisite for completing the set of interactions at the proteome level. This review aims to provide a background on PPIs and their types. Computational methods for PPI prediction can use a variety of biological data including sequence-, evolution-, expression-, and structure-based data. Physical and statistical modeling are commonly used to integrate these data and infer PPI predictions. We review and list the state-of-the-art methods, servers, databases, and tools for protein-protein interaction prediction. PMID:27074302

  2. Structure prediction of magnetosome-associated proteins.

    PubMed

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region - the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  3. Structure prediction of magnetosome-associated proteins

    PubMed Central

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region – the magnetosome island (MAI). The MAI appears to be conserved in all MTB that were analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure–function relationship of MAPs we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs. PMID:24523717

  4. Signature Product Code for Predicting Protein-Protein Interactions

    SciTech Connect

    Martin, Shawn B.; Brown, William M.

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.
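
    The trimer-pair idea can be illustrated with a small kernel sketch. Each sequence is reduced to counts of its overlapping amino acid trimers, and the inner product between tensor products of two such count vectors factorizes into a product of per-protein inner products. This is not the SigProd code: the function names, the toy sequences, and the symmetrization over the two orderings of a pair are assumptions made for illustration, and the support vector machine that would consume the kernel is omitted.

```python
from collections import Counter


def trimer_counts(sequence):
    """Count overlapping amino acid trimers in a sequence."""
    return Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))


def dot(a, b):
    """Inner product of two sparse trimer count vectors."""
    return sum(count * b[trimer] for trimer, count in a.items())


def pair_kernel(pair1, pair2):
    """Similarity between two protein pairs, independent of order within a pair."""
    (a, b), (c, d) = pair1, pair2
    ta, tb, tc, td = (trimer_counts(s) for s in (a, b, c, d))
    # <a (x) b, c (x) d> = <a, c> * <b, d>; adding the swapped term makes the
    # kernel symmetric in the order of the two proteins (an assumption here).
    return dot(ta, tc) * dot(tb, td) + dot(ta, td) * dot(tb, tc)


# Hypothetical toy sequences, far shorter than real proteins.
known_pair = ("ACDE", "CDEF")
query_pair = ("CDEF", "ACDE")
print(pair_kernel(known_pair, query_pair))
```

    A kernel of this shape lets a classifier compare protein pairs directly without ever materializing the full trimer-by-trimer feature matrix.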

  5. Predicting hand function after hemidisconnection.

    PubMed

    Küpper, Hanna; Kudernatsch, Manfred; Pieper, Tom; Groeschel, Samuel; Tournier, Jacques-Donald; Raffelt, David; Winkler, Peter; Holthausen, Hans; Staudt, Martin

    2016-09-01

    Hemidisconnections (i.e. hemispherectomies or hemispherotomies) invariably lead to contralateral hemiparesis. Many patients with a pre-existing hemiparesis, however, experience no deterioration in motor functions, and some can still grasp with their paretic hand after hemidisconnection. The scope of our study was to predict this phenomenon. Hypothesizing that preserved contralateral grasping ability after hemidisconnection can only occur in patients controlling their paretic hands via ipsilateral corticospinal projections already in the preoperative situation, we analysed the asymmetries of the brainstem (by manual magnetic resonance imaging volumetry) and of the structural connectivity of the corticospinal tracts within the brainstem (by magnetic resonance imaging diffusion tractography), assuming that marked hypoplasia or Wallerian degeneration on the lesioned side in patients who can grasp with their paretic hands indicate ipsilateral control. One hundred and two patients who underwent hemidisconnections between 0.8 and 36 years of age were included. Before the operation, contralateral hand function was normal in 3/102 patients, 47/102 patients showed hemiparetic grasping ability and 52/102 patients could not grasp with their paretic hands. After hemidisconnection, 20/102 patients showed a preserved grasping ability, and 5/102 patients began to grasp with their paretic hands only after the operation. All these 25 patients suffered from pre- or perinatal brain lesions. Thirty of 102 patients lost their grasping ability. This group included all seven patients with a post-neonatally acquired or progressive brain lesion who could grasp before the operation, and also all three patients with a preoperatively normal hand function. The remaining 52/102 patients were unable to grasp pre- and postoperatively. 
    On magnetic resonance imaging, the patients with preserved grasping showed significantly more asymmetric brainstem volumes than the patients who lost their grasping

  6. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent
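
    The recall and precision criterion used in this record can be made concrete with a short sketch. It assumes an interface is represented as a set of residue identifiers; the function names are hypothetical, and the 0.5 threshold mirrors the "recall and precision ≥50%" definition of a correct prediction quoted in the abstract.

```python
def interface_scores(predicted, observed):
    """Return (recall, precision) of predicted interface residues."""
    predicted, observed = set(predicted), set(observed)
    hits = len(predicted & observed)
    recall = hits / len(observed) if observed else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


def is_correct_interface(predicted, observed, threshold=0.5):
    """Apply the 'recall and precision >= 50%' criterion from the abstract."""
    recall, precision = interface_scores(predicted, observed)
    return recall >= threshold and precision >= threshold
```

    For example, a model that predicts residues {1, 2, 3, 4} against an observed interface {3, 4, 5, 6, 7, 8} reaches a precision of 0.5 but a recall of only one third, so it would not count as correct under this criterion.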

  7. Effective protein conformational sampling based on predicted torsion angles.

    PubMed

    Yang, Yuedong; Zhou, Yaoqi

    2016-04-30

    Protein structure prediction is a long-standing problem in molecular biology. Due to lack of an accurate energy function, it is often difficult to know whether the sampling algorithm or the energy function is the most important factor for failure of locating near-native conformations of proteins. This article examines the size dependence of sampling effectiveness by using a perfect "energy function": the root-mean-squared distance from the target native structure. Using protein targets up to 460 residues from critical assessment of structure prediction techniques (CASP11, 2014), we show that the accuracy of near native structures sampled is relatively independent of protein sizes but strongly depends on the errors of predicted torsion angles. Even with 40% out-of-range angle prediction, 2 Å or less near-native conformation can be sampled. The result supports that the poor energy function is one of the bottlenecks of structure prediction and predicted torsion angles are useful for overcoming the bottleneck by restricting the sampling space in the absence of a perfect energy function. © 2015 Wiley Periodicals, Inc. PMID:26696379
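
    Using the distance from the native structure as a stand-in "energy function" amounts to ranking sampled conformations by RMSD. The sketch below is illustrative only: it assumes the conformations are already superimposed on the native structure (a real pipeline would first compute an optimal superposition), and the names rmsd and best_sample are hypothetical.

```python
import math


def rmsd(coords, native):
    """Root-mean-squared distance between two equally sized coordinate lists.

    Assumes both structures are already superimposed; no fitting is done here.
    """
    if not native or len(coords) != len(native):
        raise ValueError("structures must be non-empty and of equal length")
    total = sum(
        (x - u) ** 2 + (y - v) ** 2 + (z - w) ** 2
        for (x, y, z), (u, v, w) in zip(coords, native)
    )
    return math.sqrt(total / len(native))


def best_sample(samples, native):
    """Pick the sampled conformation with the lowest 'perfect energy' (RMSD)."""
    return min(samples, key=lambda c: rmsd(c, native))
```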

  8. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-01-01

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems. PMID:26694353

  9. Systematic computational prediction of protein interaction networks.

    PubMed

    Lees, J G; Heriche, J K; Morilla, I; Ranea, J A; Orengo, C A

    2011-06-01

    Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in the field of high throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher quality predictions and we compare the major publicly available resources available for experimentalists to use. PMID:21572181

  10. Genome-wide protein-protein interactions and protein function exploration in cyanobacteria.

    PubMed

    Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu

    2015-01-01

    Genome-wide network analysis is well implemented to study proteins of unknown function. Here, we effectively explored protein functions and the biological mechanism based on inferred high confident protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of literatures in proved direct/indirect evidences, and "interologs" in conservation. Combined the predicted PPIs with known PPIs, we obtained 4,715 no-redundant PPIs (involving 3,231 proteins covering over 90% of genome) to generate the PPI network. Based on the PPI network, terms in Gene ontology (GO) were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel function of underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides a new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033

  11. Genome-wide protein-protein interactions and protein function exploration in cyanobacteria

    PubMed Central

    Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu

    2015-01-01

    Genome-wide network analysis is well implemented to study proteins of unknown function. Here, we effectively explored protein functions and the biological mechanism based on inferred high confident protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of literatures in proved direct/indirect evidences, and “interologs” in conservation. Combined the predicted PPIs with known PPIs, we obtained 4,715 no-redundant PPIs (involving 3,231 proteins covering over 90% of genome) to generate the PPI network. Based on the PPI network, terms in Gene ontology (GO) were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel function of underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides a new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033

  12. Signature Product Code for Predicting Protein-Protein Interactions

    Energy Science and Technology Software Center (ESTSC)

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.

  13. Protein-protein interactions prediction based on iterative clique extension with gene ontology filtering.

    PubMed

    Yang, Lei; Tang, Xianglong

    2014-01-01

    Cliques (maximal complete subnets) in protein-protein interaction (PPI) network are an important resource used to analyze protein complexes and functional modules. Clique-based methods of predicting PPI complement the data defection from biological experiments. However, clique-based predicting methods only depend on the topology of network. The false-positive and false-negative interactions in a network usually interfere with prediction. Therefore, we propose a method combining clique-based method of prediction and gene ontology (GO) annotations to overcome the shortcoming and improve the accuracy of predictions. According to different GO correcting rules, we generate two predicted interaction sets which guarantee the quality and quantity of predicted protein interactions. The proposed method is applied to the PPI network from the Database of Interacting Proteins (DIP) and most of the predicted interactions are verified by another biological database, BioGRID. The predicted protein interactions are appended to the original protein network, which leads to clique extension and shows the significance of biological meaning. PMID:24578640
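
    One way to picture clique extension with a gene ontology filter is the sketch below. It is not the authors' procedure: the abstract does not spell out the GO correcting rules, so the rule used here (predict the single missing edge between a clique and an outside protein only if the two proteins share at least one GO term) is an assumption, as are the function name and the data layout (an adjacency dictionary of sets and a dictionary of GO term sets).

```python
def predict_by_clique_extension(adj, cliques, go_terms):
    """Predict interactions that would extend a clique by one protein.

    adj      : dict mapping protein -> set of interacting proteins
    cliques  : iterable of cliques (each an iterable of proteins)
    go_terms : dict mapping protein -> set of GO term identifiers
    """
    predicted = set()
    for clique in cliques:
        members = set(clique)
        # Outside proteins that touch at least one clique member.
        candidates = set().union(*(adj.get(m, set()) for m in members)) - members
        for v in candidates:
            missing = [u for u in members if v not in adj.get(u, set())]
            if len(missing) != 1:
                continue  # only a single absent edge is treated as a likely gap
            u = missing[0]
            # Assumed GO filter: keep the edge only if annotations overlap.
            if go_terms.get(u, set()) & go_terms.get(v, set()):
                predicted.add(frozenset((u, v)))
    return predicted
```

    Adding the returned edges to the network turns each affected clique into a larger one, which is the "clique extension" effect the abstract describes.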

  14. Reduced alphabet for protein folding prediction.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-04-01

    What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures consist of a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. PMID:25641420

  15. On the Encoding of Proteins for Disordered Regions Prediction

    PubMed Central

    Becker, Julien; Maes, Francis; Wehenkel, Louis

    2013-01-01

    Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor of the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which is playing the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state-of-the-art. A web-application is available at http://m24.giga.ulg.ac.be:81/x3Disorder. PMID:24358161

  16. On the encoding of proteins for disordered regions prediction.

    PubMed

    Becker, Julien; Maes, Francis; Wehenkel, Louis

    2013-01-01

    Disordered regions, i.e., regions of proteins that do not adopt a stable three-dimensional structure, have been shown to play various and critical roles in many biological processes. Predicting and understanding their formation is therefore a key sub-problem of protein structure and function inference. A wide range of machine learning approaches have been developed to automatically predict disordered regions of proteins. One key factor of the success of these methods is the way in which protein information is encoded into features. Recently, we have proposed a systematic methodology to study the relevance of various feature encodings in the context of disulfide connectivity pattern prediction. In the present paper, we adapt this methodology to the problem of predicting disordered regions and assess it on proteins from the 10th CASP competition, as well as on a very large subset of proteins extracted from PDB. Our results, obtained with ensembles of extremely randomized trees, highlight a novel feature function encoding the proximity of residues according to their accessibility to the solvent, which is playing the second most important role in the prediction of disordered regions, just after evolutionary information. Furthermore, even though our approach treats each residue independently, our results are very competitive in terms of accuracy with respect to the state-of-the-art. A web-application is available at http://m24.giga.ulg.ac.be:81/x3Disorder. PMID:24358161

  17. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges.

    PubMed

    Sonah, Humira; Deshmukh, Rupesh K; Bélanger, Richard R

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  18. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    PubMed Central

    Sonah, Humira; Deshmukh, Rupesh K.; Bélanger, Richard R.

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant–pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  19. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306

  20. Protein-protein interface prediction based on hexagon structure similarity.

    PubMed

    Guo, Fei; Ding, Yijie; Li, Shuai Cheng; Shen, Chao; Wang, Lusheng

    2016-08-01

    Studies on protein-protein interaction are important in proteome research. How to build more effective models based on sequence information, structure information and physicochemical characteristics, is the key technology in protein-protein interface prediction. In this paper, we study the protein-protein interface prediction problem. We propose a novel method for identifying residues on interfaces from an input protein with both sequence and 3D structure information, based on hexagon structure similarity. Experiments show that our method achieves better results than some state-of-the-art methods for identifying protein-protein interface. Comparing to existing methods, our approach improves F-measure value by at least 0.03. On a common dataset consisting of 41 complexes, our method has overall precision and recall values of 63% and 57%. On Benchmark v4.0, our method has overall precision and recall values of 55% and 56%. On CAPRI targets, our method has overall precision and recall values of 52% and 55%. PMID:26936323
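
    For readers comparing the reported precision and recall figures with the F-measure improvement, the relationship can be worked through as follows. This assumes that the F-measure is the balanced F1 score and that it is computed from the overall precision and recall; the abstract states neither, so the resulting value is an arithmetic illustration rather than a reported result.

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = 0.63,\ R = 0.57 \;\Rightarrow\;
F_1 = \frac{2 \times 0.63 \times 0.57}{0.63 + 0.57}
    = \frac{0.7182}{1.20} \approx 0.60
```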

  1. Functions of S100 Proteins

    PubMed Central

    Donato, R.; Cannon, B.R.; Sorci, G.; Riuzzi, F.; Hsu, K.; Weber, D.J.; Geczy, C.L.

    2013-01-01

    The S100 protein family consists of 24 members functionally distributed into three main subgroups: those that only exert intracellular regulatory effects, those with intracellular and extracellular functions and those which mainly exert extracellular regulatory effects. S100 proteins are only expressed in vertebrates and show cell-specific expression patterns. In some instances, a particular S100 protein can be induced in pathological circumstances in a cell type that does not express it in normal physiological conditions. Within cells, S100 proteins are involved in aspects of regulation of proliferation, differentiation, apoptosis, Ca2+ homeostasis, energy metabolism, inflammation and migration/invasion through interactions with a variety of target proteins including enzymes, cytoskeletal subunits, receptors, transcription factors and nucleic acids. Some S100 proteins are secreted or released and regulate cell functions in an autocrine and paracrine manner via activation of surface receptors (e.g. the receptor for advanced glycation end-products and toll-like receptor 4), G-protein-coupled receptors, scavenger receptors, or heparan sulfate proteoglycans and N-glycans. Extracellular S100A4 and S100B also interact with epidermal growth factor and basic fibroblast growth factor, respectively, thereby enhancing the activity of the corresponding receptors. Thus, extracellular S100 proteins exert regulatory activities on monocytes/macrophages/microglia, neutrophils, lymphocytes, mast cells, articular chondrocytes, endothelial and vascular smooth muscle cells, neurons, astrocytes, Schwann cells, epithelial cells, myoblasts and cardiomyocytes, thereby participating in innate and adaptive immune responses, cell migration and chemotaxis, tissue development and repair, and leukocyte and tumor cell invasion. PMID:22834835

  2. Prediction and redesign of protein-protein interactions.

    PubMed

    Lua, Rhonald C; Marciano, David C; Katsonis, Panagiotis; Adikesavan, Anbu K; Wilkins, Angela D; Lichtarge, Olivier

    2014-01-01

    Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI. PMID:24878423

  3. Bioinformatic Prediction of WSSV-Host Protein-Protein Interaction

    PubMed Central

    Sun, Zheng; Xiang, Jianhai

    2014-01-01

    WSSV is one of the most dangerous pathogens in shrimp aquaculture. However, the molecular mechanism of how WSSV interacts with shrimp is still not very clear. In the present study, bioinformatic approaches were used to predict interactions between proteins from WSSV and shrimp. The genome data of WSSV (NC_003225.1) and the constructed transcriptome data of F. chinensis were used to screen potentially interacting proteins by searching in protein interaction databases, including STRING, Reactome, and DIP. Forty-four pairs of proteins were suggested to have interactions between WSSV and the shrimp. Gene ontology analysis revealed that 6 pairs of these interacting proteins were classified into “extracellular region” or “receptor complex” GO-terms. KEGG pathway analysis showed that they were involved in the “ECM-receptor interaction pathway.” In the 6 pairs of interacting proteins, an envelope protein called “collagen-like protein” (WSSV-CLP) encoded by an early virus gene “wsv001” in WSSV interacted with 6 deduced proteins from the shrimp, including three integrin alpha (ITGA), two integrin beta (ITGB), and one syndecan (SDC). Sequence analysis on WSSV-CLP, ITGA, ITGB, and SDC revealed that they possessed the sequence features for protein-protein interactions. This study might provide new insights into the interaction mechanisms between WSSV and shrimp. PMID:24982879

  4. Sequence-based feature prediction and annotation of proteins

    PubMed Central

    Juncker, Agnieszka S; Jensen, Lars J; Pierleoni, Andrea; Bernsel, Andreas; Tress, Michael L; Bork, Peer; von Heijne, Gunnar; Valencia, Alfonso; Ouzounis, Christos A; Casadio, Rita; Brunak, Søren

    2009-01-01

    A recent trend in computational methods for annotation of protein function is that many prediction tools are combined in complex workflows and pipelines to facilitate the analysis of feature combinations, for example, the entire repertoire of kinase-binding motifs in the human proteome. PMID:19226438

  5. Computational Prediction of Protein–Protein Interaction Networks: Algorithms and Resources

    PubMed Central

    Zahiri, Javad; Bozorgmehr, Joseph Hannon; Masoudi-Nejad, Ali

    2013-01-01

    Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true in the case of diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental results in finding protein-protein interactions, however, is rather dubious, and high-throughput experiments have shown high rates of both false positives and false negatives for protein interactions. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and validate the obtained experimental results. In this study, we review several computational methods for protein-protein interaction prediction and describe major databases, which store both predicted and detected protein-protein interactions, as well as the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability. PMID:24396273

  6. Functional Classification of Immune Regulatory Proteins

    SciTech Connect

    Rubinstein, Rotem; Ramagopal, Udupi A.; Nathenson, Stanley G.; Almo, Steven C.; Fiser, Andras

    2013-05-01

    Members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. We were guided by the Brotherhood approach and present the high-resolution structural characterization of a homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.

  7. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures, calls for computational approaches. Here we review to which extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describing membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  8. How special is the biochemical function of native proteins?

    PubMed Central

    Skolnick, Jeffrey; Gao, Mu; Zhou, Hongyi

    2016-01-01

    Native proteins perform an amazing variety of biochemical functions, including enzymatic catalysis, and can engage in protein-protein and protein-DNA interactions that are essential for life. A key question is how special are these functional properties of proteins. Are they extremely rare, or are they an intrinsic feature? Comparison to the properties of compact conformations of artificially generated compact protein structures selected for thermodynamic stability but not any type of function, the artificial (ART) protein library, demonstrates that a remarkable number of the properties of native-like proteins are recapitulated. These include the complete set of small molecule ligand-binding pockets and most protein-protein interfaces. ART structures are predicted to be capable of weakly binding metabolites and cover a significant fraction of metabolic pathways, with the most enriched pathways including ancient ones such as glycolysis. Native-like active sites are also found in ART proteins. A small fraction of ART proteins are predicted to have strong protein-protein and protein-DNA interactions. Overall, it appears that biochemical function is an intrinsic feature of proteins which nature has significantly optimized during evolution. These studies raise questions as to the relative roles of specificity and promiscuity in the biochemical function and control of cells that need investigation. PMID:26962440

  9. Learning Protein Folding Energy Functions

    PubMed Central

    Guan, Wei; Ozakin, Arkadas; Gray, Alexander; Borreguero, Jose; Pandit, Shashi; Jagielska, Anna; Wroblewska, Liliana; Skolnick, Jeffrey

    2014-01-01

    A critical open problem in ab initio protein folding is protein energy function design, which pertains to defining the energy of protein conformations in a way that makes folding most efficient and reliable. In this paper, we address this issue as a weight optimization problem and utilize a machine learning approach, learning-to-rank, to solve this problem. We investigate the ranking-via-classification approach, especially the RankingSVM method and compare it with the state-of-the-art approach to the problem using the MINUIT optimization package. To maintain the physicality of the results, we impose non-negativity constraints on the weights. For this we develop two efficient non-negative support vector machine (NNSVM) methods, derived from L2-norm SVM and L1-norm SVMs, respectively. We demonstrate an energy function which maintains the correct ordering with respect to structure dissimilarity to the native state more often, is more efficient and reliable for learning on large protein sets, and is qualitatively superior to the current state-of-the-art energy function. PMID:25311546

  10. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression.

    PubMed

    Huang, De-Shuang; Zhang, Lei; Han, Kyungsook; Deng, Suping; Yang, Kai; Zhang, Hongbo

    2014-01-01

    Several methods have been developed to transform protein sequences into feature vectors, such as auto covariance (AC), conjoint triad (CT), local descriptor (LD), Moran autocorrelation (MA), normalized Moreau-Broto autocorrelation (NMB) and so on. In this paper, we adopt these transformation methods to encode the proteins, respectively, where AC, CT, LD, MA and NMB are all represented by '+' in a unified manner. A new method, i.e. the combination of least squares regression with '+' (abbreviated as LSR(+)), is introduced for encoding a protein-protein correlation-based feature representation and an interacting protein pair. Thus there are five different combinations for LSR(+) in total, i.e. LSRAC, LSRCT, LSRLD, LSRMA and LSRNMB. We then combined a support vector machine (SVM) approach with LSR(+) to predict protein-protein interactions (PPI) and PPI networks. The proposed method has been applied to four datasets, i.e. Saccharomyces cerevisiae, Escherichia coli, Homo sapiens and Caenorhabditis elegans. The experimental results demonstrate that all LSR(+) methods outperform many existing representative algorithms. Therefore, LSR(+) is a powerful tool to characterize the protein-protein correlations and to infer PPI, whilst keeping high performance on prediction of PPI networks. PMID:25059329
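
    As a minimal sketch of one of the encodings named above, auto covariance (AC) is commonly computed by mapping each residue to a physicochemical value and averaging the lag-shifted products of the mean-centered values. The property scale `TOY_SCALE`, the function name `auto_covariance`, and the use of a single property are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical one-property scale; the numbers are placeholders, not a published index.
TOY_SCALE: Dict[str, float] = {"A": 1.0, "G": 0.5, "L": 2.0, "K": -1.5, "E": -1.0}


def auto_covariance(seq: str, scale: Dict[str, float], max_lag: int) -> List[float]:
    """Return one AC value per lag 1..max_lag for a single property scale."""
    values = [scale[res] for res in seq]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # For each lag, average the products of residues that are `lag` positions apart.
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    print(auto_covariance("AGLKEAGL", TOY_SCALE, max_lag=3))
```

    In practice the calculation is typically repeated for several property scales and the results concatenated, which yields a fixed-length vector regardless of sequence length.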

  11. Prediction of protein structural classes and subcellular locations.

    PubMed

    Chou, K C

    2000-09-01

    The structural class and subcellular location are two important features of proteins that are closely related to their biological functions. With the rapid increase in new protein sequences entering into data banks, it is highly desirable to develop a fast and accurate method for predicting the attributes of these features for them. This can expedite the functionality determination of new proteins and the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Various prediction methods have been developed during the last two decades. This review is devoted to presenting a systematic introduction and comparison of the existing methods with respect to the prediction algorithm and classification scheme. The attention is focused on the state-of-the-art, which features the covariant-discriminant algorithm developed very recently, as well as some new classification schemes for protein structural classes and subcellular locations. In particular, the physical chemistry foundation of the existing prediction methods and the reason why the covariant-discriminant algorithm is so powerful are also addressed. PMID:12369916

  12. Structure based prediction of protein folding intermediates.

    PubMed

    Xie, D; Freire, E

    1994-09-01

    The complete unfolding of a protein involves the disruption of non-covalent intramolecular interactions within the protein and the subsequent hydration of the backbone and amino acid side-chains. The magnitude of the thermodynamic parameters associated with this process is known accurately for a growing number of globular proteins for which high-resolution structures are also available. The existence of this database of structural and thermodynamic information has facilitated the development of statistical procedures aimed at quantifying the relationships existing between protein structure and the thermodynamic parameters of folding/unfolding. Under some conditions proteins do not unfold completely, giving rise to states (commonly known as molten globules) in which the molecule retains some secondary structure and remains in a compact configuration after denaturation. This phenomenon is reflected in the thermodynamics of the process. Depending on the nature of the residual structure that exists after denaturation, the observed enthalpy, entropy and heat capacity changes will deviate in a particular and predictable way from the values expected for complete unfolding. For several proteins, these deviations have been shown to exhibit similar characteristics, suggesting that their equilibrium folding intermediates exhibit some common structural features. Employing empirically derived structure-energetic relationships, it is possible to identify in the native structure of the protein those regions with the higher probability of being structured in equilibrium partly folded states. In this work, a thermodynamic search algorithm aimed at identifying the structural determinants of the molten globule state has been applied to six globular proteins; alpha-lactalbumin, barnase, IIIGlc, interleukin-1 beta, phage T4 lysozyme and phage 434 repressor. Remarkably, the structural features of the predicted equilibrium intermediates coincide to a large extent with the known

  13. PREDICTION OF NONLINEAR SPATIAL FUNCTIONALS. (R827257)

    EPA Science Inventory

    Spatial statistical methodology can be useful in the arena of environmental regulation. Some regulatory questions may be addressed by predicting linear functionals of the underlying signal, but other questions may require the prediction of nonlinear functionals of the signal. ...

  14. Predicting the Impact of Missense Mutations on Protein-Protein Binding Affinity.

    PubMed

    Li, Minghui; Petukh, Marharyta; Alexov, Emil; Panchenko, Anna R

    2014-04-01

    The crucial prerequisite for proper biological function is the protein's ability to establish highly selective interactions with macromolecular partners. A missense mutation that alters the protein binding affinity may cause significant perturbations or complete abolishment of the function, potentially leading to diseases. The availability of computational methods to evaluate the impact of mutations on protein-protein binding is critical for a wide range of biomedical applications. Here, we report an efficient computational approach for predicting the effect of single and multiple missense mutations on protein-protein binding affinity. It is based on a well-tested simulation protocol for structure minimization, modified MM-PBSA and statistical scoring energy functions with parameters optimized on experimental sets of several thousands of mutations. Our simulation protocol yields very good agreement between predicted and experimental values with Pearson correlation coefficients of 0.69 and 0.63 and root-mean-square errors of 1.20 and 1.90 kcal mol^-1 for single and multiple mutations, respectively. Compared with other available methods, our approach achieves high speed and prediction accuracy and can be applied to large datasets generated by modern genomics initiatives. In addition, we report a crucial role of the water model and the polar solvation energy in estimating the changes in binding affinity. Our analysis also reveals that prediction accuracy and the effect of mutations on binding strongly depend on the type of mutation and its location in a protein complex. PMID:24803870
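
    The two agreement measures quoted above, the Pearson correlation coefficient and the root-mean-square error, can be sketched as follows. The function names and the example values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


if __name__ == "__main__":
    # Hypothetical binding affinity changes in kcal/mol, for illustration only.
    experimental = [0.5, 1.2, -0.3, 2.8, 0.0]
    predicted = [0.7, 0.9, 0.1, 2.0, 0.4]
    print(pearson_r(predicted, experimental), rmse(predicted, experimental))
```

    The two numbers answer different questions: the correlation measures how well the ranking and trend are reproduced, while the RMSE measures the typical size of the error in energy units.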

  15. Protein Structure Prediction with Evolutionary Algorithms

    SciTech Connect

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
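
    To make the three factors above concrete, here is a minimal sketch of an HP-model fitness function on a 2D square lattice. It assumes an absolute-move representation (U, D, L, R), the standard HP energy of -1 per pair of hydrophobic residues that are lattice neighbors but not chain neighbors, and a fixed penalty per overlapping residue. The names `fold` and `hp_energy` and the penalty value are illustrative choices, not the formulations evaluated in the paper.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
MOVES: Dict[str, Coord] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves: str) -> List[Coord]:
    """Turn a string of absolute moves into lattice coordinates, starting at the origin."""
    coords: List[Coord] = [(0, 0)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(seq: str, moves: str, overlap_penalty: float = 2.0) -> float:
    """HP contact energy plus a penalty for each residue placed on an occupied site."""
    if len(moves) != len(seq) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold(moves)
    occupancy: Dict[Coord, List[int]] = {}
    for index, site in enumerate(coords):
        occupancy.setdefault(site, []).append(index)
    collisions = sum(len(indices) - 1 for indices in occupancy.values())

    energy = 0.0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only right and up so that every lattice contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            for j in occupancy.get((x + dx, y + dy), []):
                if seq[j] == "H" and abs(i - j) > 1:
                    energy -= 1.0
    return energy + overlap_penalty * collisions


if __name__ == "__main__":
    print(hp_energy("HPPH", "RUL"))  # one H-H contact across the folded square
```

    Penalizing overlaps instead of rejecting them keeps infeasible conformations in the search space, which is one of the design choices a genetic algorithm for this problem has to make.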

  16. Bioinformatics pipeline for functional identification and characterization of proteins

    NASA Astrophysics Data System (ADS)

    Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Krzywkowski, Tomasz; Świerkula, Katarzyna; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    The new sequencing methods, called Next Generation Sequencing, make it possible to obtain a vast amount of data in a short time. These data require structural and functional annotation. Functional identification and characterization of predicted proteins can be done by in silico approaches, thanks to the numerous computational tools available nowadays. However, the results of protein function prediction need to be confirmed, either by comparing the results of different programs or experimentally. Here we present a bioinformatics pipeline for structural and functional annotation of proteins.

  17. Proteins with Novel Structure, Function and Dynamics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments with a rate of 10^6 above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such a structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. As the coordinating, charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine, a salt bridge involving two other charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by the high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the door to designing proteins with novel functions, structures and dynamics that have not yet been considered.

  18. Machine learning algorithms for predicting protein folding rates and stability of mutant proteins: comparison with statistical methods.

    PubMed

    Gromiha, M Michael; Huang, Liang-Tsung

    2011-09-01

    Machine learning algorithms have a wide range of applications in bioinformatics and computational biology such as prediction of protein secondary structures, solvent accessibility, binding site residues in protein complexes, protein folding rates, stability of mutant proteins, and discrimination of proteins based on their structure and function. In this work, we focus on two aspects of predictions: (i) protein folding rates and (ii) stability of proteins upon mutations. We briefly introduce the concepts of protein folding rates and stability along with available databases, features for prediction methods and measures for prediction performance. Subsequently, the development of structure based parameters and their relationship with protein folding rates will be outlined. The structure based parameters are helpful to understand the physical basis for protein folding and stability. Further, basic principles of major machine learning techniques will be mentioned and their applications for predicting protein folding rates and stability of mutant proteins will be illustrated. The machine learning techniques could achieve the highest accuracy of predicting protein folding rates and stability. In essence, statistical methods and machine learning algorithms are complementing each other for understanding and predicting protein folding rates and the stability of protein mutants. The available online resources on protein folding rates and stability will be listed. PMID:21787301

  19. Algorithmic approaches to protein-protein interaction site prediction.

    PubMed

    Aumentado-Armstrong, Tristan T; Istrate, Bogdan; Murgita, Robert A

    2015-01-01

    Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented. PMID:25713596

  20. Protein Markers Predict Survival in Glioma Patients.

    PubMed

    Stetson, Lindsay C; Dazard, Jean-Eudes; Barnholtz-Sloan, Jill S

    2016-07-01

    Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have identified a set of 13 proteins with prognostic significance. Our protein signature predictive of glioblastoma (PROTGLIO) patient survival model was constructed and validated on independent data sets and was shown to significantly predict survival in GBM patients (log-rank test: p = 0.0009). Using a multivariate Cox proportional hazards model, we have shown that our PROTGLIO model is distinct from other known GBM prognostic factors (age at diagnosis, extent of surgical resection, postoperative Karnofsky performance score (KPS), treatment with temozolomide (TMZ) chemoradiation, and methylation of the MGMT gene). Tenfold cross-validation repetition of our model generation procedure confirmed validation of PROTGLIO. The model was further validated on an independent set of isocitrate dehydrogenase wild-type (IDHwt) lower grade gliomas (LGG), a portion of which progress rapidly to GBM. The PROTGLIO model contains proteins, such as Cox-2 and Annexin 1, involved in inflammatory response, pointing to potential therapeutic interventions. The PROTGLIO model is a simple and effective predictor of overall survival in glioblastoma patients, making it potentially useful in clinical practice of glioblastoma multiforme. PMID:27143410

  1. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

    PubMed Central

    2014-01-01

    Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only

  2. CATH FunFHMMer web server: protein functional annotations using functional family assignments

    PubMed Central

    Das, Sayoni; Sillitoe, Ian; Lee, David; Lees, Jonathan G.; Dawson, Natalie L.; Ward, John; Orengo, Christine A.

    2015-01-01

    The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence–structure–function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer. PMID:25964299

  3. CATH FunFHMMer web server: protein functional annotations using functional family assignments.

    PubMed

    Das, Sayoni; Sillitoe, Ian; Lee, David; Lees, Jonathan G; Dawson, Natalie L; Ward, John; Orengo, Christine A

    2015-07-01

    The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer. PMID:25964299

  4. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that a multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  5. Using support vector machine for improving protein-protein interaction prediction utilizing domain interactions

    SciTech Connect

    Singhal, Mudita; Shah, Anuj R.; Brown, Roslyn N.; Adkins, Joshua N.

    2010-10-02

    Understanding protein interactions is essential to gain insights into the biological processes at the whole cell level. The high-throughput experimental techniques for determining protein-protein interactions (PPI) are error-prone and expensive with low overlap amongst them. Although several computational methods have been proposed for predicting protein interactions, there is definite room for improvement. Here we present DomainSVM, a predictive method for PPI that uses computationally inferred domain-domain interaction values in a Support Vector Machine framework to predict protein interactions. The DomainSVM method utilizes evidence of multiple interacting domains to predict a protein interaction. It outperforms existing methods of PPI prediction by achieving very high explanation ratios, precision, specificity, sensitivity and F-measure values in a 10-fold cross-validation study conducted on the positive and negative PPIs in yeast. A functional comparison study using GO annotations on the positive and the negative test sets is presented in addition to discussing novel PPI predictions in Salmonella Typhimurium.

  6. Prediction and Annotation of Plant Protein Interaction Networks

    SciTech Connect

    McDermott, Jason E.; Wang, Jun; Yu, Jun; Wong, Gane Ka-Shu; Samudrala, Ram

    2009-02-01

    Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  7. Enhanced prediction of conformational flexibility and phosphorylation in proteins.

    PubMed

    Swaminathan, Karthikeyan; Adamczak, Rafal; Porollo, Aleksey; Meller, Jarosław

    2010-01-01

    Many sequence-based predictors of structural and functional properties of proteins have been developed in the past. In this study, we developed new methods for predicting measures of conformational flexibility in proteins, including X-ray structure-derived temperature (B-) factors and the variance within NMR structural ensemble, as effectively measured by the solvent accessibility standard deviations (SASDs). We further tested whether these predicted measures of conformational flexibility in crystal lattices and solution, respectively, can be used to improve the prediction of phosphorylation in proteins. The latter is an example of a common post-translational modification that modulates protein function, e.g., by affecting interactions and conformational flexibility of phosphorylated sites. Using robust epsilon-insensitive support vector regression (ε-SVR) models, we assessed two specific representations of protein sequences: one based on the position-specific scoring matrices (PSSMs) derived from multiple sequence alignments, and an augmented representation that incorporates real-valued solvent accessibility and secondary structure predictions (RSA/SS) as additional measures of local structural propensities. We showed that a combination of PSSMs and real-valued SS/RSA predictions provides systematic improvements in the accuracy of both B-factors and SASD prediction. These intermediate predictions were subsequently combined into an enhanced predictor of phosphorylation that was shown to significantly outperform methods based on PSSM alone. We would like to stress that to the best of our knowledge, this is the first example of using sequence-predicted, NMR structure-based measures of conformational flexibility in solution for the prediction of other properties of proteins.
    Phosphorylation prediction methods typically employ a two-class classification approach with the limitation that the set of negative examples used for training may include some sites that are

  8. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    PubMed

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights into processes associated with neuronal function and misfunction. PMID:23264567

  9. Prediction of protein-protein interaction network using a multi-objective optimization approach.

    PubMed

    Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit

    2016-06-01

    Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate the PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score. PMID:26846814

  10. Identification, Analysis and Prediction of Protein Ubiquitination Sites

    PubMed Central

    Radivojac, Predrag; Vacic, Vladimir; Haynes, Chad; Cocklin, Ross R.; Mohan, Amrita; Heyen, Joshua W.; Goebl, Mark G.; Iakoucheva, Lilia M.

    2009-01-01

    Summary Ubiquitination plays an important role in many cellular processes and is implicated in many diseases. Experimental identification of ubiquitination sites is challenging due to rapid turnover of ubiquitinated proteins and the large size of the ubiquitin modifier. We identified 141 new ubiquitination sites using a combination of liquid chromatography, mass spectrometry and mutant yeast strains. Investigation of the sequence biases and structural preferences around known ubiquitination sites indicated that their properties were similar to those of intrinsically disordered protein regions. Using a combined set of new and previously known ubiquitination sites, we developed a random forest predictor of ubiquitination sites, UbPred. The class-balanced accuracy of UbPred reached 72%, with the area under the ROC curve at 80%. The application of UbPred showed that high confidence Rsp5 ubiquitin ligase substrates and proteins with very short half-lives were significantly enriched in the number of predicted ubiquitination sites. Proteome-wide prediction of ubiquitination sites in Saccharomyces cerevisiae indicated that highly ubiquitinated substrates were prevalent among transcription/enzyme regulators and proteins involved in cell cycle control. In the human proteome, cytoskeletal, cell cycle, regulatory and cancer-associated proteins display higher extent of ubiquitination than proteins from other functional categories. We show that gain and loss of predicted ubiquitination sites may likely represent a molecular mechanism behind a number of disease-associated mutations. UbPred is available at http://www.ubpred.org PMID:19722269

  11. Analysis and Functional Prediction of Reactive Cysteine Residues*

    PubMed Central

    Marino, Stefano M.; Gladyshev, Vadim N.

    2012-01-01

    Cys is much different from other common amino acids in proteins. Being one of the least abundant residues, Cys is often observed in functional sites in proteins. This residue is reactive, polarizable, and redox-active; has high affinity for metals; and is particularly responsive to the local environment. A better understanding of the basic properties of Cys is essential for interpretation of high-throughput data sets and for prediction and classification of functional Cys residues. We provide an overview of approaches used to study Cys residues, from methods for investigation of their basic properties, such as exposure and pKa, to algorithms for functional prediction of different types of Cys in proteins. PMID:22157013

  12. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
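
    The phylogenetic-profile idea in this record can be illustrated with a minimal sketch. The genome names, family names, and the use of Hamming distance to compare profiles are assumptions made for illustration only; the patent abstract does not specify a data format or a distance measure.

```python
from typing import Dict, Set, Tuple

def phylogenetic_profile(family: str, genomes: Dict[str, Set[str]]) -> Tuple[int, ...]:
    """Presence (1) / absence (0) of a protein family across genomes, in sorted genome order."""
    return tuple(int(family in genomes[name]) for name in sorted(genomes))

def profile_distance(p: Tuple[int, ...], q: Tuple[int, ...]) -> int:
    """Hamming distance between two profiles (an assumed comparison measure)."""
    return sum(a != b for a, b in zip(p, q))

# Hypothetical toy data: which protein families occur in which genome.
genomes = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB"},
    "genome_3": {"famC"},
    "genome_4": {"famA", "famB", "famC"},
}

profiles = {f: phylogenetic_profile(f, genomes) for f in ("famA", "famB", "famC")}

# Families with identical profiles are candidates for a functional link.
linked = profile_distance(profiles["famA"], profiles["famB"]) == 0
```

    In this toy data famA and famB always occur together, so their profiles match, while famC differs from famA in two genomes.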

  13. Some remarks on prediction of protein-protein interaction with machine learning.

    PubMed

    Zhang, Shao-Wu; Wei, Ze-Gang

    2015-01-01

    Protein-protein interactions (PPIs) play a key role in many cellular processes. Uncovering the PPIs and their function within the cell is a challenge of post-genomic biology and will improve our understanding of disease and help in the development of novel methods for disease diagnosis and forensics. The experimental methods currently used to identify PPIs are both time-consuming and expensive, and high-throughput experimental results have shown high rates of both false-positive and false-negative protein interactions. These obstacles could be overcome by developing computational approaches to predict PPIs and validate the obtained experimental results. In this work, we describe the recent advances in predicting protein-protein interaction from the following aspects: i) the benchmark dataset construction, ii) the sequence representation approaches, iii) the common machine learning algorithms, and iv) the cross-validation test methods and assessment metrics. PMID:25548927

  14. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences. PMID:27267620
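
    The Jaccard similarity coefficients quoted in this record compare two classifications of the same proteins. One common way to do this, shown below as a sketch, is to count pairs of items that fall in the same cluster; whether the authors used this pair-counting form is an assumption, and the cluster labels are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple

def co_clustered_pairs(labels: Sequence[str]) -> Set[Tuple[int, int]]:
    """Index pairs (i, j), i < j, whose items share a cluster label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}

def jaccard_between_clusterings(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Pair-counting Jaccard coefficient: |A ∩ B| / |A ∪ B| over co-clustered pairs."""
    pairs_a = co_clustered_pairs(labels_a)
    pairs_b = co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    if not union:  # no pair is co-clustered in either classification
        return 1.0
    return len(pairs_a & pairs_b) / len(union)

# Hypothetical classifications of the same five proteins.
sequence_clusters = ["s1", "s1", "s1", "s2", "s2"]
function_clusters = ["f1", "f1", "f2", "f2", "f2"]

similarity = jaccard_between_clusterings(sequence_clusters, function_clusters)
```

    Here two of the six pairs that are co-clustered in either classification are co-clustered in both, so the coefficient is 1/3; identical classifications give 1.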

  15. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    PubMed Central

    Green, James R; Korenberg, Michael J; Aboul-Magd, Mohammed O

    2009-01-01

    Background Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting

  16. PCDq: human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from H-Invitational protein-protein interactions integrative dataset

    PubMed Central

    2012-01-01

    Background Proteins interact with other proteins or biomolecules in complexes to perform cellular functions. Existing protein-protein interaction (PPI) databases and protein complex databases for human proteins are not organized to provide protein complex information or facilitate the discovery of novel subunits. Data integration of PPIs focused specifically on protein complexes, subunits, and their functions. Predicted candidate complexes or subunits are also important for experimental biologists. Description Based on integrated PPI data and literature, we have developed a human protein complex database with a complex quality index (PCDq), which includes both known and predicted complexes and subunits. We integrated six PPI data (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H), and predicted human protein complexes by finding densely connected regions in the PPI networks. They were curated with the literature so that missing proteins were complemented and some complexes were merged, resulting in 1,264 complexes comprising 9,268 proteins with 32,198 PPIs. The evidence level of each subunit was assigned as a categorical variable. This indicated whether it was a known subunit, and a specific function was inferable from sequence or network analysis. To summarize the categories of all the subunits in a complex, we devised a complex quality index (CQI) and assigned it to each complex. We examined the proportion of consistency of Gene Ontology (GO) terms among protein subunits of a complex. Next, we compared the expression profiles of the corresponding genes and found that many proteins in larger complexes tend to be expressed cooperatively at the transcript level. The proportion of duplicated genes in a complex was evaluated. Finally, we identified 78 hypothetical proteins that were annotated as subunits of 82 complexes, which included known complexes. Of these hypothetical proteins, after our prediction had been made, four were reported to be actual subunits of the

  17. MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data.

    PubMed

    Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package "MEGADOCK" that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673

  18. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681
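
    For readers unfamiliar with the metric, Q3 accuracy is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below uses made-up strings, not DeepCNF output, and the function name is illustrative.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical 12-residue example with two mispredicted positions.
observed = "HHHHEEEECCCC"
predicted = "HHHCEEEECCHC"

score = q3_accuracy(predicted, observed)
```

    Q8 accuracy is computed the same way over the eight-state alphabet.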

  19. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    PubMed Central

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681

  20. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  1. Predicting Subcellular Localization of Apoptosis Proteins Combining GO Features of Homologous Proteins and Distance Weighted KNN Classifier

    PubMed Central

    Wang, Xiao; Li, Hui; Zhang, Qiuwen; Wang, Rong

    2016-01-01

    Apoptosis proteins play a key role in maintaining the stability of organism; the functions of apoptosis proteins are related to their subcellular locations which are used to understand the mechanism of programmed cell death. In this paper, we utilize GO annotation information of apoptosis proteins and their homologous proteins retrieved from GOA database to formulate feature vectors and then combine the distance weighted KNN classification algorithm with them to solve the data imbalance problem existing in CL317 data set to predict subcellular locations of apoptosis proteins. It is found that the number of homologous proteins can affect the overall prediction accuracy. Under the optimal number of homologous proteins, the overall prediction accuracy of our method on CL317 data set reaches 96.8% by Jackknife test. Compared with other existing methods, it shows that our proposed method is very effective and better than others for predicting subcellular localization of apoptosis proteins. PMID:27213149
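
    A distance-weighted KNN classifier lets closer neighbours count for more than distant ones, which can help when one class dominates the training set. The sketch below assumes Euclidean distance and inverse-distance weights; the abstract does not state the weighting scheme, and the two-dimensional feature vectors stand in for the GO-based features, which are not reproduced here.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

def weighted_knn_predict(query: Sequence[float],
                         training: List[Tuple[Sequence[float], str]],
                         k: int = 3,
                         eps: float = 1e-9) -> str:
    """Predict a label by summing inverse-distance votes of the k nearest neighbours."""
    neighbours = sorted((math.dist(query, x), label) for x, label in training)[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps)  # closer neighbours weigh more
    return max(votes, key=votes.get)

# Hypothetical training set: (feature vector, subcellular location).
training = [
    ((0.0, 0.0), "cytoplasm"),
    ((3.0, 0.0), "nucleus"),
    ((0.0, 3.0), "nucleus"),
    ((9.0, 9.0), "cytoplasm"),
]

prediction = weighted_knn_predict((1.0, 0.0), training, k=3)
```

    With k = 3 an unweighted majority vote would choose "nucleus" (two neighbours against one), but the single much closer "cytoplasm" neighbour outweighs them once votes are scaled by inverse distance.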

  2. BPROMPT: A consensus server for membrane protein prediction.

    PubMed

    Taylor, Paul D; Attwood, Teresa K; Flower, Darren R

    2003-07-01

    Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT. PMID:12824397

  3. Prediction and characterization of protein-protein interaction network in Bacillus licheniformis WX-02.

    PubMed

    Han, Yi-Chao; Song, Jia-Ming; Wang, Long; Shu, Cheng-Cheng; Guo, Jing; Chen, Ling-Ling

    2016-01-01

    In this study, we constructed a protein-protein interaction (PPI) network of B. licheniformis strain WX-02 using the interolog and domain-based methods, which contained 15,864 edges and 2,448 nodes. Although computationally predicted networks have relatively low coverage and high false-positive rates, our prediction was confirmed from three perspectives: local structural features, functional similarities and transcriptional correlations. Further analysis of the COG heat map showed that protein interactions in B. licheniformis WX-02 mainly occurred within the same functional categories. By incorporating the transcriptome data, we found that the topological properties of the PPI network were robust under normal and high salt conditions. In addition, 267 different protein complexes were identified and 117 poorly characterized proteins were annotated with certain functions based on the PPI network. Furthermore, the sub-network showed that a hub protein, CcpA, was linked directly or indirectly to many proteins related to γ-PGA synthesis and regulation, such as PgsB, GltA, GltB, ProB, ProJ, YcgM and two signal transduction systems ComP-ComA and DegS-DegU. Thus, CcpA might play an important role in the regulation of γ-PGA synthesis. This study therefore will facilitate the understanding of the complex cellular behaviors and mechanisms of γ-PGA synthesis in B. licheniformis WX-02. PMID:26782814

  4. Prediction and characterization of protein-protein interaction network in Bacillus licheniformis WX-02

    PubMed Central

    Han, Yi-Chao; Song, Jia-Ming; Wang, Long; Shu, Cheng-Cheng; Guo, Jing; Chen, Ling-Ling

    2016-01-01

    In this study, we constructed a protein-protein interaction (PPI) network of B. licheniformis strain WX-02 using the interolog and domain-based methods, which contained 15,864 edges and 2,448 nodes. Although computationally predicted networks have relatively low coverage and high false-positive rates, our prediction was confirmed from three perspectives: local structural features, functional similarities and transcriptional correlations. Further analysis of the COG heat map showed that protein interactions in B. licheniformis WX-02 mainly occurred within the same functional categories. By incorporating the transcriptome data, we found that the topological properties of the PPI network were robust under normal and high salt conditions. In addition, 267 different protein complexes were identified and 117 poorly characterized proteins were annotated with certain functions based on the PPI network. Furthermore, the sub-network showed that a hub protein, CcpA, was linked directly or indirectly to many proteins related to γ-PGA synthesis and regulation, such as PgsB, GltA, GltB, ProB, ProJ, YcgM and two signal transduction systems ComP-ComA and DegS-DegU. Thus, CcpA might play an important role in the regulation of γ-PGA synthesis. This study therefore will facilitate the understanding of the complex cellular behaviors and mechanisms of γ-PGA synthesis in B. licheniformis WX-02. PMID:26782814

  5. Collective Dynamics Differentiates Functional Divergence in Protein Evolution

    PubMed Central

    Glembo, Tyler J.; Farrell, Daniel W.; Gerek, Z. Nevin; Thorpe, M. F.; Ozkan, S. Banu

    2012-01-01

    Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables those mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function. PMID:22479170

  6. Predicting the disruption by UO₂²⁺ of a protein-ligand interaction

    PubMed Central

    Pible, Olivier; Vidaud, Claude; Plantevin, Sophie; Pellequer, Jean-Luc; Quéméneur, Eric

    2010-01-01

    The uranyl cation (UO₂²⁺) can be suspected to interfere with the binding of essential metal cations to proteins, underlying some mechanisms of toxicity. A dedicated computational screen was used to identify UO₂²⁺ binding sites within a set of nonredundant protein structures. The list of potential targets was compared to data from a small molecules interaction database to pinpoint specific examples where UO₂²⁺ should be able to bind in the vicinity of an essential cation, and would be likely to affect the function of the corresponding protein. The C-reactive protein appeared as an interesting hit since its structure involves critical calcium ions in the binding of phosphorylcholine. Biochemical experiments confirmed the predicted binding site for UO₂²⁺ and it was demonstrated by surface plasmon resonance assays that UO₂²⁺ binding to CRP prevents the calcium-mediated binding of phosphorylcholine. Strikingly, the apparent affinity of UO₂²⁺ for native CRP was almost 100-fold higher than that of Ca2+. This result exemplifies in the case of CRP the capability of our computational tool to predict effective binding sites for UO₂²⁺ in proteins and is a first evidence of calcium substitution by the uranyl cation in a native protein. PMID:20842713

  7. Large-scale de novo prediction of physical protein-protein association.

    PubMed

    Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas

    2011-11-01

    Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org. PMID:21836163

  8. Large-scale De Novo Prediction of Physical Protein-Protein Association*

    PubMed Central

    Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C.; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas

    2011-01-01

    Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org. PMID:21836163

  9. JAFA: a protein function annotation meta-server

    PubMed Central

    Friedberg, Iddo; Harder, Tim; Godzik, Adam

    2006-01-01

    With the high number of sequences and structures streaming in from genomic projects, there is a need for more powerful and sophisticated annotation tools. Most problematic of the annotation efforts is predicting gene and protein function. Over the past few years there has been considerable progress in automated protein function prediction, using a diverse set of methods. Nevertheless, no single method reports all the information possible, and molecular biologists resort to ‘shopping around’ using different methods: a cumbersome and time-consuming practice. Here we present the Joined Assembly of Function Annotations, or JAFA server. JAFA queries several function prediction servers with a protein sequence and assembles the returned predictions in a legible, non-redundant format. In this manner, JAFA combines the predictions of several servers to provide a comprehensive view of what are the predicted functions of the proteins. JAFA also offers its own output, and the individual programs' predictions for further processing. JAFA is available for use from . PMID:16845030

  10. Assessment of protein domain fusions in human protein interaction networks prediction: application to the human kinetochore model.

    PubMed

    Morilla, Ian; Lees, Jon G; Reid, Adam J; Orengo, Christine; Ranea, Juan A G

    2010-12-31

    In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Some proteins, involved in a common biological process and encoded by separate genes in one organism, can be found fused within a single protein chain in other organisms. By detecting these triplets, a functional relationship can be established between the unfused proteins. Here we use a domain fusion prediction method to predict these protein interactions for the human interactome. We observed that gene fusion events are more related to physical interaction between proteins than to other weaker functional relationships such as participation in a common biological pathway. These results suggest that domain fusion is an appropriate method for predicting protein complexes. The most reliable fused domain predictions were used to build protein-protein interaction (PPI) networks. These predicted PPI network models showed the same topological features as real biological networks and different features from random behaviour. We built the PPI domain fusion sub-network model of the human kinetochore and observed that the majority of the predicted interactions have not yet been experimentally characterised in the publicly available PPI repositories. The study of the human kinetochore domain fusion sub-network reveals undiscovered kinetochore proteins with presumably relevant functions, such as hubs with many connections in the kinetochore sub-network. These results suggest that experimentally hidden regions in the predicted PPI networks contain key functional elements, associated with important functional areas, still undiscovered in the human interactome. Until novel experiments shed light on these hidden regions, domain fusion predictions provide a valuable approach for exploring them. PMID:20851221

  11. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and of the secondary structures of amino acid sequences have made it possible to design a function for the folding process. By choosing variant structures and directions for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function using meta-heuristic algorithms is the final step in finding the native structure of a given amino acid sequence. In this work, the Imperialist Competitive algorithm was used to accelerate the minimization process. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on a standard benchmark show the superiority of the new algorithm over previous methods with a similar potential function. PMID:26718864

  12. TOPPER: Topology Prediction of Transmembrane Protein Based on Evidential Reasoning

    PubMed Central

    Deng, Xinyang; Liu, Qi; Hu, Yong; Deng, Yong

    2013-01-01

    The topology prediction of transmembrane proteins is a hot research field in bioinformatics and molecular biology. It is a typical pattern recognition problem. Various prediction algorithms have been developed to predict transmembrane protein topology, since the experimental techniques are restricted by many stringent conditions. Usually, these individual prediction algorithms depend on various principles such as the hydrophobicity or charges of residues. In this paper, an evidential topology prediction method for transmembrane proteins is proposed based on evidential reasoning, which is called TOPPER (topology prediction of transmembrane protein based on evidential reasoning). In the proposed method, the prediction results of multiple individual prediction algorithms can be transformed into BPAs (basic probability assignments) according to the confusion matrix. Then, the final prediction result can be obtained by combining the individual predictions based on Dempster's rule of combination. The experimental results show that the proposed method is superior to the individual prediction algorithms, which illustrates the effectiveness of the proposed method. PMID:23401665
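    Dempster's rule of combination, named in the abstract above, can be sketched in a few lines. The three-state frame (inside, membrane, outside) and all mass values below are hypothetical and chosen only for illustration; how TOPPER derives BPAs from a confusion matrix is not shown.

```python
def combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    m1, m2: dict mapping frozenset of states -> mass, each summing to 1.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical per-residue evidence from two individual predictors.
M = frozenset({"membrane"})
I = frozenset({"inside"})
THETA = frozenset({"inside", "membrane", "outside"})  # total ignorance

m1 = {M: 0.6, I: 0.1, THETA: 0.3}
m2 = {M: 0.5, I: 0.2, THETA: 0.3}
fused = combine(m1, m2)
print(fused)
```

    With these toy masses the conflict is 0.17, and the combined belief in the membrane state rises to 0.63/0.83, above either input, which is the reinforcing behaviour an evidence-fusion scheme relies on.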

  13. On lattice protein structure prediction revisited.

    PubMed

    Dotu, Ivan; Cebrián, Manuel; Van Hentenryck, Pascal; Clote, Peter

    2011-01-01

    Protein structure prediction is regarded as a highly challenging problem both for the biology and for the computational communities. In recent years, many approaches have been developed, moving to increasingly complex lattice models and off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face-Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. The flexible framework of this hybrid algorithm allows an adaptation to the Miyazawa-Jernigan contact potential, in place of the HP model, thus suggesting its potential for tertiary structure prediction. Benchmarking statistics are given for our method against the hydrophobic core threading program HPstruct, an exact method which can be viewed as complementary to our method. PMID:21358007
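    The objective described in the abstract above, a self-avoiding walk on the FCC lattice with as many H-H contacts as possible, can be made concrete with a small scoring sketch. It only evaluates a given conformation; the tabu search, constraint programming and LNS steps are not reproduced, and the function name and example walk are illustrative assumptions.

```python
from itertools import product

# The 12 nearest-neighbour vectors of the FCC lattice: exactly two
# coordinates are +1 or -1 and the third is 0.
FCC_NEIGHBOURS = {
    v for v in product((-1, 0, 1), repeat=3) if sum(abs(c) for c in v) == 2
}


def _diff(p, q):
    return tuple(b - a for a, b in zip(p, q))


def hh_contacts(sequence, coords):
    """Count H-H contacts between non-consecutive residues of an HP chain."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates differ in length")
    if len(set(coords)) != len(coords):
        raise ValueError("walk is not self-avoiding")
    for p, q in zip(coords, coords[1:]):
        if _diff(p, q) not in FCC_NEIGHBOURS:
            raise ValueError("consecutive residues are not lattice neighbours")
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sequence[i] == "H" and sequence[j] == "H":
                if _diff(coords[i], coords[j]) in FCC_NEIGHBOURS:
                    contacts += 1
    return contacts


# A four-residue walk whose points form a tetrahedron on the FCC lattice.
walk = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
print(hh_contacts("HHHH", walk), hh_contacts("HPHH", walk))
```

    A search method would call a function like this repeatedly as its energy evaluation, with the native state corresponding to the maximum contact count.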

  14. Prediction of lipid-binding regions in cytoplasmic and extracellular loops of membrane proteins as exemplified by protein translocation membrane proteins.

    PubMed

    Keller, Rob C A

    2013-01-01

    The presence of possible lipid-binding regions in the cytoplasmic or extracellular loops of membrane proteins with an emphasis on protein translocation membrane proteins was investigated in this study using bioinformatics. Recent developments in approaches recognizing lipid-binding regions in proteins were found to be promising. In this study a total bioinformatics approach specialized in identifying lipid-binding helical regions in proteins was explored. Two features of the protein translocation membrane proteins, the position of the transmembrane regions and the identification of additional lipid-binding regions, were analyzed. A number of well-studied protein translocation membrane protein structures were checked in order to demonstrate the predictive value of the bioinformatics approach. Furthermore, the results demonstrated that lipid-binding regions in the cytoplasmic and extracellular loops in protein translocation membrane proteins can be predicted, and it is proposed that the interaction of these regions with phospholipids is important for proper functioning during protein translocation. PMID:22961045

  15. Claudin Proteins And Neuronal Function.

    PubMed

    Devaux, Jérôme; Fykkolodziej, Bozena; Gow, Alexander

    2010-01-01

    The identification and characterization of the claudin family of tight junction (TJ) proteins in the late 1990s ushered in a new era for research into the molecular and cellular biology of intercellular junctions. Since that time, TJs have been studied in the contexts of many diseases including deafness, male infertility, cancer, bacterial invasion and liver and kidney disorders. In this review, we consider the role of claudins in the nervous system focusing on the mechanisms by which TJs in glial cells are involved in neuronal function. Electrophysiological evidence suggests that claudins may operate in the central nervous system (CNS) in a manner similar to polarized epithelia. We also evaluate hypotheses that TJs are the gatekeepers of an immune-privileged myelin compartment and that TJs emerged during evolution to form major adhesive forces within the myelin sheath. Finally, we consider the implications of CNS myelin TJs in the contexts of behavioral disorders (schizophrenia) and demyelinating/hypomyelinating diseases (multiple sclerosis and the leukodystrophies), and explore evidence of a possible mechanism governing affective disorder symptoms in patients with white matter abnormalities. PMID:25013353

  16. Network Analysis of Circular Permutations in Multidomain Proteins Reveals Functional Linkages for Uncharacterized Proteins

    PubMed Central

    Adjeroh, Donald; Jiang, Yue; Jiang, Bing-Hua; Lin, Jie

    2014-01-01

    Various studies have implicated different multidomain proteins in cancer. However, there has been little or no detailed study on the role of circular multidomain proteins in the general problem of cancer or on specific cancer types. This work represents an initial attempt at investigating the potential for predicting linkages between known cancer-associated proteins and uncharacterized or hypothetical multidomain proteins, based primarily on circular permutation (CP) relationships. First, we propose an efficient algorithm for rapid identification of both exact and approximate CPs in multidomain proteins. Using the circular relations identified, we construct networks between multidomain proteins, based on which we perform functional annotation of multidomain proteins. We then extend the method to construct subnetworks for selected cancer subtypes, and perform prediction of potential linkages between uncharacterized multidomain proteins and the selected cancer types. We include practical results showing the performance of the proposed methods. PMID:25741177
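    An exact circular permutation between two domain architectures can be checked with a simple rotation test. The sketch below is a naive quadratic check, not the efficient algorithm the abstract refers to, and it does not handle approximate CPs; the function name and domain labels are hypothetical.

```python
def circular_shift(a, b):
    """Return the smallest k > 0 such that rotating list a left by k
    positions gives list b, or None if b is not a non-trivial rotation."""
    if len(a) != len(b) or not a:
        return None
    for k in range(1, len(a)):
        if a[k:] + a[:k] == b:
            return k
    return None


# Hypothetical domain architectures, written N-terminus to C-terminus.
arch_a = ["D1", "D2", "D3"]
arch_b = ["D2", "D3", "D1"]   # a rotated by one domain
arch_c = ["D2", "D1", "D3"]   # a shuffle, not a rotation

print(circular_shift(arch_a, arch_b), circular_shift(arch_a, arch_c))
```

    Pairs for which the check returns a shift could then be joined by an edge to build the kind of CP network the abstract describes.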

  17. Biological cluster evaluation for gene function prediction.

    PubMed

    Klie, Sebastian; Nikoloski, Zoran; Selbig, Joachim

    2014-06-01

    Recent advances in high-throughput omics techniques render it possible to decode the function of genes by using the "guilt-by-association" principle on biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice of the number of clusters, and (2) the external measures, which do not take into consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is shown to be valid on synthetic data and available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering of optimal score and the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set. PMID:20059365

  18. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We

  19. Exploration of the dynamic properties of protein complexes predicted from spatially constrained protein-protein interaction networks.

    PubMed

    Yen, Eric A; Tsay, Aaron; Waldispuhl, Jerome; Vogel, Jackie

    2014-05-01

    Protein complexes are not static, but rather highly dynamic with subunits that undergo 1-dimensional diffusion with respect to each other. Interactions within protein complexes are modulated through regulatory inputs that alter interactions and introduce new components and deplete existing components through exchange. While it is clear that the structure and function of any given protein complex is coupled to its dynamical properties, it remains a challenge to predict the possible conformations that complexes can adopt. Protein-fragment Complementation Assays detect physical interactions between protein pairs constrained to ≤8 nm from each other in living cells. This method has been used to build networks composed of 1000s of pair-wise interactions. Significantly, these networks contain a wealth of dynamic information, as the assay is fully reversible and the proteins are expressed in their natural context. In this study, we describe a method that extracts this valuable information in the form of predicted conformations, allowing the user to explore the conformational landscape, to search for structures that correlate with an activity state, and estimate the abundance of conformations in the living cell. The generator is based on a Markov Chain Monte Carlo simulation that uses the interaction dataset as input and is constrained by the physical resolution of the assay. We applied this method to an 18-member protein complex composed of the seven core proteins of the budding yeast Arp2/3 complex and 11 associated regulators and effector proteins. We generated 20,480 output structures and identified conformational states using principal component analysis. We interrogated the conformation landscape and found evidence of symmetry breaking, a mixture of likely active and inactive conformational states and dynamic exchange of the core protein Arc15 between core and regulatory components. Our method provides a novel tool for prediction and visualization of the hidden

  20. Linking structural features of protein complexes and biological function.

    PubMed

    Sowmya, Gopichandran; Breen, Edmond J; Ranganathan, Shoba

    2015-09-01

    Protein-protein interaction (PPI) establishes the central basis for complex cellular networks in a biological cell. Association of proteins with other proteins occurs at varying affinities, yet with a high degree of specificity. PPIs lead to diverse functionality such as catalysis, regulation, signaling, immunity, and inhibition, playing a crucial role in functional genomics. The molecular principle of such interactions is often elusive in nature. Therefore, a comprehensive analysis of known protein complexes from the Protein Data Bank (PDB) is essential for the characterization of structural interface features to determine structure-function relationship. Thus, we analyzed a nonredundant dataset of 278 heterodimer protein complexes, categorized into major functional classes, for distinguishing features. Interestingly, our analysis has identified five key features (interface area, interface polar residue abundance, hydrogen bonds, solvation free energy gain from interface formation, and binding energy) that are discriminatory among the functional classes using Kruskal-Wallis rank sum test. Significant correlations between these PPI interface features amongst functional categories are also documented. Salt bridges correlate with interface area in regulator-inhibitors (r = 0.75). These representative features have implications for the prediction of potential function of novel protein complexes. The results provide molecular insights for better understanding of PPIs and their relation to biological functions. PMID:26131659
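    The Kruskal-Wallis rank sum test mentioned in the abstract above compares a feature across several groups using ranks rather than raw values. The sketch below computes only the H statistic, using average ranks for ties and no tie-correction factor; the feature values are made up for illustration and are not taken from the study.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic for a list of samples.

    Ties receive average ranks; the tie-correction factor is omitted.
    """
    pooled = sorted(
        (value, g) for g, sample in enumerate(groups) for value in sample
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; give each the mean rank.
        avg_rank = (i + 1 + j) / 2.0
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    total = sum(r * r / len(s) for r, s in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Hypothetical interface-area values for three functional classes.
class_a = [1.0, 2.0, 3.0]
class_b = [4.0, 5.0, 6.0]
class_c = [7.0, 8.0, 9.0]
h_stat = kruskal_wallis_h([class_a, class_b, class_c])
print(h_stat)
```

    For larger samples, H is compared against a chi-squared distribution with (number of groups minus one) degrees of freedom to decide whether a feature discriminates among the classes.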

  1. Functionalizing Microporous Membranes for Protein Purification and Protein Digestion

    NASA Astrophysics Data System (ADS)

    Dong, Jinlan; Bruening, Merlin L.

    2015-07-01

    This review examines advances in the functionalization of microporous membranes for protein purification and the development of protease-containing membranes for controlled protein digestion prior to mass spectrometry analysis. Recent studies confirm that membranes are superior to bead-based columns for rapid protein capture, presumably because convective mass transport in membrane pores rapidly brings proteins to binding sites. Modification of porous membranes with functional polymeric films or TiO2 nanoparticles yields materials that selectively capture species ranging from phosphopeptides to His-tagged proteins, and protein-binding capacities often exceed those of commercial beads. Thin membranes also provide a convenient framework for creating enzyme-containing reactors that afford control over residence times. With millisecond residence times, reactors with immobilized proteases limit protein digestion to increase sequence coverage in mass spectrometry analysis and facilitate elucidation of protein structures. This review emphasizes the advantages of membrane-based techniques and concludes with some challenges for their practical application.

  2. Predictive energy landscapes for folding membrane protein assemblies

    NASA Astrophysics Data System (ADS)

    Truong, Ha H.; Kim, Bobby L.; Schafer, Nicholas P.; Wolynes, Peter G.

    2015-12-01

    We study the energy landscapes for membrane protein oligomerization using the Associative memory, Water mediated, Structure and Energy Model with an implicit membrane potential (AWSEM-membrane), a coarse-grained molecular dynamics model previously optimized under the assumption that the energy landscapes for folding α-helical membrane protein monomers are funneled once their native topology within the membrane is established. In this study we show that the AWSEM-membrane force field is able to sample near native binding interfaces of several oligomeric systems. By predicting candidate structures using simulated annealing, we further show that degeneracies in predicting structures of membrane protein monomers are generally resolved in the folding of the higher order assemblies as is the case in the assemblies of both nicotinic acetylcholine receptor and V-type Na+-ATPase dimers. The physics of the phenomenon resembles domain swapping, which is consistent with the landscape following the principle of minimal frustration. We revisit also the classic Khorana study of the reconstitution of bacteriorhodopsin from its fragments, which is the close analogue of the early Anfinsen experiment on globular proteins. Here, we show the retinal cofactor likely plays a major role in selecting the final functional assembly.

  3. Functional correlations of respiratory syncytial virus proteins to intrinsic disorder.

    PubMed

    Whelan, Jillian N; Reddy, Krishna D; Uversky, Vladimir N; Teng, Michael N

    2016-04-26

    Protein intrinsic disorder is an important characteristic demonstrated by the absence of higher order structure, and is commonly detected in multifunctional proteins encoded by RNA viruses. Intrinsically disordered regions (IDRs) of proteins exhibit high flexibility and solvent accessibility, which permit several distinct protein functions, including but not limited to binding of multiple partners and accessibility for post-translational modifications. IDR-containing viral proteins can therefore execute various functional roles to enable productive viral replication. Respiratory syncytial virus (RSV) is a globally circulating, non-segmented, negative sense (NNS) RNA virus that causes severe lower respiratory infections. In this study, we performed a comprehensive evaluation of predicted intrinsic disorder of the RSV proteome to better understand the functional role of RSV protein IDRs. We included 27 RSV strains to sample major RSV subtypes and genotypes, as well as geographic and temporal isolate differences. Several types of disorder predictions were applied to the RSV proteome, including per-residue (PONDR®-FIT and PONDR® VL-XT), binary (CH, CDF, CH-CDF), and disorder-based interactions (ANCHOR and MoRFpred). We classified RSV IDRs by size, frequency and function. Finally, we determined the functional implications of RSV IDRs by mapping predicted IDRs to known functional domains of each protein. Identification of RSV IDRs within functional domains improves our understanding of RSV pathogenesis in addition to providing potential therapeutic targets. Furthermore, this approach can be applied to other NNS viruses that encode essential multifunctional proteins for the elucidation of viral protein regions that can be manipulated for attenuation of viral replication. PMID:27062995

  4. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

    PubMed Central

    Das, Sayoni; Lee, David; Sillitoe, Ian; Dawson, Natalie L.; Lees, Jonathan G.; Orengo, Christine A.

    2015-01-01

    Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26139634

  5. Origins of Protein Functions in Cells

    NASA Technical Reports Server (NTRS)

    Seelig, Burchard; Pohorille, Andrzej

    2011-01-01

    In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident and other collections of proteins could start a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP through introducing several mutations that introduce targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known

  6. [Prediction of short loops in the proteins with internal disorder].

    PubMed

    Deriusheva, E I; Galzitskaia, O V; Serdiuk, I N

    2008-01-01

    A new capability of the FoldUnfold program for predicting short disordered regions (loops), which arises from using a short window width (3 amino acid residues), is described. For three representatives of the G protein family, the FoldUnfold program predicted almost all short loops, and the results are in good agreement with the X-ray structure data. We classified the loops predicted in the Ras-p21 protein structure into two types. Loops of the first type have high values of the Debye-Waller factor, typical of the so-called functional loops (flexible loops). Loops of the second type have lower values of the Debye-Waller factor and can be considered loops connecting secondary structure elements (rigid loops). When the predictions of our program are compared with those of other programs (PONDR, RONN, DisEMBL, PreLINK, IUPred, GlobPlot 2, FoldIndex), FoldUnfold predicts short loop positions far better. Applying FoldUnfold to the ubiquitin-like domain of h-PLIC-2 makes it possible to define the boundary between the structured and unstructured regions in proteins with a large proportion of disordered regions. The FoldUnfold program defines a clear boundary between the structured and unstructured regions at amino acid residues 30-31, whereas the other programs place the boundary anywhere from the 28th to the 70th amino acid residue. PMID:19140328

  7. Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression

    PubMed Central

    Lehtinen, Sonja; Lees, Jon; Bähler, Jürg; Shawe-Taylor, John; Orengo, Christine

    2015-01-01

    With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from this data are becoming increasingly important. Data relating to functional association between genes or proteins, such as co-expression or functional association, is often represented in terms of gene or protein networks. Several methods of predicting gene function from these networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm using a commute-time kernel and partial least squares regression (Compass). We compare Compass to GeneMANIA, a leading network-based prediction algorithm, using a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction. PMID:26288239

  8. Predicting PDZ domain mediated protein interactions from structure

    PubMed Central

    2013-01-01

    Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on

  9. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    PubMed

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can be further improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performance of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes, Homo sapiens, Escherichia coli and Saccharomyces cerevisiae, and calculated the statistical significance of the developed F-SVM over the classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, the local composition of amino acids together with their physico-chemical characteristics is used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in a 10-fold cross validation experiment is measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo. PMID:26684462

  10. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    MedlinePlus

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  11. Structural class prediction of protein using novel feature extraction method from chaos game representation of predicted secondary structure.

    PubMed

    Zhang, Lichao; Kong, Liang; Han, Xiaodong; Lv, Jinfeng

    2016-07-01

    Protein structural class prediction plays an important role in protein structure and function analysis, drug design and many other biological applications. Extracting good representation from protein sequence is fundamental for this prediction task. In recent years, although several secondary structure based feature extraction strategies have been specially proposed for low-similarity protein sequences, the prediction accuracy still remains limited. To explore the potential of secondary structure information, this study proposed a novel feature extraction method from the chaos game representation of predicted secondary structure to mainly capture sequence order information and secondary structure segments distribution information in a given protein sequence. Several kinds of prediction accuracies obtained by the jackknife test are reported on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640). Compared with the state-of-the-art prediction methods, the proposed method achieves the highest overall accuracies on all the three datasets. The experimental results confirm that the proposed feature extraction method is effective for accurate prediction of protein structural class. Moreover, it is anticipated that the proposed method could be extended to other graphical representations of protein sequence and be helpful in future research. PMID:27084358
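
    The chaos game representation named in this abstract can be illustrated with a minimal sketch. The abstract does not give the geometry or the features the authors derive from it, so the triangle layout, the start point, and the names VERTICES and chaos_game below are assumptions made only for illustration: each residue of a predicted secondary structure string (H, E, C) moves the current point halfway toward the vertex assigned to its state.

```python
from typing import List, Tuple

# Assumed layout (not from the paper): one triangle vertex per state.
VERTICES = {
    "H": (0.0, 0.0),  # helix
    "E": (1.0, 0.0),  # strand
    "C": (0.5, 1.0),  # coil
}


def chaos_game(ss: str, start: Tuple[float, float] = (0.5, 0.5)) -> List[Tuple[float, float]]:
    """Return one point per residue of a secondary structure string.

    Each point is the midpoint between the previous point and the
    vertex of the current state, so the trajectory encodes sequence order.
    """
    x, y = start
    points = []
    for state in ss:
        vx, vy = VERTICES[state]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


# Hypothetical predicted secondary structure for a four-residue fragment.
points = chaos_game("HHEC")
```

    Features for a classifier could then be computed from the distribution of these points, but which statistics the authors actually use is not stated in the abstract.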

  12. Deducing protein function by forensic integrative cell biology.

    PubMed

    Earnshaw, William C

    2013-12-01

    Our ability to sequence genomes has provided us with near-complete lists of the proteins that compose cells, tissues, and organisms, but this is only the beginning of the process to discover the functions of cellular components. In the future, it's going to be crucial to develop computational analyses that can predict the biological functions of uncharacterised proteins. At the same time, we must not forget those fundamental experimental skills needed to confirm the predictions or send the analysts back to the drawing board to devise new ones. PMID:24358025

  13. Optimizing nondecomposable loss functions in structured prediction.

    PubMed

    Ranjbar, Mani; Lan, Tian; Wang, Yang; Robinovitch, Steven N; Li, Ze-Nian; Mori, Greg

    2013-04-01

    We develop an algorithm for structured prediction with nondecomposable performance measures. The algorithm learns parameters of Markov Random Fields (MRFs) and can be applied to multivariate performance measures. Examples include performance measures such as Fβ score (natural language processing), intersection over union (object category segmentation), Precision/Recall at k (search engines), and ROC area (binary classifiers). We attack this optimization problem by approximating the loss function with a piecewise linear function. The loss augmented inference forms a Quadratic Program (QP), which we solve using LP relaxation. We apply this approach to two tasks: object class-specific segmentation and human action retrieval from videos. We show significant improvement over baseline approaches that either use simple loss functions or simple scoring functions on the PASCAL VOC and H3D Segmentation datasets, and a nursing home action recognition dataset. PMID:22868650
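
    To make "nondecomposable" concrete, the sketch below computes the Fβ score and intersection over union from pooled counts and compares the pooled F1 with the average of per-batch F1 values. It is only an illustration of the measures listed in the abstract, not of the authors' learning algorithm; the function names and the example counts are assumptions.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives, false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def intersection_over_union(tp: int, fp: int, fn: int) -> float:
    """IoU (Jaccard index) of predicted and true positive sets."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Two hypothetical batches given as (tp, fp, fn) counts.
batch_a = (1, 0, 3)
batch_b = (9, 1, 0)

# Averaging per-batch scores ...
per_batch_mean = (f_beta(*batch_a) + f_beta(*batch_b)) / 2
# ... differs from scoring the pooled counts, so the measure does not
# decompose into a sum of independent per-example losses.
pooled_counts = tuple(a + b for a, b in zip(batch_a, batch_b))
pooled = f_beta(*pooled_counts)
```

    Because the pooled score cannot be written as a sum over examples, standard per-example loss minimization does not directly optimize it, which is the difficulty the abstract addresses.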

  14. HKC: an algorithm to predict protein complexes in protein-protein interaction networks.

    PubMed

    Wang, Xiaomin; Wang, Zhengzhi; Ye, Jun

    2011-01-01

    With the availability of more and more genome-scale protein-protein interaction (PPI) networks, research interest is gradually shifting to systematic analysis of these large data sets. A key topic is to predict protein complexes in PPI networks by identifying clusters that are densely connected within themselves but sparsely connected with the rest of the network. In this paper, we present a new topology-based algorithm, HKC, to detect protein complexes in genome-scale PPI networks. HKC mainly uses the concepts of highest k-core and cohesion to predict protein complexes by identifying overlapping clusters. The experiments on two data sets and two benchmarks show that our algorithm has relatively high F-measure and exhibits better performance compared with some other methods. PMID:22174556
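
    The "highest k-core" concept mentioned above can be sketched as follows. This is not the HKC algorithm itself (the abstract does not describe its cohesion step or how overlapping clusters are formed); it is a plain peeling procedure for core numbers, and the function names and the toy network are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def core_numbers(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Core number of every node, by repeatedly removing a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core: Dict[str, int] = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def highest_k_core(adj: Dict[str, Set[str]]) -> Set[str]:
    """Nodes whose core number equals the maximum core number."""
    core = core_numbers(adj)
    k_max = max(core.values())
    return {v for v, c in core.items() if c == k_max}


# Toy network: a 4-clique (A, B, C, D) with a two-node tail (E, F).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("B", "D"), ("C", "D"), ("D", "E"), ("E", "F")]
toy_adj = build_adjacency(toy_edges)
dense_cluster = highest_k_core(toy_adj)
```

    The simple min() scan is quadratic in the number of nodes; a bucket-based implementation would be preferable for genome-scale networks.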

  15. Functional annotation of hypothetical proteins – A review

    PubMed Central

    Sivashankari, Selvarajan; Shanmughavel, Piramanayagam

    2006-01-01

    The complete human genome sequences in the public database provide ways to understand the blueprint of life. As of June 29, 2006, complete genomes are available for 27 archaea, 326 bacteria and 21 eukaryotes, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes is in progress. Traditional biochemical/molecular experiments can assign accurate functions for genes in these genomes. However, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand genomes. Thus, determination of protein function is one of the challenging problems of the post-genome era. This demands that bioinformatics predict the functions of un-annotated protein sequences by developing efficient tools. Here, we discuss some of the recent and popular approaches developed in bioinformatics to predict functions for hypothetical proteins. PMID:17597916

  16. Characterization and Functionality of Corn Germ Proteins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This study was conducted to evaluate the functional properties of protein extracted from wet-milled corn germ and identify potential applications of the recovered protein. Corn germ comprises 12% of the total weight of normal dent corn and about 29% of the corn protein (moisture-free and oil-free ...

  17. The Alba protein family: Structure and function.

    PubMed

    Goyal, Manish; Banerjee, Chinmoy; Nag, Shiladitya; Bandyopadhyay, Uday

    2016-05-01

    Alba family proteins are small, basic, dimeric nucleic acid-binding proteins, which are widely distributed in archaea and a number of eukaryotes. This family of proteins bears the distinct feature of regulation through acetylation/deacetylation, hence the name "acetylation lowers binding affinity" (Alba). Alba family proteins bind DNA cooperatively with no apparent sequence specificity. Besides DNA, Alba proteins also interact with diverse RNA species and associate with ribonucleo-protein complexes. Initially, Alba proteins were recognized as chromosomal proteins and supposed to be involved in the maintenance of chromatin architecture and transcription repression. However, recent studies have shown increasing evidence of functional plasticity among the Alba family of proteins, with roles ranging widely from genome packaging and organization, transcriptional and translational regulation, and RNA metabolism to development and differentiation processes. In recent years, Alba family proteins have attracted growing interest due to their widespread occurrence in a large number of organisms. Presence in multiple copies, functional crosstalk, differential binding affinity, and posttranslational modifications are some of the key factors that might regulate the biological functions of Alba family proteins. In this review article, we present an overview of the Alba family proteins and their salient features, and emphasize their functional roles in the different organisms reported so far. PMID:26900088

  18. Essential protein identification based on essential protein-protein interaction prediction by Integrated Edge Weights.

    PubMed

    Jiang, Yuexu; Wang, Yan; Pang, Wei; Chen, Liang; Sun, Huiyan; Liang, Yanchun; Blanzieri, Enrico

    2015-07-15

    Essential proteins play a crucial role in cellular survival and development processes. Experimentally, essential proteins are identified by gene knockouts or RNA interference, which are expensive and often fatal to the target organisms. Regarding this, an alternative yet important approach to essential protein identification is through computational prediction. Existing computational methods predict essential proteins based on their relative densities in a protein-protein interaction (PPI) network. Degree, betweenness, and other appropriate criteria are often used to measure the relative density. However, no matter what criterion is used, a protein is actually ordered by the attributes of this protein per se. In this research, we present a novel computational method, Integrated Edge Weights (IEW), which first ranks protein-protein interactions by integrating their edge weights, then identifies sub PPI networks consisting of the highly-ranked edges, and finally regards the nodes in these sub networks as essential proteins. We evaluated IEW on three model organisms: Saccharomyces cerevisiae (S. cerevisiae), Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans). The experimental results showed that IEW achieved better performance than the state-of-the-art methods in terms of precision-recall and Jackknife measures. We also demonstrated that IEW is a robust and effective method, which can retrieve biologically significant modules through its highly-ranked protein-protein interactions for S. cerevisiae, E. coli, and C. elegans. We believe that, with sufficient data provided, IEW can be applied to essential protein identification in any other organism. A website about IEW can be accessed from http://digbio.missouri.edu/IEW/index.html. PMID:25892709
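
    The three steps described in this abstract (rank interactions by weight, keep the top-ranked edges, report their endpoints) can be sketched as below. How IEW actually integrates edge weights is not stated in the abstract, so the weights, the top_fraction cutoff, and the function name essential_candidates are hypothetical placeholders.

```python
from typing import Iterable, Set, Tuple

WeightedEdge = Tuple[str, str, float]


def essential_candidates(edges: Iterable[WeightedEdge], top_fraction: float = 0.5) -> Set[str]:
    """Return the endpoints of the highest-weighted fraction of edges.

    Assumption: each edge already carries one integrated weight; the
    weighting scheme itself is outside the scope of this sketch.
    """
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    keep = max(1, int(len(ranked) * top_fraction)) if ranked else 0
    nodes: Set[str] = set()
    for u, v, _weight in ranked[:keep]:
        nodes.update((u, v))
    return nodes


# Hypothetical interactions with made-up weights.
example_edges = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.8),
    ("P3", "P4", 0.2),
    ("P4", "P5", 0.1),
]
candidates = essential_candidates(example_edges, top_fraction=0.5)
```

    The point of the edge-centred view is that a protein is selected because of the interactions it takes part in, rather than by a node attribute such as degree.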

  19. Computational Studies of Membrane Proteins: Models and Predictions for Biological Understanding

    PubMed Central

    Liang, Jie; Naveed, Hammad; Jimenez-Morales, David; Adamian, Larisa; Lin, Meishan

    2013-01-01

    We discuss recent progress in computational studies of membrane proteins based on physical models with parameters derived from bioinformatics analysis. We describe computational identification of membrane proteins and prediction of their topology from sequence, discovery of sequence and spatial motifs, and implications of these discoveries. The detection of evolutionary signal for understanding the substitution pattern of residues in the TM segments and for sequence alignment is also discussed. We further discuss empirical potential functions for energetics of inserting residues in the TM domain, for interactions between TM helices or strands, and their applications in predicting lipid-facing surfaces of the TM domain. Recent progress in structure prediction of membrane proteins is also reviewed, with further discussion of the calculation of ensemble properties such as melting temperature based on a simplified state space model. Additional topics include prediction of the oligomerization state of membrane proteins, identification of the interfaces for protein-protein interactions, and design of membrane proteins. PMID:22051023

  20. A novel feature extraction scheme with ensemble coding for protein-protein interaction prediction.

    PubMed

    Du, Xiuquan; Cheng, Jiaxing; Zheng, Tingting; Duan, Zheng; Qian, Fulan

    2014-01-01

    Protein-protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to a bigger and more realistic dataset that maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp. PMID:25046746

  1. Phosphoinositide Control of Membrane Protein Function

    PubMed Central

    Logothetis, Diomedes E.; Petrou, Vasileios I.; Zhang, Miao; Mahajan, Rahul; Meng, Xuan-Yu; Adney, Scott K.; Cui, Meng; Baki, Lia

    2015-01-01

    Anionic phospholipids are critical constituents of the inner leaflet of the plasma membrane, ensuring appropriate membrane topology of transmembrane proteins. Additionally, in eukaryotes, the negatively charged phosphoinositides serve as key signals not only through their hydrolysis products but also through direct control of transmembrane protein function. Direct phosphoinositide control of the activity of ion channels and transporters has been the most convincing case of the critical importance of phospholipid-protein interactions in the functional control of membrane proteins. Furthermore, second messengers, such as [Ca2+]i, or posttranslational modifications, such as phosphorylation, can directly or allosterically fine-tune phospholipid-protein interactions and modulate activity. Recent advances in structure determination of membrane proteins have allowed investigators to obtain complexes of ion channels with phosphoinositides and to use computational and experimental approaches to probe the dynamic mechanisms by which lipid-protein interactions control active and inactive protein states. PMID:25293526

  2. Predict and Analyze Protein Glycation Sites with the mRMR and IFS Methods

    PubMed Central

    Gu, Wenxiang; Zhang, Wenyi; Wang, Jianan

    2015-01-01

    Glycation is a nonenzymatic process in which proteins react with reducing sugar molecules. The identification of glycation sites in proteins may provide guidelines for understanding the biological function of protein glycation. In this study, we developed a computational method to predict protein glycation sites by using the support vector machine classifier. The experimental results showed that the prediction accuracy was 85.51% and the overall MCC was 0.70. Feature analysis indicated that the composition of k-spaced amino acid pairs feature contributed the most to glycation site prediction. PMID:25961025
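
    The "composition of k-spaced amino acid pairs" feature can be sketched as follows: for a given gap k, count every pair of residues separated by exactly k positions and normalise by the number of such pairs. The window size, the values of k, and any further processing used by the authors are not given in the abstract; the function name and the example peptide are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def k_spaced_pair_composition(seq: str, k: int) -> Dict[str, float]:
    """Frequency of each of the 400 residue pairs separated by k residues.

    Pairs containing non-standard symbols are counted in the total but
    do not appear in the returned 400-dimensional feature dictionary.
    """
    pairs = Counter(seq[i] + seq[i + k + 1] for i in range(len(seq) - k - 1))
    total = sum(pairs.values())
    return {
        a + b: (pairs[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


# Hypothetical five-residue peptide.
features_k0 = k_spaced_pair_composition("AKAKA", k=0)  # adjacent residues
features_k1 = k_spaced_pair_composition("AKAKA", k=1)  # one residue between
```

    Concatenating the dictionaries for several values of k would give one fixed-length vector per sequence window, which is the kind of input a support vector machine expects.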

  3. Predicting Abdominal Aortic Aneurysm Target Genes by Level-2 Protein-Protein Interaction

    PubMed Central

    Fu, Yi; Cui, Qinghua; Kong, Wei

    2015-01-01

    Abdominal aortic aneurysm (AAA) is frequently lethal and has no effective pharmaceutical treatment, posing a great threat to human health. Previous bioinformatics studies of the mechanisms underlying AAA relied largely on the detection of direct protein-protein interactions (level-1 PPI) between the products of reported AAA-related genes. Thus, some proteins not suspected to be directly linked to previously reported genes of pivotal importance to AAA might have been missed. In this study, we constructed an indirect protein-protein interaction (level-2 PPI) network based on common interacting proteins encoded by known AAA-related genes and successfully predicted previously unreported AAA-related genes using this network. We used four methods to test and verify the performance of this level-2 PPI network: cross validation, human AAA mRNA chip array comparison, literature mining, and verification in a mouse CaPO4 AAA model. We confirmed that the new level-2 PPI network is superior to the original level-1 PPI network and proved that the top 100 candidate genes predicted by the level-2 PPI network shared similar GO functions and KEGG pathways compared with positive genes. PMID:26496478
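
    One way to read "level-2 PPI" is sketched below: a protein that does not itself appear in the seed set is scored by the number of seed proteins with which it shares at least one interaction partner. This reading, the scoring rule, and all names in the code are assumptions; the abstract does not specify how the authors build or rank their network.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def level2_scores(adj: Dict[str, Set[str]], seeds: Set[str]) -> Dict[str, int]:
    """For each non-seed protein, count the seeds it shares a partner with."""
    scores: Dict[str, int] = {}
    for protein, partners in adj.items():
        if protein in seeds:
            continue
        linked = {s for s in seeds if partners & adj.get(s, set())}
        if linked:
            scores[protein] = len(linked)
    return scores


# Hypothetical network: S1 and S2 are known disease-related proteins.
toy_edges = [("S1", "X"), ("S2", "X"), ("C1", "X"), ("S2", "Y"), ("C2", "Y")]
toy_scores = level2_scores(build_adjacency(toy_edges), seeds={"S1", "S2"})
```

    In this toy network X and Y interact directly with seeds (level 1) and receive no level-2 score, while C1 and C2 are only reachable through a shared partner.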

  4. Co-evolution analysis to predict protein-protein interactions within influenza virus envelope.

    PubMed

    Mintaev, Ramil R; Alexeevski, Andrei V; Kordyukova, Larisa V

    2014-04-01

    Interactions between the integral membrane proteins hemagglutinin (HA), neuraminidase (NA) and M2 and the membrane-associated matrix protein M1 of influenza A virus are thought to be crucial for assembly of functionally competent virions. We hypothesized that the amino acid residues located at the interface of two different proteins are under physical constraints and thus probably co-evolve. To predict co-evolving residue pairs, the EvFold (http://evfold.org) program searching the (nontransitive) Direct Information scores was applied to large samplings of amino acid sequences from the Influenza Research Database (http://www.fludb.org/). Having focused on the HA, NA, and M2 cytoplasmic tails as well as the C-terminal domain of M1 (being the less conserved among the protein domains), we captured six pairs of correlated positions. Among them, there were one, two, and three position pairs for the HA-M2, HA-M1, and M2-M1 protein pairs, respectively. As expected, no co-varying positions were found for the NA-HA, NA-M1, and NA-M2 pairs, evidently due to high conservation of the NA cytoplasmic tail. The sum of frequencies calculated for the two major amino acid patterns observed in pairs of correlated positions was up to 0.99, indicating their high to extreme evolutionary sustainability. Based on the predictions, a hypothetical model of pair-wise protein interactions within the viral envelope was proposed. PMID:24712535

  5. Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning.

    PubMed

    Maheshwari, Surabhi; Brylinski, Michal

    2015-01-01

    The identification of protein-protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. With the exponentially growing protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system-level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein-protein interactions is highly covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSite(PPI) , an algorithm that uses the three-dimensional structure of a target protein, evolutionarily remotely related templates and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSite(PPI) in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values only slightly decrease to 0.40-0.43 (0.91-0.92) demonstrating that eFindSite(PPI) performs well not only using experimental data but also tolerates structural imperfections in computer-generated structures. In addition, eFindSite(PPI) detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSite(PPI) outperforms other methods for protein-binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large-scale applications using raw genomic data. 
eFindSite(PPI) is

  6. Protein microarrays as tools for functional proteomics.

    PubMed

    LaBaer, Joshua; Ramachandran, Niroshan

    2005-02-01

    Protein microarrays present an innovative and versatile approach to study protein abundance and function at an unprecedented scale. Given the chemical and structural complexity of the proteome, the development of protein microarrays has been challenging. Despite these challenges there has been a marked increase in the use of protein microarrays to map interactions of proteins with various other molecules, and to identify potential disease biomarkers, especially in the area of cancer biology. In this review, we discuss some of the promising advances made in the development and use of protein microarrays. PMID:15701447

  7. Local functional descriptors for surface comparison based binding prediction

    PubMed Central

    2012-01-01

    Background Molecular recognition in proteins occurs due to appropriate arrangements of physical, chemical, and geometric properties of an atomic surface. Similar surface regions should create similar binding interfaces. Effective methods for comparing surface regions can be used in identifying similar regions, and to predict interactions without regard to the underlying structural scaffold that creates the surface. Results We present a new descriptor for protein functional surfaces and algorithms for using these descriptors to compare protein surface regions to identify ligand binding interfaces. Our approach uses descriptors of local regions of the surface, and assembles collections of matches to compare larger regions. Our approach uses a variety of physical, chemical, and geometric properties, adaptively weighting these properties as appropriate for different regions of the interface. Our approach builds a classifier based on a training corpus of examples of binding sites of the target ligand. The constructed classifiers can be applied to a query protein providing a probability for each position on the protein that the position is part of a binding interface. We demonstrate the effectiveness of the approach on a number of benchmarks, demonstrating performance that is comparable to the state-of-the-art, with an approach with more generality than these prior methods. Conclusions Local functional descriptors offer a new method for protein surface comparison that is sufficiently flexible to serve in a variety of applications. PMID:23176080

  8. RBO Aleph: leveraging novel information sources for protein structure prediction

    PubMed Central

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-01-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue–residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  9. RBO Aleph: leveraging novel information sources for protein structure prediction.

    PubMed

    Mabrouk, Mahmoud; Putz, Ines; Werner, Tim; Schneider, Michael; Neeb, Moritz; Bartels, Philipp; Brock, Oliver

    2015-07-01

    RBO Aleph is a novel protein structure prediction web server for template-based modeling, protein contact prediction and ab initio structure prediction. The server has a strong emphasis on modeling difficult protein targets for which templates cannot be detected. RBO Aleph's unique features are (i) the use of combined evolutionary and physicochemical information to perform residue-residue contact prediction and (ii) leveraging this contact information effectively in conformational space search. RBO Aleph emerged as one of the leading approaches to ab initio protein structure prediction and contact prediction during the most recent Critical Assessment of Protein Structure Prediction experiment (CASP11, 2014). In addition to RBO Aleph's main focus on ab initio modeling, the server also provides state-of-the-art template-based modeling services. Based on template availability, RBO Aleph switches automatically between template-based modeling and ab initio prediction based on the target protein sequence, facilitating use especially for non-expert users. The RBO Aleph web server offers a range of tools for visualization and data analysis, such as the visualization of predicted models, predicted contacts and the estimated prediction error along the model's backbone. The server is accessible at http://compbio.robotics.tu-berlin.de/rbo_aleph/. PMID:25897112

  10. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools.

    PubMed

    Figueroa, Maximiliano; Sleutel, Mike; Vandevenne, Marylene; Parvizi, Gregory; Attout, Sophie; Jacquin, Olivier; Vandenameele, Julie; Fischer, Axel W; Damblon, Christian; Goormaghtigh, Erik; Valerio-Lepiniec, Marie; Urvoas, Agathe; Durand, Dominique; Pardon, Els; Steyaert, Jan; Minard, Philippe; Maes, Dominique; Meiler, Jens; Matagne, André; Martial, Joseph A; Van de Weerdt, Cécile

    2016-07-01

    Despite impressive successes in protein design, designing a well-folded protein of more than 100 amino acids de novo remains a formidable challenge. Exploiting the promising biophysical features of the artificial protein Octarellin V, we improved this protein by directed evolution, thus creating a more stable and soluble protein: Octarellin V.1. Next, we obtained crystals of Octarellin V.1 in complex with crystallization chaperones and determined the tertiary structure. The experimental structure of Octarellin V.1 differs from its in silico design: the (αβα) sandwich architecture bears some resemblance to a Rossmann-like fold instead of the intended TIM-barrel fold. This surprising result gave us a unique and attractive opportunity to test the state of the art in protein structure prediction, using this artificial protein free of any natural selection. We tested 13 automated webservers for protein structure prediction and found none of them to predict the actual structure. More than 50% of them predicted a TIM-barrel fold, i.e. the structure we set out to design more than 10 years ago. In addition, local software runs that are human operated can sample a structure similar to the experimental one but fail in selecting it, suggesting that the scoring and ranking functions should be improved. We propose that artificial proteins could be used as tools to test the accuracy of protein structure prediction algorithms, because of their lack of evolutionary pressure and unique sequence features. PMID:27181418

  11. Prediction of RNA binding proteins comes of age from low resolution to high resolution.

    PubMed

    Zhao, Huiying; Yang, Yuedong; Zhou, Yaoqi

    2013-10-01

    Networks of protein-RNA interactions are likely to be larger than protein-protein and protein-DNA interaction networks because RNA transcripts are encoded tens of times more than proteins (e.g. only 3% of the human genome codes for proteins), have diverse functions and localizations, and are controlled by proteins from birth (transcription) to death (degradation). This massive network is evidenced by several recent experimental discoveries of large numbers of previously unknown RNA-binding proteins (RBPs). Meanwhile, more than 400 non-redundant protein-RNA complex structures (at 25% sequence identity or less) have been deposited into the protein databank. These sequence and structural resources for RBPs provide ample data for the development of computational techniques dedicated to RBP prediction, as experimentally determining RNA-binding functions is time-consuming and expensive. This review compares traditional machine-learning based approaches with emerging template-based methods at several levels of prediction resolution, ranging from two-state binding/non-binding prediction to binding residue prediction and protein-RNA complex structure prediction. The analysis indicates that the two approaches are complementary and their combinations may lead to further improvements. PMID:23872922

  12. Molecular dynamics and protein function

    PubMed Central

    Karplus, M.; Kuriyan, J.

    2005-01-01

    A fundamental appreciation for how biological macromolecules work requires knowledge of structure and dynamics. Molecular dynamics simulations provide powerful tools for the exploration of the conformational energy landscape accessible to these molecules, and the rapid increase in computational power coupled with improvements in methodology makes this an exciting time for the application of simulation to structural biology. In this Perspective we survey two areas, protein folding and enzymatic catalysis, in which simulations have contributed to a general understanding of mechanism. We also describe results for the F1 ATPase molecular motor and the Src family of signaling proteins as examples of applications of simulations to specific biological systems. PMID:15870208

  13. Qualitative and Quantitative Protein Complex Prediction Through Proteome-Wide Simulations.

    PubMed

    Rizzetto, Simone; Priami, Corrado; Csikász-Nagy, Attila

    2015-10-01

    Despite recent progress in proteomics, most protein complexes are still unknown. Identification of these complexes will help us understand cellular regulatory mechanisms and support development of new drugs. Therefore, it is important to establish detailed information about the composition and the abundance of protein complexes, but existing algorithms can only give qualitative predictions. Herein, we propose a new approach based on stochastic simulations of protein complex formation that integrates multi-source data--such as protein abundances, domain-domain interactions and functional annotations--to predict alternative forms of protein complexes together with their abundances. This method, called SiComPre (Simulation based Complex Prediction), achieves better qualitative prediction of yeast and human protein complexes than existing methods and is the first to predict protein complex abundances. Furthermore, we show that SiComPre can be used to predict complexome changes upon drug treatment with the example of bortezomib. SiComPre is the first method to produce quantitative predictions on the abundance of molecular complexes while performing the best qualitative predictions. With new data on tissue-specific protein complexes becoming available, SiComPre will be able to predict qualitative and quantitative differences in the complexome in various tissue types and under various conditions. PMID:26492574

  14. Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.

    PubMed

    Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro

    2016-09-01

    The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc. PMID:27253756

  15. Sucrose Synthase: Expanding Protein Function

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sucrose synthase (SUS: EC 2.4.1.13), a key enzyme in plant sucrose catabolism, is uniquely able to mobilize sucrose into multiple pathways involved in metabolic, structural, and storage functions. Our research indicates that the biological function of SUS may extend beyond its catalytic activity. Th...

  16. Predictable tuning of protein expression in bacteria.

    PubMed

    Bonde, Mads T; Pedersen, Margit; Klausen, Michael S; Jensen, Sheila I; Wulff, Tune; Harrison, Scott; Nielsen, Alex T; Herrgård, Markus J; Sommer, Morten O A

    2016-03-01

    We comprehensively assessed the contribution of the Shine-Dalgarno sequence to protein expression and used the data to develop EMOPEC (Empirical Model and Oligos for Protein Expression Changes; http://emopec.biosustain.dtu.dk). EMOPEC is a free tool that makes it possible to modulate the expression level of any Escherichia coli gene by changing only a few bases. Measured protein levels for 91% of our designed sequences were within twofold of the desired target level. PMID:26752768

  17. Architecture and Function of Mechanosensitive Membrane Protein Lattices

    PubMed Central

    Kahraman, Osman; Koch, Peter D.; Klug, William S.; Haselwandter, Christoph A.

    2016-01-01

    Experiments have revealed that membrane proteins can form two-dimensional clusters with regular translational and orientational protein arrangements, which may allow cells to modulate protein function. However, the physical mechanisms yielding supramolecular organization and collective function of membrane proteins remain largely unknown. Here we show that bilayer-mediated elastic interactions between membrane proteins can yield regular and distinctive lattice architectures of protein clusters, and may provide a link between lattice architecture and lattice function. Using the mechanosensitive channel of large conductance (MscL) as a model system, we obtain relations between the shape of MscL and the supramolecular architecture of MscL lattices. We predict that the tetrameric and pentameric MscL symmetries observed in previous structural studies yield distinct lattice architectures of MscL clusters and that, in turn, these distinct MscL lattice architectures yield distinct lattice activation barriers. Our results suggest general physical mechanisms linking protein symmetry, the lattice architecture of membrane protein clusters, and the collective function of membrane protein lattices. PMID:26771082

  18. Protein structure prediction and analysis using the Robetta server

    PubMed Central

    Kim, David E.; Chivian, Dylan; Baker, David

    2004-01-01

    The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein–protein interactions using computational interface alanine scanning. The Rosetta protein design and protein–protein docking methodologies will soon be available through the server as well. PMID:15215442

  19. Accuracy of functional surfaces on comparatively modeled protein structures

    PubMed Central

    Zhao, Jieling; Dundas, Joe; Kachalo, Sema; Ouyang, Zheng; Liang, Jie

    2012-01-01

    Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the rapid speed of accumulation of protein sequence information far exceeds that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using Modeller, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured. PMID:21541664
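
    The two similarity measurements named in this abstract (pocket RMSD and fraction of functional atoms captured) can be illustrated with a minimal sketch. It assumes the modeled and reference pockets are already superimposed and that atoms are paired by index; the function names `pocket_rmsd` and `fraction_captured` are hypothetical and are not taken from the paper.

```python
import math


def pocket_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) atom coordinates.

    Assumption: the two pockets are already superimposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def fraction_captured(reference_atoms, modeled_atoms):
    """Fraction of reference binding-surface atoms also found in the model."""
    reference = set(reference_atoms)
    if not reference:
        raise ValueError("reference atom set must not be empty")
    return len(reference & set(modeled_atoms)) / len(reference)
```

    In practice a superposition step (for example a least-squares fit) would come before the RMSD calculation; it is left out here to keep the sketch short.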

  20. Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    PubMed Central

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

    2007-01-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

  1. Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins

    PubMed Central

    Wei, Y.; Floudas, C. A.

    2011-01-01

    In this paper, based on a recent work by McAllister and Floudas who developed a mathematical optimization model to predict the contacts in transmembrane alpha-helical proteins from a limited protein data set [1], we have enhanced this method by 1) building a more comprehensive data set for transmembrane alpha-helical proteins and this enhanced data set is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction, 2) enhancing the mathematical model via modifications of several important physical constraints and 3) applying a new blind contact prediction scheme on different protein sets proposed from analyzing the contact prediction on 65 proteins from Fuchs et al. [2]. The blind contact prediction scheme has been tested on two different membrane protein sets. Firstly it is applied to five carefully selected proteins from the training set. The contact prediction of these five proteins uses probability sets built by excluding the target protein from the training set, and an average accuracy of 56% was obtained. Secondly, it is applied to six independent membrane proteins with complicated topologies, and the prediction accuracies are 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit [3]) and it is shown that it exhibits better prediction accuracy. PMID:21892227
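
    As a quick arithmetic check, the reported 60.7% average can be reproduced from the six per-protein accuracies quoted in the abstract. The snippet below is only an illustration of that calculation, not code from the paper.

```python
# Per-protein contact prediction accuracies (percent) as quoted in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}

# Unweighted mean over the six independent membrane proteins.
mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(f"{mean_accuracy:.1f}%")  # prints 60.7%
```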

  2. Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins

    PubMed Central

    2013-01-01

    Background Proteins are the key elements on the path from genetic information to the development of life. The roles played by the different proteins are difficult to uncover experimentally as this process involves complex procedures such as genetic modifications, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned from each protein is usually annotated in databases through different methods such as those proposed by The Gene Ontology (GO) consortium. Different methods have been proposed in order to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and reported success rates are much lower than those reported by other non-plant predictors. This paper explores the predictability of GO annotations on proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence. Results High predictability of several GO terms was found for Molecular Function and Cellular Component. As expected, a lower degree of predictability was found on Biological Process ontology annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to electric charges of the amino-acid sequence and hydropathicity derived features. Conclusions An analysis of GO-slim terms predictability in plants was carried out, in order to determine single categories or groups of functions that are most related with primary structure information. For each highly predictable GO term, the features responsible for this success were identified and discussed. In addition to most published studies, focused on few categories or single ontologies, results in this paper comprise a complete landscape of GO predictability from primary structure encompassing 75 GO

  3. The 82-plex plasma protein signature that predicts increasing inflammation

    PubMed Central

    Tepel, Martin; Beck, Hans C.; Tan, Qihua; Borst, Christoffer; Rasmussen, Lars M.

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney transplant recipients and quantified 359 plasma proteins simultaneously using nano-Liquid-Chromatography-Tandem Mass-Spectrometry in individual samples and plasma C-reactive protein on the index day and the next day. Next-day C-reactive protein increased in 59 patients whereas it decreased in 32 patients. The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P < 0.0001). Multivariable logistic regression showed that 82-plex protein signature (P < 0.001) was associated with observed increased next-day C-reactive protein. The 82-plex protein signature outperformed routine clinical procedures. The category-free net reclassification index improved with 82-plex plasma protein signature (total net reclassification index, 88.3%). Using the 82-plex plasma protein signature increased net reclassification index with a clinically meaningful 10% increase of risk mainly by the improvement of reclassification of subjects in the event group. An 82-plex plasma protein signature predicts an increase of the inflammatory marker C-reactive protein. PMID:26445912

  4. Protein function from its emergence to diversity in contemporary proteins

    NASA Astrophysics Data System (ADS)

    Goncearenco, Alexander; Berezovsky, Igor N.

    2015-07-01

    The goal of this work is to learn from nature the rules that govern evolution and the design of protein function. The fundamental laws of physics lie in the foundation of the protein structure and all stages of the protein evolution, determining optimal sizes and shapes at different levels of structural hierarchy. We looked back into the very onset of the protein evolution with a goal to find elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions—elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of the metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us to accumulate the library of essential EFs and to establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of the EF will become useful in future design and engineering of desired enzymatic functions.

  5. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes.

    PubMed

    Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J

    2015-06-01

    In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝ^n space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝ^n space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. On the other hand, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins on the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC²) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and the high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. PMID:25843214
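
    The Minkowski metric mentioned in this abstract generalizes the Euclidean distance between two residue positions. The sketch below shows the standard definition only; the function name `minkowski_distance` and the choice of one representative coordinate per residue are assumptions for illustration, not details taken from the paper.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two points of equal dimension.

    p=1 gives the Manhattan distance and p=2 the Euclidean distance.
    Assumption: each residue is represented by one 3D coordinate.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Usage: distance between two hypothetical residue positions (in angstroms).
residue_i = (0.0, 0.0, 0.0)
residue_j = (3.0, 4.0, 0.0)
euclidean = minkowski_distance(residue_i, residue_j, p=2)
manhattan = minkowski_distance(residue_i, residue_j, p=1)
```

    Varying `p` changes how strongly the largest coordinate difference dominates the distance, which is one way such a family of metrics can yield different descriptors from the same structure.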

  6. Executive functions predict conceptual learning of science.

    PubMed

    Rhodes, Sinéad M; Booth, Josephine N; Palmer, Lorna Elise; Blythe, Richard A; Delibegovic, Mirela; Wheate, Nial J

    2016-06-01

    We examined the relationship between executive functions and both factual and conceptual learning of science, specifically chemistry, in early adolescence. Sixty-three pupils in their second year of secondary school (aged 12-13 years) participated. Pupils completed tasks of working memory (Spatial Working Memory), inhibition (Stop-Signal), attention set-shifting (ID/ED), and planning (Stockings of Cambridge), from the CANTAB. They also participated in a chemistry teaching session, practical, and assessment on the topic of acids and alkalis designed specifically for this study. Executive function data were related to (1) the chemistry assessment which included aspects of factual and conceptual learning and (2) a recent school science exam. Correlational analyses between executive functions and both the chemistry assessment and science grades revealed that science achievements were significantly correlated with working memory. Linear regression analysis revealed that visuospatial working memory ability was predictive of chemistry performance. Interestingly, this relationship was observed solely in relation to the conceptual learning condition of the assessment highlighting the role of executive functions in understanding and applying knowledge about what is learned within science teaching. PMID:26751597

  7. Proteins: sequence to structure and function--current status.

    PubMed

    Shenoy, Sandhya R; Jayaram, B

    2010-11-01

    In an era that has been dominated by Structural Biology for the last 30-40 years, a dramatic change of focus towards sequence analysis has spurred the advent of the genome projects and the resultant diverging sequence/structure deficit. The central challenge of Computational Structural Biology is therefore to rationalize the mass of sequence information into biochemical and biophysical knowledge and to decipher the structural, functional and evolutionary clues encoded in the language of biological sequences. In investigating the meaning of sequences, two distinct analytical themes have emerged: in the first approach, pattern recognition techniques are used to detect similarity between sequences and hence to infer related structures and functions; in the second ab initio prediction methods are used to deduce 3D structure, and ultimately to infer function, directly from the linear sequence. In this article, we attempt to provide a critical assessment of what one may and may not expect from the biological sequences and to identify major issues yet to be resolved. The presentation is organized under several subtitles like protein sequences, pattern recognition techniques, protein tertiary structure prediction, membrane protein bioinformatics, human proteome, protein-protein interactions, metabolic networks, potential drug targets based on simple sequence properties, disordered proteins, the sequence-structure relationship and chemical logic of protein sequences. PMID:20887265

  8. Genome-scale phylogenetic function annotation of large and diverse protein families

    PubMed Central

    Engelhardt, Barbara E.; Jordan, Michael I.; Srouji, John R.; Brenner, Steven E.

    2011-01-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu. PMID:21784873

  9. Ribosomal proteins: functions beyond the ribosome

    PubMed Central

    Zhou, Xiang; Liao, Wen-Juan; Liao, Jun-Ming; Liao, Peng; Lu, Hua

    2015-01-01

    Although ribosomal proteins are known for playing an essential role in ribosome assembly and protein translation, their ribosome-independent functions have also been greatly appreciated. Over the past decade, more than a dozen ribosomal proteins have been found to activate the tumor suppressor p53 pathway in response to ribosomal stress. In addition, these ribosomal proteins are involved in various physiological and pathological processes. This review provides an overview of the current understanding of how ribosomal stress provokes the accumulation of ribosome-free ribosomal proteins, as well as the ribosome-independent functions of ribosomal proteins in tumorigenesis, immune signaling, and development. We also propose the potential of applying these pieces of knowledge to the development of ribosomal stress-based cancer therapeutics. PMID:25735597

  10. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
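
    The phylogenetic profile idea described above can be sketched in a few lines: each protein gets a presence/absence vector across genomes, and proteins with matching vectors are proposed as functionally linked. This is a minimal illustration under stated assumptions, not the patented method; the function names, the `max_mismatches` parameter, and the toy data are all hypothetical.

```python
def phylogenetic_profiles(presence, genomes):
    """Build a 0/1 profile per protein over an ordered list of genomes.

    presence maps a protein name to the set of genomes in which a homolog
    was found (how homologs are detected is outside this sketch).
    """
    return {
        protein: tuple(int(genome in found) for genome in genomes)
        for protein, found in presence.items()
    }


def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            mismatches = sum(
                x != y for x, y in zip(profiles[first], profiles[second])
            )
            if mismatches <= max_mismatches:
                pairs.append((first, second))
    return pairs


# Toy data: P1 and P2 occur in exactly the same genomes, P3 does not.
genomes = ["g1", "g2", "g3", "g4"]
presence = {"P1": {"g1", "g3"}, "P2": {"g1", "g3"}, "P3": {"g2"}}
profiles = phylogenetic_profiles(presence, genomes)
links = linked_pairs(profiles)
```

    Profiles that are present in every genome (or absent from all) match trivially and carry little information, so a real analysis would likely filter them out before pairing.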

  11. Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution

    PubMed Central

    Bloom, Jesse D; Romero, Philip A; Lu, Zhongyi; Arnold, Frances H

    2007-01-01

    Background Many of the mutations accumulated by naturally evolving proteins are neutral in the sense that they do not significantly alter a protein's ability to perform its primary biological function. However, new protein functions evolve when selection begins to favor other, "promiscuous" functions that are incidental to a protein's original biological role. If mutations that are neutral with respect to a protein's primary biological function cause substantial changes in promiscuous functions, these mutations could enable future functional evolution. Results Here we investigate this possibility experimentally by examining how cytochrome P450 enzymes that have evolved neutrally with respect to activity on a single substrate have changed in their abilities to catalyze reactions on five other substrates. We find that the enzymes have sometimes changed as much as four-fold in the promiscuous activities. The changes in promiscuous activities tend to increase with the number of mutations, and can be largely rationalized in terms of the chemical structures of the substrates. The activities on chemically similar substrates tend to change in a coordinated fashion, potentially providing a route for systematically predicting the change in one activity based on the measurement of several others. Conclusion Our work suggests that initially neutral genetic drift can lead to substantial changes in protein functions that are not currently under selection, in effect poising the proteins to more readily undergo functional evolution should selection favor new functions in the future. Reviewers This article was reviewed by Martijn Huynen, Fyodor Kondrashov, and Dan Tawfik (nominated by Christoph Adami). PMID:17598905

  12. Turning yeast sequence into protein function

    SciTech Connect

    Heijne, G. von

    1996-04-01

    The complete genome sequencing of the yeast Saccharomyces cerevisiae leads us into a new era of potential use for such data base information. Protein engineering studies suggest that genetic selection of overproducing strains may aid the assignment of protein function. Data base management and sequencing software have been developed to scan entire genomes.

  13. Flavin Redox Switching of Protein Functions

    PubMed Central

    Zhu, Weidong; Moxley, Michael A.

    2011-01-01

    Flavin cofactors impart remarkable catalytic diversity to enzymes, enabling them to participate in a broad array of biological processes. The properties of flavins also provide proteins with a versatile redox sensor that can be utilized for converting physiological signals such as cellular metabolism, light, and redox status into a unique functional output. The control of protein functions by the flavin redox state is important for transcriptional regulation, cell signaling pathways, and environmental adaptation. A significant number of proteins that have flavin redox switches are found in the Per-Arnt-Sim (PAS) domain family and include flavoproteins that act as photosensors and respond to changes in cellular redox conditions. Biochemical and structural studies of PAS domain flavoproteins have revealed key insights into how flavin redox changes are propagated to the surface of the protein and translated into a new functional output such as the binding of a target protein in a signaling pathway. Mechanistic details of proteins unrelated to the PAS domain are also emerging and provide novel examples of how the flavin redox state governs protein–membrane interactions in response to appropriate stimuli. Analysis of different flavin switch proteins reveals shared mechanistic themes for the regulation of protein structure and function by flavins. Antioxid. Redox Signal. 14, 1079–1091. PMID:21028987

  14. Finding the “Dark Matter” in Human and Yeast Protein Network Prediction and Modelling

    PubMed Central

    Lees, Jon G.; Reid, Adam J.; Yeats, Corin; Clegg, Andrew B.; Sanchez-Jimenez, Francisca; Orengo, Christine

    2010-01-01

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or “dark matter” of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case

  15. Finding the "dark matter" in human and yeast protein network prediction and modelling.

    PubMed

    Ranea, Juan A G; Morilla, Ian; Lees, Jon G; Reid, Adam J; Yeats, Corin; Clegg, Andrew B; Sanchez-Jimenez, Francisca; Orengo, Christine

    2010-01-01

    Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems. In any case

  16. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. A jackknife test gave a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein

  17. A Simple Method for Predicting Transmembrane Proteins Based on Wavelet Transform

    PubMed Central

    Yu, Bin; Zhang, Yan

    2013-01-01

    The growing number of protein sequences from genome projects calls for theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported so far, but they have deficiencies in prediction accuracy and adaptability. In this paper, a method based on the discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. The protein with PDB code 1KQG is chosen as an example to describe the prediction process. 80 proteins with known 3D structure from the Mptopo database are chosen at random as data sets (including 325 TMHs), and the 80 sequences are divided into 13 groups according to their function and type. TMH prediction is carried out for each group of membrane protein sequences and gives satisfactory results. To verify the feasibility of this method, the 80 membrane protein sequences are treated as test sets; 308 TMHs can be predicted and the prediction accuracy is 96.3%. Compared with the main prediction results of seven popular prediction methods, the obtained results indicate that the proposed method has higher prediction accuracy. PMID:23289014
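
    The general idea of wavelet-based TMH detection can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the Kyte-Doolittle hydropathy scale as the input signal, a Haar wavelet, and hypothetical values for the decomposition level, threshold, and minimum segment length. None of these choices is stated in the abstract.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def haar_approximation(signal, level):
    """Reconstructed level-`level` Haar approximation of `signal`.

    For the Haar wavelet this equals replacing each block of 2**level
    samples by its mean. A shorter final block is averaged over its own
    length (a simple boundary-handling assumption).
    """
    block = 2 ** level
    smooth = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        smooth.extend([mean] * len(chunk))
    return smooth


def find_tm_segments(sequence, level=2, threshold=1.6, min_len=15):
    """Return half-open (start, end) index ranges of candidate TMHs.

    `level`, `threshold` and `min_len` are illustrative assumptions.
    """
    profile = [KD[aa] for aa in sequence]
    smooth = haar_approximation(profile, level)
    segments, start = [], None
    # A sentinel below any threshold closes a segment that reaches the end.
    for i, value in enumerate(smooth + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```

    Calling find_tm_segments on a sequence returns index ranges whose smoothed hydropathy stays above the threshold; a real predictor would tune the wavelet, scale, and cut-offs against annotated data.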

  18. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data

    PubMed Central

    Bywater, Robert P.

    2016-01-01

    Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before. PMID:26963911

  19. 4D prediction of protein (1)H chemical shifts.

    PubMed

    Lehtivarjo, Juuso; Hassinen, Tommi; Korhonen, Samuli-Petrus; Peräkylä, Mikael; Laatikainen, Reino

    2009-12-01

    A 4D approach for protein (1)H chemical shift prediction was explored. The 4th dimension is the molecular flexibility, mapped using molecular dynamics simulations. The chemical shifts were predicted with a principal component model based on atom coordinates from a database of 40 protein structures. When compared to the corresponding non-dynamic (3D) model, the 4th dimension improved prediction by 6-7%. The prediction method achieved RMS errors of 0.29 and 0.50 ppm for Halpha and HN shifts, respectively. However, for individual proteins the RMS errors were 0.17-0.34 and 0.34-0.65 ppm for the Halpha and HN shifts, respectively. X-ray structures gave better predictions than the corresponding NMR structures, indicating that chemical shifts contain invaluable information about local structures. The (1)H chemical shift prediction tool 4DSPOT is available from http://www.uku.fi/kemia/4dspot . PMID:19876601

  20. microProtein Prediction Program (miP3): A Software for Predicting microProteins and Their Target Transcription Factors.

    PubMed

    de Klein, Niek; Magnani, Enrico; Banf, Michael; Rhee, Seung Yon

    2015-01-01

    An emerging concept in transcriptional regulation is that a class of truncated transcription factors (TFs), called microProteins (miPs), engages in protein-protein interactions with TF complexes and provides feedback controls. A handful of miP examples have been described in the literature but the extent of their prevalence is unclear. Here we present an algorithm that predicts miPs and their target TFs from a sequenced genome. The algorithm is called miP prediction program (miP3), which is implemented in Python. The software will help shed light on the prevalence, biological roles, and evolution of miPs. Moreover, miP3 can be used to predict other types of miP-like proteins that may have evolved from other functional classes such as kinases and receptors. The program is freely available and can be applied to any sequenced genome. PMID:26060811

  1. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    PubMed Central

    Tung, Chi-Hua; Chen, Chi-Wei; Guo, Ren-Chao; Ng, Hui-Fuang

    2016-01-01

    Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new feature extraction method based on protein-conserved motif composition in block format, termed block composition, is proposed. Results. The protein quaternary assembly state prediction system, called QuaBingo, combines blocks with functional domain composition and is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of combinations of block composition, functional domain composition, and pseudoamino acid composition in the model. Across 11 functional protein families, QuaBingo scores 23% higher in Matthews Correlation Coefficient (MCC) than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins. PMID:27610389
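
    For readers unfamiliar with the metric, the binary form of the Matthews Correlation Coefficient can be computed from a confusion matrix as sketched below. The abstract does not say how MCC was aggregated over the three quaternary classes, so this binary version is only an illustration; the function name and the zero-denominator convention are assumptions.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts.

    Returns 0.0 when any marginal count is zero (a common convention,
    assumed here), since the ratio is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

    MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it less sensitive to class imbalance than plain accuracy.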

  2. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition.

    PubMed

    Tung, Chi-Hua; Chen, Chi-Wei; Guo, Ren-Chao; Ng, Hui-Fuang; Chu, Yen-Wei

    2016-01-01

    Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new feature extraction method based on protein-conserved motif composition in block format, termed block composition, is proposed. Results. The protein quaternary assembly state prediction system, called QuaBingo, combines blocks with functional domain composition and is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of combinations of block composition, functional domain composition, and pseudoamino acid composition in the model. Across 11 functional protein families, QuaBingo scores 23% higher in Matthews Correlation Coefficient (MCC) than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins. PMID:27610389

  3. Computational Methods to Predict Protein Interaction Partners

    NASA Astrophysics Data System (ADS)

    Valencia, Alfonso; Pazos, Florencio

    In the new paradigm for studying biological phenomena represented by Systems Biology, cellular components are not considered in isolation but as forming complex networks of relationships. Protein interaction networks are among the first objects studied from this new point of view. Deciphering the interactome (the whole network of interactions for a given proteome) has been shown to be a very complex task. Computational techniques for detecting protein interactions have become standard tools for dealing with this problem, helping and complementing their experimental counterparts. Most of these techniques use genomic or sequence features intuitively related with protein interactions and are based on "first principles" in the sense that they do not involve training with examples. There are also other computational techniques that use other sources of information (i.e. structural information or even experimental data) or are based on training with examples.

  4. Evolution of Ftz protein function in insects.

    PubMed

    Alonso, C R; Maxton-Kuechenmeister, J; Akam, M

    2001-09-18

    The Drosophila gene fushi tarazu (ftz) encodes a homeodomain-containing transcriptional regulator (Ftz) required at several stages during development. Drosophila melanogaster ftz (Dm-ftz) is first expressed in seven stripes defining alternate parasegments of the embryo--a "pair-rule" segmentation function [1, 2]. It is then expressed in specific neural precursor cells in the central nervous system and finally in the developing hindgut [3]. An Orthopteran ortholog of ftz (Sg-ftz, formally Dax) has been isolated from the grasshopper Schistocerca gregaria [4]. The pattern of Sg-ftz expression in Schistocerca embryos suggests that some developmental roles of the ftz gene are likely to be conserved between these two species (e.g., CNS functions) while others may have diverged (e.g., segmentation functions). To test whether the function of the Ftz protein itself differs between these two species, here we compare the functions of Sg-Ftz and Dm-Ftz proteins by expressing both in Drosophila embryos. Sg-ftz mimics only poorly several segmentation roles of Dm-ftz (engrailed activation, wingless repression, and embryonic cuticle transformation). However, the two proteins are similarly active in the rescue of a CNS-specific ftz mutant. These findings argue that this ftz CNS function is mediated by conserved parts of the protein, while efficient pair-rule function requires sequences present specifically in the Drosophila protein. PMID:11566109

  5. PredPlantPTS1: A Web Server for the Prediction of Plant Peroxisomal Proteins

    PubMed Central

    Reumann, Sigrun; Buchwald, Daniela; Lingner, Thomas

    2012-01-01

    Prediction of subcellular protein localization is essential to correctly assign unknown proteins to cell organelle-specific protein networks and to ultimately determine protein function. For metazoa, several computational approaches have been developed in the past decade to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). However, plant-specific PTS1 protein prediction methods have been lacking up to now, and pre-existing methods generally were incapable of correctly predicting low-abundance plant proteins possessing non-canonical PTS1 patterns. Recently, we presented a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy and which can correctly identify unknown targeting patterns, i.e., novel PTS1 tripeptides and tripeptide residues. Here we describe the first plant-specific web server PredPlantPTS1 for the prediction of plant PTS1 proteins using the above-mentioned underlying models. The server allows the submission of protein sequences from diverse spermatophytes and also performs well for mosses and algae. The easy-to-use web interface provides detailed output in terms of (i) the peroxisomal targeting probability of the given sequence, (ii) information whether a particular non-canonical PTS1 tripeptide has already been experimentally verified, and (iii) the prediction scores for the single C-terminal 14 amino acid residues. The latter allows identification of predicted residues that inhibit peroxisome targeting and which can be optimized using site-directed mutagenesis to raise the peroxisome targeting efficiency. The prediction server will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants. PredPlantPTS1 is freely accessible at ppp.gobics.de. PMID:22969783

  6. An Atomistic Statistically Effective Energy Function for Computational Protein Design.

    PubMed

    Topham, Christopher M; Barbe, Sophie; André, Isabelle

    2016-08-01

    Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol(-1) when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol(-1) using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical

  7. Genetically modified proteins: functional improvement and chimeragenesis

    PubMed Central

    Balabanova, Larissa; Golotin, Vasily; Podvolotskaya, Anna; Rasskazov, Valery

    2015-01-01

    This review focuses on the emerging role of site-specific mutagenesis and chimeragenesis for the functional improvement of proteins in areas where traditional protein engineering methods have been extensively used and practically exhausted. A new route to novel proteins has opened with the further development of structure and sequence optimization algorithms that generate and design accurate structure models, drawing on X-ray crystallography studies of many proteins and their mutant forms. Artificial genetic modifications aim to expand nature's repertoire of biomolecules. One of the most exciting potential outcomes of mutagenesis or chimeragenesis could be the design of effective diagnostics, bio-therapeutics and biocatalysts. A sampling of recent examples of the in vivo and in vitro genetic improvement of various binding protein and enzyme functions is listed below, with references for more in-depth study provided for the reader's benefit. PMID:26211369

  8. Structure and functional annotation of hypothetical proteins having putative Rubisco activase function from Vitis vinifera.

    PubMed

    Kumar, Suresh

    2015-01-01

    Rubisco is a very large, complex protein and one of the most abundant in the world, comprising up to 50% of all soluble protein in plants. The activity of Rubisco, the enzyme that catalyzes CO2 assimilation in photosynthesis, is regulated by Rubisco activase (Rca). In the present study, we searched for hypothetical proteins of Vitis vinifera that have putative Rubisco activase function. The Arabidopsis and tobacco Rubisco activase protein sequences were used as seed sequences to search against Vitis vinifera in the UniProtKB database. The selected hypothetical proteins of Vitis vinifera were subjected to sequence, structural and functional annotation. Subcellular localization predictions suggested that they are cytoplasmic proteins. Homology modelling was used to define the three-dimensional (3D) structure of the selected hypothetical proteins of Vitis vinifera. Template search revealed that all the hypothetical proteins share more than 80% sequence identity with the structure of green-type Rubisco activase from tobacco, indicating that the proteins are evolutionarily conserved. The homology models were generated using SWISS-MODEL. Several quality assessment and validation parameters indicated that the homology models are reliable. Further, functional annotation through PFAM, CATH, SUPERFAMILY, and CDART suggested that the selected hypothetical proteins of Vitis vinifera contain the ATPase family associated with various cellular activities (AAA) domain and belong to the AAA+ superfamily of ring-shaped P-loop containing nucleoside triphosphate hydrolases. This study will lead to research on optimizing the functionality of Rubisco, which has large implications for the improvement of plant productivity and resource use efficiency. PMID:25780274

  9. Partner-Aware Prediction of Interacting Residues in Protein-Protein Complexes from Sequence Data

    PubMed Central

    Ahmad, Shandar; Mizuguchi, Kenji

    2011-01-01

    Computational prediction of residues that participate in protein-protein interactions is a difficult task, and state of the art methods have shown only limited success in this arena. One possible problem with these methods is that they try to predict interacting residues without incorporating information about the partner protein, although it is unclear how much partner information could enhance prediction performance. To address this issue, the two following comparisons are of crucial significance: (a) comparison between the predictability of inter-protein residue pairs, i.e., predicting exactly which residue pairs interact with each other given two protein sequences; this can be achieved by either combining conventional single-protein predictions or making predictions using a new model trained directly on the residue pairs, and the performance of these two approaches may be compared; (b) comparison between the predictability of the interacting residues in a single protein (irrespective of the partner residue or protein) from conventional methods and predictions converted from the pair-wise trained model. Using these two streams of training and validation procedures and employing similar two-stage neural networks, we showed that the models trained on pair-wise contacts outperformed the partner-unaware models in predicting both interacting pairs and interacting single-protein residues. Prediction performance decreased with the size of the conformational change upon complex formation; this trend is similar to docking, even though no structural information was used in our prediction. An example application that predicts two partner-specific interfaces of a protein was shown to be effective, highlighting the potential of the proposed approach. Finally, a preliminary attempt was made to score docking decoy poses using prediction of interacting residue pairs; this analysis produced an encouraging result. PMID:22194998
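
    The partner-unaware baseline in comparison (a), where pair predictions are built by combining single-protein predictions, could look like the sketch below. The abstract does not specify the combination rule; multiplying the two per-residue interface probabilities is an assumption made here for illustration, and the function names are hypothetical.

```python
def pair_scores(p_a, p_b):
    """Score every residue pair (i, j) as p_a[i] * p_b[j].

    p_a and p_b are per-residue interface probabilities predicted
    independently for proteins A and B (partner-unaware predictions).
    """
    return [[pa * pb for pb in p_b] for pa in p_a]


def top_pairs(p_a, p_b, k=3):
    """Return the k highest-scoring (i, j) residue pairs."""
    scores = pair_scores(p_a, p_b)
    ranked = sorted(
        ((scores[i][j], i, j)
         for i in range(len(p_a))
         for j in range(len(p_b))),
        key=lambda item: -item[0],
    )
    return [(i, j) for _, i, j in ranked[:k]]
```

    A model trained directly on residue pairs replaces this product with a learned score for each (i, j), which is what allows it to use partner information.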

  10. Prediction of structural features and application to outer membrane protein identification

    NASA Astrophysics Data System (ADS)

    Yan, Renxiang; Wang, Xiaofeng; Huang, Lanqing; Yan, Feidi; Xue, Xiaoyu; Cai, Weiwen

    2015-06-01

    Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles are helpful to protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the assistance of neural network learning. Based on an independent test dataset, protein secondary structure prediction generates an overall Q3 accuracy of ~80%. Meanwhile, the prediction of relative solvent accessibility obtains the highest mean absolute error of 0.164, and prediction of residue depth achieves the lowest mean absolute error of 0.062. We further improve the outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient and easy-to-use programs. One is PSSM-2-Features for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, the other is PPA-OMP for identifying outer membrane proteins from proteomes.
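
    The two evaluation measures quoted above can be written down in a few lines. This is a generic sketch of Q3 accuracy and mean absolute error, not the authors' evaluation code; the three-state alphabet (H, E, C) and the function names are assumptions.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(observed)
```

    Q3 applies to categorical secondary-structure states, while mean absolute error applies to real-valued properties such as relative solvent accessibility or residue depth.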

  11. Jenner-predict server: prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions

    PubMed Central

    2013-01-01

    Background Subunit vaccines based on recombinant proteins have been effective in preventing infectious diseases and are expected to meet the demands of future vaccine development. Computational approaches, especially the reverse vaccinology (RV) method, have enormous potential for identification of protein vaccine candidates (PVCs) from a proteome. The existing protective antigen prediction software and web servers have low prediction accuracy, leading to limited applications for vaccine development. Besides machine learning techniques, those software and web servers have considered only a protein's adhesin-likeliness as the criterion for identification of PVCs. Several non-adhesin functional classes of proteins involved in host-pathogen interactions and pathogenesis are known to provide protection against bacterial infections. Therefore, knowledge of bacterial pathogenesis has potential to identify PVCs. Results A web server, Jenner-Predict, has been developed for prediction of PVCs from proteomes of bacterial pathogens. The web server targets host-pathogen interactions and pathogenesis by considering known functional domains from protein classes such as adhesin, virulence, invasin, porin, flagellin, colonization, toxin, choline-binding, penicillin-binding, transferrin-binding, fibronectin-binding and solute-binding. It predicts non-cytosolic proteins containing the above domains as PVCs. It also provides the vaccine potential of PVCs in terms of their possible immunogenicity by comparing with experimentally known IEDB epitopes, absence of autoimmunity and conservation in different strains. Predicted PVCs are prioritized so that only a few prospective PVCs need to be validated experimentally. The performance of the web server was evaluated against known protective antigens from diverse classes of bacteria reported in the Protegen database and datasets used for VaxiJen server development. The web server efficiently predicted known vaccine candidates reported from Streptococcus pneumoniae and

  12. How mutational epistasis impairs predictability in protein evolution and design.

    PubMed

    Miton, Charlotte M; Tokuriki, Nobuhiko

    2016-07-01

    There has been much debate about the extent to which mutational epistasis, that is, the dependence of the outcome of a mutation on the genetic background, constrains evolutionary trajectories. The degree of unpredictability introduced by epistasis, due to the non-additivity of functional effects, strongly hinders the strategies developed in protein design and engineering. While many studies have addressed this issue through systematic characterization of evolutionary trajectories within individual enzymes, the field lacks a consensus view on this matter. In this work, we performed a comprehensive analysis of epistasis by analyzing the mutational effects from nine adaptive trajectories toward new enzymatic functions. We quantified epistasis by comparing the effect of mutations occurring between two genetic backgrounds: the starting enzyme (for example, wild type) and the intermediate variant on which the mutation occurred during the trajectory. We found that most trajectories exhibit positive epistasis, in which the mutational effect is more beneficial when it occurs later in the evolutionary trajectory. Approximately half (49%) of functional mutations were neutral or negative on the wild-type background, but became beneficial at a later stage in the trajectory, indicating that these functional mutations were not predictable from the initial starting point. While some cases of strong epistasis were associated with direct interaction between residues, many others were caused by long-range indirect interactions between mutations. Our work highlights the prevalence of epistasis in enzyme adaptive evolution, in particular positive epistasis, and suggests the necessity of incorporating mutational epistasis in protein engineering and design to create highly efficient catalysts. PMID:26757214
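
    One common way to put a number on the background dependence described above is to compare the log fold-change in function caused by the same mutation on two backgrounds. The sketch below uses that formulation; the abstract does not give the authors' exact formula, so the log10 difference, the tolerance, and the function names are all assumptions.

```python
import math


def epistasis(fold_change_wt, fold_change_late):
    """Epistasis as a difference of log10 fold-changes.

    fold_change_wt:   effect of the mutation on the starting enzyme
    fold_change_late: effect of the same mutation on the intermediate
                      variant where it arose in the trajectory
    A positive value means the mutation is more beneficial later on.
    """
    return math.log10(fold_change_late) - math.log10(fold_change_wt)


def classify_epistasis(value, tolerance=0.05):
    """Label the sign of epistasis; `tolerance` is an illustrative cut-off."""
    if value > tolerance:
        return "positive"
    if value < -tolerance:
        return "negative"
    return "none"
```

    For example, a mutation that is neutral on the wild type (fold-change 1.0) but gives a tenfold improvement on the intermediate variant has an epistasis of 1.0 log unit under this definition and is labelled positive.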

  13. From residue coevolution to protein conformational ensembles and functional dynamics

    PubMed Central

    Sutto, Ludovico; Marsili, Simone; Valencia, Alfonso; Gervasio, Francesco Luigi

    2015-01-01

    The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers. PMID:26487681

  14. Evolution-Based Functional Decomposition of Proteins.

    PubMed

    Rivoire, Olivier; Reynolds, Kimberly A; Ranganathan, Rama

    2016-06-01

    The essential biological properties of proteins (folding, biochemical activities, and the capacity to adapt) arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment, a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668

  15. Proteins and Their Interacting Partners: An Introduction to Protein–Ligand Binding Site Prediction Methods

    PubMed Central

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-01-01

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein–ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein–ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein–ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems. PMID:26694353

  16. Computational Prediction of RNA-Binding Proteins and Binding Sites

    PubMed Central

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Protein–RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%–8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein–RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein–RNA binding sites by combining various machine learning methods with abundant sequence and/or structural features. There are three kinds of computational approaches: prediction from protein sequence, prediction from protein structure, and protein–RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions. PMID:26540053

  17. Functional divergence outlines the evolution of novel protein function in NifH/BchL protein family.

    PubMed

    Thakur, Subarna; Bothra, Asim K; Sen, Arnab

    2013-11-01

    Biological nitrogen fixation is accomplished by prokaryotes through the catalytic action of a complex metalloenzyme, nitrogenase. Nitrogenase is a two-protein component system comprising MoFe protein (NifD and K) and Fe protein (NifH). NifH shares structural and mechanistic similarities as well as evolutionary relationships with light-independent protochlorophyllide reductase (BchL), a photosynthesis-related metalloenzyme belonging to the same protein family. We performed a comprehensive bioinformatics analysis of the NifH/BchL family in order to elucidate the intrinsic functional diversity and the underlying evolutionary mechanism among the members. To analyse functional divergence in the NifH/BchL family, we conducted pair-wise estimation of altered evolutionary rates between the member proteins. We identified a number of vital amino acid sites which contribute to predicted functional diversity. We also used maximum likelihood tests for detection of positive selection at the amino acid level, followed by a structure-based phylogenetic approach, to draw conclusions on the ancient lineage and novel characterization of the NifH/BchL protein family. Our investigation provides ample support for the view that NifH protein and BchL share robust structural similarities and have probably diverged from a common ancestor, followed by divergence in functional properties possibly due to gene duplication. PMID:24287653

  18. A Prediction Model for Membrane Proteins Using Moments Based Features.

    PubMed

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  19. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exhibit their effect in cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  20. A structural alphabet for local protein structures: improved prediction methods.

    PubMed

    Etchebest, Catherine; Benros, Cristina; Hazout, Serge; de Brevern, Alexandre G

    2005-06-01

    Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%. PMID:15822101
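
    The two quantities named in this abstract, the Q16 prediction rate and the entropy index Neq, can be illustrated with a minimal sketch. It assumes the definition commonly used in the Protein Blocks literature, Neq = exp(-sum of p ln p) over the 16 PB probabilities at one position; the abstract itself does not state the formula. The function names and toy strings are hypothetical and are not taken from the authors' software.

```python
import math


def neq(probabilities):
    """Equivalent number of states: exponential of the Shannon entropy (natural log).

    Assumed definition; zero probabilities contribute nothing to the entropy.
    """
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    return math.exp(entropy)


def q16(predicted, observed):
    """Fraction of positions whose predicted PB letter matches the observed one."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must have the same non-zero length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy inputs: PBs are conventionally labelled with the letters a to p.
uniform = [1.0 / 16] * 16        # no preference among the 16 blocks
certain = [1.0] + [0.0] * 15     # a single block is predicted with certainty

print(neq(uniform))              # approximately 16: prediction is uninformative
print(neq(certain))              # 1.0: prediction is fully confident
print(q16("mmmmnopa", "mmmmnopd"))  # 0.875 (7 of 8 positions agree)
```

    Under this definition Neq ranges from 1 (one block certain) to 16 (all blocks equally likely), which is why a low Neq can serve as a confidence indicator for a predicted position.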

  1. Prediction of membrane protein structures with complex topologies using limited constraints

    PubMed Central

    Barth, P.; Wallner, B.; Baker, D.

    2009-01-01

    Reliable structure-prediction methods for membrane proteins are important because the experimental determination of high-resolution membrane protein structures remains very difficult, especially for eukaryotic proteins. However, membrane proteins are typically longer than 200 aa and represent a formidable challenge for structure prediction. We have developed a method for predicting the structures of large membrane proteins by constraining helix–helix packing arrangements at particular positions predicted from sequence or identified by experiments. We tested the method on 12 membrane proteins of diverse topologies and functions with lengths ranging between 190 and 300 residues. Enforcing a single constraint during the folding simulations enriched the population of near-native models for 9 proteins. In 4 of the cases in which the constraint was predicted from the sequence, 1 of the 5 lowest energy models was superimposable within 4 Å on the native structure. Near-native structures could also be selected for heme-binding and pore-forming domains from simulations in which pairs of conserved histidine-chelating hemes and one experimentally determined salt bridge were constrained, respectively. These results suggest that models within 4 Å of the native structure can be achieved for complex membrane proteins if even limited information on residue-residue interactions can be obtained from protein structure databases or experiments. PMID:19190187

  2. Towards predictive docking at aminergic G-protein coupled receptors.

    PubMed

    Jakubík, Jan; El-Fakahany, Esam E; Doležal, Vladimír

    2015-11-01

    G protein-coupled receptors (GPCRs) are hard to crystallize. However, attempts to predict their structure have boomed as a result of advancements in crystallographic techniques. This trend has allowed computer-aided molecular modeling of GPCRs. We analyzed the performance of four molecular modeling programs in pose evaluation of re-docked antagonists / inverse agonists to 11 original crystal structures of aminergic GPCRs using an induced fit-docking procedure. AutoDock and Glide were used for docking. AutoDock binding energy function, GlideXP, Prime MM-GB/SA, and YASARA binding function were used for pose scoring. Root mean square deviation (RMSD) of the best pose ranged from 0.09 to 1.58 Å, and median RMSD of the top 60 poses ranged from 1.47 to 3.83 Å. However, RMSD of the top pose ranged from 0.13 to 7.33 Å and ranking of the best pose ranged from the 1st to 60th out of 60 poses. Moreover, analysis of ligand-receptor interactions of top poses revealed substantial differences from interactions found in crystallographic structures. Bad ranking of top poses and discrepancies between top docked poses and crystal structures render current simple docking methods unsuitable for predictive modeling of receptor-ligand interactions. Prime MM-GB/SA optimized for 3NY9 by multiple linear regression did not work well at 3NY8 and 3NYA, structures of the same receptor with different ligands. However, 9 of 11 trajectories of molecular dynamics simulations by Desmond of top poses converged with trajectories of crystal structures. Key interactions were properly detected for all structures. This procedure also worked well for cross-docking of tested β2-adrenergic antagonists. Thus, this procedure represents a possible way to predict interactions of antagonists with aminergic GPCRs. PMID:26453085
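
    The abstract distinguishes the "best" pose (lowest RMSD to the crystal ligand) from the "top" pose (ranked first by the scoring function). A minimal sketch of that bookkeeping follows. It assumes poses are already in the receptor frame, so no superposition is done, and that atoms are listed in the same order in both poses. All coordinates and RMSD values are made-up toy numbers, and the function names are hypothetical.

```python
import math


def pose_rmsd(reference, pose):
    """In-place heavy-atom RMSD between two poses given as lists of (x, y, z)."""
    if len(reference) != len(pose) or not reference:
        raise ValueError("poses must list the same non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, pose):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))


def best_pose_rank(rmsds_by_rank):
    """Return (1-based rank, RMSD) of the pose closest to the crystal structure."""
    best = min(rmsds_by_rank)
    return rmsds_by_rank.index(best) + 1, best


# Toy ligand: three atoms, docked pose shifted by 1 Å along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in crystal]
print(pose_rmsd(crystal, shifted))     # 1.0

# Hypothetical RMSDs of poses in scoring order: the top-scored pose is not the best.
rmsds = [4.2, 0.9, 2.4]
print(best_pose_rank(rmsds))           # (2, 0.9)
```

    When the best pose sits far down the ranking, as in the toy list, the scoring function rather than the sampling is the limiting step, which is the failure mode the abstract reports for simple docking.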

  3. Modulation of opioid receptor function by protein-protein interactions.

    PubMed

    Alfaras-Melainis, Konstantinos; Gomes, Ivone; Rozenfeld, Raphael; Zachariou, Venetia; Devi, Lakshmi

    2009-01-01

    Opioid receptors, MORP, DORP and KORP, belong to the family A of G protein coupled receptors (GPCR), and have been found to modulate a large number of physiological functions, including mood, stress, appetite, nociception and immune responses. Exogenously applied opioid alkaloids produce analgesia, hedonia and addiction. Addiction is linked to alterations in function and responsiveness of all three opioid receptors in the brain. Over the last few years, a large number of studies identified protein-protein interactions that play an essential role in opioid receptor function and responsiveness. Here, we summarize interactions shown to affect receptor biogenesis and trafficking, as well as those affecting signal transduction events following receptor activation. This article also examines protein interactions modulating the rate of receptor endocytosis and degradation, events that play a major role in opiate analgesia. Like several other GPCRs, opioid receptors may form homo or heterodimers. The last part of this review summarizes recent knowledge on proteins known to affect opioid receptor dimerization. PMID:19273296

  4. Functional dynamics of cell surface membrane proteins

    NASA Astrophysics Data System (ADS)

    Nishida, Noritaka; Osawa, Masanori; Takeuchi, Koh; Imai, Shunsuke; Stampoulis, Pavlos; Kofuku, Yutaka; Ueda, Takumi; Shimada, Ichio

    2014-04-01

    Cell surface receptors are integral membrane proteins that receive external stimuli, and transmit signals across plasma membranes. In the conventional view of receptor activation, ligand binding to the extracellular side of the receptor induces conformational changes, which convert the structure of the receptor into an active conformation. However, recent NMR studies of cell surface membrane proteins have revealed that their structures are more dynamic than previously envisioned, and they fluctuate between multiple conformations in an equilibrium on various timescales. In addition, NMR analyses, along with biochemical and cell biological experiments indicated that such dynamical properties are critical for the proper functions of the receptors. In this review, we will describe several NMR studies that revealed direct linkage between the structural dynamics and the functions of the cell surface membrane proteins, such as G-protein coupled receptors (GPCRs), ion channels, membrane transporters, and cell adhesion molecules.

  5. Calreticulin: one protein, one gene, many functions.

    PubMed Central

    Michalak, M; Corbett, E F; Mesaeli, N; Nakamura, K; Opas, M

    1999-01-01

    The endoplasmic reticulum (ER) plays a critical role in the synthesis and chaperoning of membrane-associated and secreted proteins. The membrane is also an important site of Ca(2+) storage and release. Calreticulin is a unique ER luminal resident protein. The protein affects many cellular functions, both in the ER lumen and outside of the ER environment. In the ER lumen, calreticulin performs two major functions: chaperoning and regulation of Ca(2+) homoeostasis. Calreticulin is a highly versatile lectin-like chaperone, and it participates during the synthesis of a variety of molecules, including ion channels, surface receptors, integrins and transporters. The protein also affects intracellular Ca(2+) homoeostasis by modulation of ER Ca(2+) storage and transport. Studies on the cell biology of calreticulin revealed that the ER membrane is a very dynamic intracellular compartment affecting many aspects of cell physiology. PMID:10567207

  6. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    PubMed Central

    Shoichet, Brian K.; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63–0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773
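
    Guilt-by-association prediction and its AUROC evaluation can be sketched on a toy network. The sketch below uses neighbor voting, one common guilt-by-association scheme; whether the authors used exactly this scheme is an assumption, since the abstract does not say. The network, labels, and function names are hypothetical.

```python
def neighbor_voting(weights, known):
    """Score each gene by the share of its edge weight that goes to known positives.

    weights: square list-of-lists adjacency matrix; known: 0/1 training labels.
    """
    scores = []
    for row in weights:
        total = sum(row)
        vote = sum(w for w, k in zip(row, known) if k)
        scores.append(vote / total if total > 0 else 0.0)
    return scores


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy 3-gene network: gene 0 is linked to genes 1 and 2; only gene 1 is annotated.
network = [[0, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]
print(neighbor_voting(network, [0, 1, 0]))          # [0.5, 0.0, 0.0]

# Toy evaluation: two positives and two negatives with made-up scores.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))    # 0.75
```

    Combining networks, as the abstract describes, could be sketched as summing or averaging several such weight matrices before voting; that aggregation rule is likewise an assumption.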

  7. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for the above task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity, etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799

  8. Insights into prion protein function from atomistic simulations.

    PubMed

    Hodak, Miroslav; Bernholc, Jerzy

    2010-01-01

    Computer simulations are a powerful tool for studies of biological systems. They have often been used to study prion protein (PrP), a protein responsible for neurodegenerative diseases, which include "mad cow disease" in cattle and Creutzfeldt-Jakob disease in humans. An important aspect of the prion protein is its interaction with copper ions, which is thought to be relevant for PrP's yet undetermined function and may also play a role in prion diseases. For studies of copper attachment to the prion protein, computer simulations have often been used to complement experimental data and to obtain binding structures of Cu-PrP complexes. This paper summarizes the results of recent ab initio calculations of copper-prion protein interactions focusing on the recently discovered concentration-dependent binding modes in the octarepeat region of this protein. In addition to determining the binding structures, computer simulations were also used to make predictions about PrP's function and the role of copper in prion diseases. The results demonstrate the predictive power and applicability of ab initio simulations for studies of metal-biomolecular complexes. PMID:20118658

  9. Network pattern of residue packing in helical membrane proteins and its application in membrane protein structure prediction.

    PubMed

    Pabuwal, Vagmita; Li, Zhijun

    2008-01-01

    De novo protein structure prediction plays an important role in studies of helical membrane proteins as well as structure-based drug design efforts. Developing an accurate scoring function for protein structure discrimination and validation remains a current challenge. Network approaches based on overall network patterns of residue packing have proven useful in soluble protein structure discrimination. It is thus of interest to apply similar approaches to the studies of residue packing in membrane proteins. In this work, we first carried out such analysis on a set of diverse, non-redundant and high-resolution membrane protein structures. Next, we applied the same approach to three test sets. The first set includes nine structures of membrane proteins with resolution worse than 2.5 Å; the other two sets include a total of 101 G-protein coupled receptor models, constructed using either de novo or homology modeling techniques. Results of analyses indicate the two criteria derived from studying high-resolution membrane protein structures are good indicators of a high-quality native fold and the approach is very effective for discriminating native membrane protein folds from less-native ones. These findings should be of help for the investigation of the fundamental problem of membrane protein structure prediction. PMID:18178566

  10. Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties

    PubMed Central

    2014-01-01

    Background Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary structure and function predictions. Such information not only enhances protein-targeted drug development but also reduces the experimental cost of protein analysis by allowing researchers to work on a set of smaller and independent units. In this study, we propose a novel and accurate domain-linker prediction approach based on protein primary structure information only. We utilize a nature-inspired machine-learning model called Random Forest along with a novel domain-linker profile that contains physiochemical and domain-linker information of amino acid sequences. Results The proposed approach was tested on two well-known benchmark protein datasets and achieved 68% sensitivity and 99% precision, which is better than any existing protein domain-linker predictor. Without applying any data balancing technique such as class weighting and data re-sampling, the proposed approach is able to accurately classify inter-domain linkers from highly imbalanced datasets. Conclusion Our experimental results prove that the proposed approach is useful for domain-linker identification in highly imbalanced single- and multi-domain proteins. PMID:25521329
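
    Because the abstract reports sensitivity and precision on highly imbalanced data, a short sketch of how these differ from plain accuracy may help. The confusion-matrix counts below are invented toy values chosen only to show the effect of imbalance; they are not the paper's results, and the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Share of true linker residues that were found (recall)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted linker residues that are real linkers."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Toy imbalanced set: 50 linker residues among 1000 (5% positives).
tp, fn, fp, tn = 34, 16, 1, 949

print(sensitivity(tp, fn))          # 0.68
print(round(precision(tp, fp), 4))  # 0.9714
print(accuracy(tp, tn, fp, fn))     # 0.983

# A classifier that never predicts "linker" still reaches 95% accuracy here,
# which is why sensitivity and precision are the more informative figures.
print(accuracy(0, 950, 0, 50))      # 0.95
```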

  11. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder

    PubMed Central

    Lorenzo, J. Ramiro; Alonso, Leonardo G.; Sánchez, Ignacio E.

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”. PMID:26674530

  12. Protein structure, spectral properties, and photobiological function of lumazine protein

    NASA Astrophysics Data System (ADS)

    Lee, John W.; Bradley, Elizabeth A.; O'Kane, Dennis J.

    1992-04-01

    Protein sequence analysis, nuclear magnetic resonance, and fluorescence dynamics have been applied in a determination of the interactions of the lumazine derivative with the amino acid residues in the proposed ligand binding site of lumazine protein. It is these interactions that 'tune' the excited state properties of the bound lumazine so that it can perform its photobiological function as the emitter of bioluminescence in Photobacterium species. A three-way sequence alignment shows that lumazine protein is homologous with the yellow-fluorescent protein of Vibrio fischeri and the riboflavin synthase from Bacillus subtilis. This last enzyme is ubiquitous in prokaryotes, and utilizes two of these same lumazines as substrates for the production of riboflavin. By analogy with riboflavin synthase, a short sequence in the lumazine protein has been suggested as the ligand binding site. In riboflavin synthase there is a second binding site, but this is absent in lumazine protein, thus negating any synthase activity for this protein. Hydrogen bonds to the residues in this binding domain 'freeze' the lumazine structure into the highly polar tautomer deduced from NMR evidence. This also accounts for the rigidity of binding shown by the 23 ns (2 °C) rotational correlation time of the bound ligand as well as the strong blue shift of the fluorescence maximum, from 490 nm free to 475 nm when bound.

  13. Investigating neuronal function with optically controllable proteins

    PubMed Central

    Zhou, Xin X.; Pan, Michael; Lin, Michael Z.

    2015-01-01

    In the nervous system, protein activities are highly regulated in space and time. This regulation allows for fine modulation of neuronal structure and function during development and adaptive responses. For example, neurite extension and synaptogenesis both involve localized and transient activation of cytoskeletal and signaling proteins, allowing changes in microarchitecture to occur rapidly and in a localized manner. To investigate the role of specific protein regulation events in these processes, methods to optically control the activity of specific proteins have been developed. In this review, we focus on how photosensory domains enable optical control over protein activity and have been used in neuroscience applications. These tools have demonstrated versatility in controlling various proteins and thereby cellular functions, and possess enormous potential for future applications in nervous systems. Just as optogenetic control of neuronal firing using opsins has changed how we investigate the function of cellular circuits in vivo, optical control may yet yield another revolution in how we study the circuitry of intracellular signaling in the brain. PMID:26257603

  14. Truly Absorbed Microbial Protein Synthesis, Rumen Bypass Protein, Endogenous Protein, and Total Metabolizable Protein from Starchy and Protein-Rich Raw Materials: Model Comparison and Predictions.

    PubMed

    Parand, Ehsan; Vakili, Alireza; Mesgaran, Mohsen Danesh; van Duinkerken, Gert; Yu, Peiqiang

    2015-07-29

    This study was carried out to measure truly absorbed microbial protein synthesis, rumen bypass protein, and endogenous protein loss, as well as total metabolizable protein, from starchy and protein-rich raw feed materials with model comparisons. Predictions by the DVE2010 system as a more mechanistic model were compared with those of two other models, DVE1994 and NRC-2001, that are frequently used in common international feeding practice. DVE1994 predictions for intestinally digestible rumen undegradable protein (ARUP) for starchy concentrates were higher (27 vs 18 g/kg DM, p < 0.05, SEM = 1.2) than predictions by the NRC-2001, whereas there was no difference in predictions for ARUP from protein concentrates among the three models. DVE2010 and NRC-2001 had highest estimations of intestinally digestible microbial protein for starchy (92 g/kg DM in DVE2010 vs 46 g/kg DM in NRC-2001 and 67 g/kg DM in DVE1994, p < 0.05 SEM = 4) and protein concentrates (69 g/kg DM in NRC-2001 vs 31 g/kg DM in DVE1994 and 49 g/kg DM in DVE2010, p < 0.05 SEM = 4), respectively. Potential protein supplies predicted by tested models from starchy and protein concentrates are widely different, and comparable direct measurements are needed to evaluate the actual ability of different models to predict the potential protein supply to dairy cows from different feedstuffs. PMID:26118653

  15. Evolution-Based Functional Decomposition of Proteins

    PubMed Central

    Rivoire, Olivier; Reynolds, Kimberly A.; Ranganathan, Rama

    2016-01-01

    The essential biological properties of proteins—folding, biochemical activities, and the capacity to adapt—arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment—a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668
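
    The idea of measuring coevolution between alignment columns can be illustrated with a minimal sketch. It uses mutual information as a simple stand-in for a coevolution score. SCA itself builds a conservation-weighted correlation matrix, which this sketch does not reproduce, and nothing here is part of the pySCA API. The toy alignment and function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    alignment: list of equal-length sequences; frequencies are raw counts,
    with no sequence weighting or pseudocounts.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 never changes.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]

print(mutual_information(toy_alignment, 0, 1))  # ln 2, about 0.693: fully coupled
print(mutual_information(toy_alignment, 0, 2))  # 0.0: a conserved column carries no coupling
```

    A sector, in the abstract's sense, would correspond to a group of positions that are mutually coupled across the family; identifying such groups requires the spectral analysis described by the authors and is outside this sketch.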

  16. The Proteome Folding Project: Proteome-scale prediction of structure and function

    PubMed Central

    Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard

    2011-01-01

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995

  17. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-01

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html. PMID:22370884

  18. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  19. WeFold: a coopetition for protein structure prediction.

    PubMed

    Khoury, George A; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O; Faccioli, Rodrigo A; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A; Sieradzan, Adam K; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C B; Floudas, Christodoulos A; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A; Skolnick, Jeffrey; Crivelli, Silvia N

    2014-09-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org. PMID:24677212

  20. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  1. Stringent homology-based prediction of H. sapiens-M. tuberculosis H37Rv protein-protein interactions

    PubMed Central

    2014-01-01

    Background H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are essential for understanding the infection mechanism of the formidable pathogen M. tuberculosis H37Rv. Computational prediction is an important strategy to fill the gap in experimental H. sapiens-M. tuberculosis H37Rv PPI data. Homology-based prediction is frequently used in predicting both intra-species and inter-species PPIs. However, some limitations are not properly resolved in several published works that predict eukaryote-prokaryote inter-species PPIs using intra-species template PPIs. Results We develop a stringent homology-based prediction approach by taking into account (i) differences between eukaryotic and prokaryotic proteins and (ii) differences between inter-species and intra-species PPI interfaces. We compare our stringent homology-based approach to a conventional homology-based approach for predicting host-pathogen PPIs, based on cellular compartment distribution analysis, disease gene list enrichment analysis, pathway enrichment analysis and functional category enrichment analysis. These analyses support the validity of our prediction result, and clearly show that our approach has better performance in predicting H. sapiens-M. tuberculosis H37Rv PPIs. Using our stringent homology-based approach, we have predicted a set of highly plausible H. sapiens-M. tuberculosis H37Rv PPIs which might be useful for many related studies. Based on our analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent homology-based approach, we have discovered several interesting properties which are reported here for the first time. We find that both host proteins and pathogen proteins involved in the host-pathogen PPIs tend to be hubs in their own intra-species PPI network. Also, both host and pathogen proteins involved in host-pathogen PPIs tend to have longer primary sequences, tend to have more domains, tend to be more hydrophilic, etc. And the protein

  2. Functional module identification in protein interaction networks by interaction patterns

    PubMed Central

    Wang, Yijie; Qian, Xiaoning

    2014-01-01

    Motivation: Identifying functional modules in protein–protein interaction (PPI) networks may shed light on cellular functional organization and thereafter underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of ‘higher than expected connectivity’, those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem but the lack of global optimum guarantee and the prohibitive computational complexity have been the bottleneck of their applications in real-world large-scale PPI networks. Results: In this article, we propose a novel optimization formulation LCP2 (low two-hop conductance sets) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for LCP2 by random walk. A spectral approximate algorithm SLCP2 is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP2 to a new algorithm (greedy algorithm for LCP2) GLCP2 to identify overlapping functional modules. We compare SLCP2 and GLCP2 with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level Gene Ontology term prediction and especially sparse module detection, has demonstrated that our algorithms based on searching for LCP2 outperform all other compared algorithms. Availability and implementation: All data and code are available at http://www.cse.usf.edu/∼xqian/fmi/slcp2hop

  3. A computational tool for identifying minimotifs in protein-protein interactions and improving the accuracy of minimotif predictions.

    PubMed

    Rajasekaran, Sanguthevar; Merlin, Jerlin Camilus; Kundeti, Vamsi; Mi, Tian; Oommen, Aaron; Vyas, Jay; Alaniz, Izua; Chung, Keith; Chowdhury, Farah; Deverasatty, Sandeep; Irvey, Tenisha M; Lacambacal, David; Lara, Darlene; Panchangam, Subhasree; Rathnayake, Viraj; Watts, Paula; Schiller, Martin R

    2011-01-01

    Protein-protein interactions are important to understanding cell functions; however, our theoretical understanding is limited. There is a general discontinuity between the well-accepted physical and chemical forces that drive protein-protein interactions and the large collections of identified protein-protein interactions in various databases. Minimotifs are short functional peptide sequences that provide a basis to bridge this gap in knowledge. However, there is no systematic way to study minimotifs in the context of protein-protein interactions or vice versa. Here we have engineered a set of algorithms that can be used to identify minimotifs in known protein-protein interactions and implemented this for use by scientists in Minimotif Miner. By globally testing these algorithms on verified data and on 100 individual proteins as test cases, we demonstrate the utility of these new computation tools. This tool also can be used to reduce false-positive predictions in the discovery of novel minimotifs. The statistical significance of these algorithms is demonstrated by an ROC analysis (P = 0.001). PMID:20938975

  4. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile.

    PubMed

    Ding, Shuyan; Li, Yan; Shi, Zhuoxing; Yan, Shoujiang

    2014-02-01

    Knowledge of protein secondary structural classes plays an important role in understanding protein folding patterns. In this paper, 25 features based on position-specific scoring matrices are selected to reflect evolutionary information. In combination with 11 other rational features based on predicted protein secondary structure sequences proposed by previous researchers, a 36-dimensional feature vector is presented to predict protein secondary structural classes for low-similarity sequences. The ASTRALtraining dataset is used to train and design our method; three other low-similarity datasets (ASTRALtest, 25PDB and 1189) are used to test it. Comparisons with other methods show that our method is effective at predicting protein secondary structural classes. A standalone version of the proposed method (PSSS-PSSM) is written in MATLAB and can be downloaded from http://letsgob.com/bioinfo_PSSS_PSSM/. PMID:24067326

  5. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    The subcellular location of a protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47%, and 88.81% for the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminus alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
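
    The weighted voting step mentioned in this abstract can be illustrated with a short sketch. The module names, weights, and labels below are hypothetical; the abstract does not report the actual weights or how they were chosen.

```python
from collections import defaultdict


def weighted_vote(module_predictions, module_weights):
    """Sum each module's weight under the label it predicts; return the top label."""
    scores = defaultdict(float)
    for module, label in module_predictions.items():
        scores[label] += module_weights[module]
    # On an exact tie, the label that received a vote first is returned.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical outputs of four SVM modules for one query protein.
    predictions = {
        "atomic_composition": "cytoplasm",
        "physicochemical": "cytoplasm",
        "amino_acid_composition": "nucleus",
        "sequence_similarity": "nucleus",
    }
    weights = {
        "atomic_composition": 0.3,
        "physicochemical": 0.1,
        "amino_acid_composition": 0.2,
        "sequence_similarity": 0.5,
    }
    print(weighted_vote(predictions, weights))
```

    Weighting lets a more reliable module outvote several weaker ones, which a plain majority vote cannot do.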

  6. Functions of TET Proteins in Hematopoietic Transformation

    PubMed Central

    Han, Jae-A; An, Jungeun; Ko, Myunggon

    2015-01-01

    DNA methylation is a well-characterized epigenetic modification that plays central roles in mammalian development, genomic imprinting, X-chromosome inactivation and silencing of retrotransposon elements. An aberrant DNA methylation pattern is a characteristic feature of cancers and is associated with abnormal expression of oncogenes, tumor suppressor genes or repair genes. Ten-eleven-translocation (TET) proteins are recently characterized dioxygenases that catalyze progressive oxidation of 5-methylcytosine to produce 5-hydroxymethylcytosine and further oxidized derivatives. These oxidized methylcytosines not only potentiate DNA demethylation but also behave as independent epigenetic modifications per se. The expression or activity of TET proteins and DNA hydroxymethylation are highly dysregulated in a wide range of cancers including hematologic and non-hematologic malignancies, and accumulating evidence points to TET proteins as novel tumor suppressors in cancers. Here we review DNA demethylation-dependent and -independent functions of TET proteins. We also describe diverse TET loss-of-function mutations that are recurrently found in myeloid and lymphoid malignancies and their potential roles in hematopoietic transformation. We discuss consequences of the deficiency of individual Tet genes and potential compensation between different Tet members in mice. Possible mechanisms underlying facilitated oncogenic transformation of TET-deficient hematopoietic cells are also described. Lastly, we address non-mutational mechanisms that lead to suppression or inactivation of TET proteins in cancers. Strategies to restore normal 5mC oxidation status in cancers by targeting TET proteins may provide new avenues to expedite the development of promising anti-cancer agents. PMID:26552488

  7. Functional Constraint Profiling of a Viral Protein Reveals Discordance of Evolutionary Conservation and Functionality

    PubMed Central

    Wu, Nicholas C.; Olson, C. Anders; Du, Yushen; Le, Shuai; Tran, Kevin; Remenyi, Roland; Gong, Danyang; Al-Mawsawi, Laith Q.; Qi, Hangfei; Wu, Ting-Ting; Sun, Ren

    2015-01-01

    Viruses often encode proteins with multiple functions due to their compact genomes. Existing approaches to identify functional residues largely rely on sequence conservation analysis. Inferring functional residues from sequence conservation can produce false positives, in which the conserved residues are functionally silent, or false negatives, where functional residues are not identified since they are species-specific and therefore non-conserved. Furthermore, the tedious process of constructing and analyzing individual mutations limits the number of residues that can be examined in a single study. Here, we developed a systematic approach to identify the functional residues of a viral protein by coupling experimental fitness profiling with protein stability prediction using the influenza virus polymerase PA subunit as the target protein. We identified a significant number of functional residues that were influenza type-specific and were evolutionarily non-conserved among different influenza types. Our results indicate that type-specific functional residues are prevalent and may not otherwise be identified by sequence conservation analysis alone. More importantly, this technique can be adapted to any viral (and potentially non-viral) protein where structural information is available. PMID:26132554

  8. The Predikin webserver: improved prediction of protein kinase peptide specificity using structural information

    PubMed Central

    Saunders, Neil F. W.

    2008-01-01

    The Predikin webserver allows users to predict substrates of protein kinases. The Predikin system is built from three components: a database of protein kinase substrates that links phosphorylation sites with specific protein kinase sequences; a Perl module to analyse query protein kinases and a web interface through which users can submit protein kinases for analysis. The Predikin Perl module provides methods to (i) locate protein kinase catalytic domains in a sequence, (ii) classify them by type or family, (iii) identify substrate-determining residues, (iv) generate weighted scoring matrices using three different methods, (v) extract putative phosphorylation sites in query substrate sequences and (vi) score phosphorylation sites for a given kinase, using optional filters. The web interface provides user-friendly access to each of these functions and allows users to obtain rapidly a set of predictions that they can export for further analysis. The server is available at http://predikin.biosci.uq.edu.au. PMID:18477637
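
    Steps (v) and (vi) above, extracting candidate sites and scoring them against a weighted matrix, can be sketched generically. This is not the Predikin module's API (which is written in Perl). The function names, the heptapeptide window, and the toy matrix are assumptions made for illustration.

```python
def candidate_sites(sequence, flank=3, residues="STY"):
    """Yield (index, window) for each S/T/Y that has a full flank on both sides."""
    for i, aa in enumerate(sequence):
        if aa in residues and i - flank >= 0 and i + flank < len(sequence):
            yield i, sequence[i - flank:i + flank + 1]


def score_window(window, matrix):
    """Sum position-specific weights; residues missing from a column score 0."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(window))


if __name__ == "__main__":
    # Toy 7-column matrix (hypothetical weights): favours R at -1 and P at +1.
    matrix = [{}, {}, {"R": 2.0}, {}, {"P": 1.5}, {}, {}]
    for index, window in candidate_sites("AAKRSPAA"):
        print(index, window, score_window(window, matrix))
```

    Sites too close to either end of the sequence are skipped here for simplicity; a real tool might pad the window instead.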

  9. Efficient Prediction of Co-Complexed Proteins Based on Coevolution

    PubMed Central

    de Vienne, Damien M.; Azé, Jérôme

    2012-01-01

    The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure. We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method. A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli. PMID:23152796
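
    The "precision for the first 200 sorted pairs" reported above is a precision-at-k measure: rank predicted pairs by score and count how many of the top k are known interactions. A minimal sketch, with hypothetical names and toy data rather than the study's dataset:

```python
def precision_at_k(scored_pairs, true_pairs, k):
    """Fraction of the k highest-scoring pairs that appear in the reference set.

    scored_pairs: iterable of ((protein_a, protein_b), score)
    true_pairs:   set of frozensets, so pair order does not matter
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if frozenset(pair) in true_pairs)
    return hits / len(ranked)


if __name__ == "__main__":
    scored = [(("a", "b"), 0.9), (("c", "d"), 0.8), (("e", "f"), 0.1)]
    known = {frozenset(("b", "a")), frozenset(("e", "f"))}
    print(precision_at_k(scored, known, k=2))
```

    Storing reference pairs as frozensets treats (a, b) and (b, a) as the same interaction.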

  10. FUNCTIONALITY OF MEMBRANE SEPARATED EGG WHITE PROTEINS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The excellent nutritional and functional properties of liquid egg white (LEW), which is essentially a viscous fat-free protein solution, are exploited in many food preparations. Thermal pasteurization (at 56.6 °C for 3.5 min minimum) is currently used by industry to eliminate the microflora in LEW ...

  11. Functional conservation of an ancestral Pellino protein in helminth species.

    PubMed

    Cluxton, Christopher D; Caffrey, Brian E; Kinsella, Gemma K; Moynagh, Paul N; Fares, Mario A; Fallon, Padraic G

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved, having high similarity to human Pellino proteins at both the sequence and predicted structural levels. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  12. Exploiting protein flexibility to predict the location of allosteric sites

    PubMed Central

    2012-01-01

    Background Allostery is one of the most powerful and common ways of regulation of protein activity. However, for most allosteric proteins identified to date the mechanistic details of allosteric modulation are not yet well understood. Uncovering common mechanistic patterns underlying allostery would allow not only a better academic understanding of the phenomena, but it would also streamline the design of novel therapeutic solutions. This relatively unexplored therapeutic potential and the putative advantages of allosteric drugs over classical active-site inhibitors fuel the attention allosteric-drug research is receiving at present. A first step to harness the regulatory potential and versatility of allosteric sites, in the context of drug-discovery and design, would be to detect or predict their presence and location. In this article, we describe a simple computational approach, based on the effect allosteric ligands exert on protein flexibility upon binding, to predict the existence and position of allosteric sites on a given protein structure. Results By querying the literature and a recently available database of allosteric sites, we gathered 213 allosteric proteins with structural information that we further filtered into a non-redundant set of 91 proteins. We performed normal-mode analysis and observed significant changes in protein flexibility upon allosteric-ligand binding in 70% of the cases. These results agree with the current view that allosteric mechanisms are in many cases governed by changes in protein dynamics caused by ligand binding. Furthermore, we implemented an approach that achieves 65% positive predictive value in identifying allosteric sites within the set of predicted cavities of a protein (stricter parameters set, 0.22 sensitivity), by combining the current analysis on dynamics with previous results on structural conservation of allosteric sites. We also analyzed four biological examples in detail, revealing that this simple coarse

  13. [Functionally-relevant conformational dynamics of water-soluble proteins].

    PubMed

    Novikov, G V; Sivozhelezov, V S; Shaĭtan, K V

    2013-01-01

    A study is reported of the functionally relevant dynamics of three typical water-soluble proteins: calmodulin, Src tyrosine kinase, and the Trp operon repressor. Application of state-of-the-art methods of structural bioinformatics made it possible to identify dynamics, seen in the X-ray structures of the investigated proteins, that are associated with their specific biological functions. In addition, normal mode analysis was used to predict the most probable directions of the functionally relevant motions for all of these proteins. Importantly, the overall type of motion observed in the lowest-frequency modes was very similar to the motions seen in the analysis of the X-ray data of the examined macromolecules. It was thereby shown that the large-scale as well as the local conformational motions of the proteins might be predetermined already at the level of their tertiary structures. In particular, the determining factor might be the specific fold of the alpha-helices. Thus the functionally relevant in vivo dynamics of the investigated proteins might have been formed evolutionarily by means of natural selection at the level of the spatial topology. PMID:23705506

  14. Functionally Relevant Specific Packing Can Determine Protein Folding Routes.

    PubMed

    Yadahalli, Shilpa; Gosavi, Shachi

    2016-01-29

    Functional residues can modulate the folding mechanisms of proteins. In some proteins, mutations to such residues can radically change the primary folding route. Is it possible then to learn more about the functional regions of a protein by investigating just its choice of folding route? The folding and the function of the protein Escherichia coli ribonuclease H (ecoRNase-H) have been extensively studied and its folding route is known to near-residue resolution. Here, we computationally study the folding of ecoRNase-H using molecular dynamics simulations of structure-based models of increasing complexity. The differences between a model that correctly predicts the experimentally determined folding route and a simpler model that does not can be attributed to a set of six aromatic residues clustered together in a region of the protein called CORE. This clustering, which we term "specific" packing, drives CORE to fold early and determines the folding route. Both the residues involved in specific packing and their packing are largely conserved across E. coli-like RNase-Hs from diverse species. Residue conservation is usually implicated in function. Here, the identified residues either are known to bind substrate in ecoRNase-H or pack against the substrate in the homologous human RNase-H where a substrate-bound crystal structure exists. Thus, the folding mechanism of ecoRNase-H is a byproduct of functional demands upon its sequence. Using our observations on specific packing, we suggest mutations to an engineered HIV RNase-H to improve its function. Our results show that understanding folding route choice in proteins can provide unexpected insights into their function. PMID:26724535

  15. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics
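
    The two figures quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix (here, SCR versus non-SCR positions). A minimal sketch using the standard definitions; the counts in the example are invented for illustration and are not taken from the study:

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all positions that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Invented counts for a toy SCR / non-SCR prediction.
    print(accuracy(tp=40, tn=35, fp=15, fn=10))
    print(matthews_cc(tp=40, tn=35, fp=15, fn=10))
```

    MCC is reported alongside accuracy because it stays informative when the two classes are unbalanced, whereas accuracy alone can look high for a predictor that favours the majority class.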

  16. A Prediction Model of the Capillary Pressure J-Function.

    PubMed

    Xu, W S; Luo, P Y; Sun, L; Lin, N

    2016-01-01

    The capillary pressure J-function is a dimensionless measure of the capillary pressure of a fluid in a porous medium. The function was derived based on a capillary bundle model. However, the dependence of the J-function on the saturation Sw is not well understood. A prediction model for it is presented based on capillary pressure model, and the J-function prediction model is a power function instead of an exponential or polynomial function. Relative permeability is calculated with the J-function prediction model, resulting in an easier calculation and results that are more representative. PMID:27603701
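
    For orientation, the sketch below uses the standard Leverett definition, J(Sw) = Pc(Sw) / (sigma * cos(theta)) * sqrt(k / phi), and fits a generic power law J = a * Sw^b by least squares in log-log space. The power-law form and every name here are assumptions for illustration; the abstract does not give the authors' exact model or coefficients.

```python
import math


def leverett_j(pc, sigma, theta_deg, k, phi):
    """Dimensionless J from capillary pressure pc, interfacial tension sigma,
    contact angle theta (degrees), permeability k and porosity phi.
    Units must be consistent (e.g. Pa, N/m, m^2)."""
    return pc / (sigma * math.cos(math.radians(theta_deg))) * math.sqrt(k / phi)


def fit_power_law(sw, j):
    """Fit J = a * Sw**b via linear regression of log J on log Sw; returns (a, b)."""
    xs = [math.log(s) for s in sw]
    ys = [math.log(v) for v in j]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    saturations = [0.2, 0.4, 0.8]
    j_values = [2.0 * s ** -1.5 for s in saturations]  # synthetic data
    print(fit_power_law(saturations, j_values))
```

    Fitting in log-log space turns the power law into a straight line, so only two parameters need to be estimated from the measured (Sw, J) pairs.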

  17. Inferring modules of functionally interacting proteins using the Bond Energy Algorithm

    PubMed Central

    Watanabe, Ryosuke LA; Morett, Enrique; Vallejo, Edgar E

    2008-01-01

    Background Non-homology based methods such as phylogenetic profiles are effective for predicting functional relationships between proteins with no considerable sequence or structure similarity. Those methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. Proteins do not exclusively interact in pairs as the final biological function of a protein in the cellular context is often hold by a group of proteins. In order to accurately infer modules of functionally interacting proteins, the consideration of not only direct but also indirect relationships is required. In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA we create clusters of phylogenetic profiles based on the associations of the surrounding elements of the analyzed data using a metric that considers linked relationships among elements in the data set. Results Using phylogenetic profiles obtained from the Cluster of Orthologous Groups of Proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper level) relationships between profiles. We evaluated our results by comparing with COG's functional categories, And even more, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering by predicting functional relationships between proteins with higher accuracy. Conclusion This study shows that the linked relationships of phylogenetic profiles obtained by BEA is useful for detecting functional associations between profiles and extending functional modules not found by traditional methods. 
BEA is capable of detecting relationship among phylogenetic patterns by linking them through a common element shared in
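
    The clustering idea in this record can be illustrated with a minimal sketch of a bond-energy style ordering. It assumes binary phylogenetic profiles and uses the dot product of two profiles as the "bond" between them; the function names and the toy profiles are hypothetical, and the authors' actual metric and implementation may differ.

```python
def bond(a, b):
    """Bond between two profiles: dot product of their presence/absence vectors."""
    return sum(x * y for x, y in zip(a, b))


def bea_order(profiles):
    """Greedily order profiles so that strongly bonded ones end up adjacent.

    profiles: dict mapping a profile name to a list of 0/1 values.
    Each new profile is inserted at the position that adds the most
    bond energy to the current ordering.
    """
    names = list(profiles)
    order = names[:1]
    for name in names[1:]:
        cur = profiles[name]
        best_pos, best_gain = 0, float("-inf")
        for pos in range(len(order) + 1):
            left = profiles[order[pos - 1]] if pos > 0 else None
            right = profiles[order[pos]] if pos < len(order) else None
            gain = 0
            if left is not None:
                gain += bond(left, cur)
            if right is not None:
                gain += bond(cur, right)
            if left is not None and right is not None:
                # Inserting between two neighbours breaks their existing bond.
                gain -= bond(left, right)
            if gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, name)
    return order
```

    Groups of functionally related profiles would then be read off as runs of neighbours in the returned ordering; where to cut the ordering into modules is a separate decision not covered by this sketch.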

  18. Knowledge of Native Protein-Protein Interfaces Is Sufficient To Construct Predictive Models for the Selection of Binding Candidates.

    PubMed

    Popov, Petr; Grudinin, Sergei

    2015-10-26

    Selection of putative binding poses is a challenging part of virtual screening for protein-protein interactions. Predictive models to filter out binding candidates with the highest binding affinities comprise scoring functions that assign a score to each binding pose. Existing scoring functions are typically deduced by collecting statistical information about interfaces of native conformations of protein complexes along with interfaces of a large generated set of non-native conformations. However, the obtained scoring functions become biased toward the method used to generate the non-native conformations, i.e., they may not recognize near-native interfaces generated with a different method. The present study demonstrates that knowledge of only native protein-protein interfaces is sufficient to construct well-discriminative predictive models for the selection of binding candidates. Here we introduce a new scoring method that comprises a knowledge-based potential called KSENIA deduced from structural information about the native interfaces of 844 crystallographic protein-protein complexes. We derive KSENIA using convex optimization with a training set composed of native protein complexes and their near-native conformations obtained using deformations along the low-frequency normal modes. As a result, our knowledge-based potential has only marginal bias toward a method used to generate putative binding poses. Furthermore, KSENIA is smooth by construction, which allows it to be used along with rigid-body optimization to refine the binding poses. Using several test benchmarks, we demonstrate that our method discriminates well native and near-native conformations of protein complexes from non-native ones. Our methodology can be easily adapted to the recognition of other types of molecular interactions, such as protein-ligand, protein-RNA, etc. KSENIA will be made publicly available as a part of the SAMSON software platform at https://team.inria.fr/nano-d/software . 
PMID

  19. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, the PPI pairs identified by experimental approaches cover only a small fraction of the whole PPI network; moreover, those approaches have inherent disadvantages, such as being time-consuming and expensive and having a high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel feature extraction method, combining physicochemical and evolutionary information, for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method consist mainly in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and in employing a powerful and robust DVM classifier. To the best of our knowledge, this is the first time that the DVM model has been applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  20. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize

    PubMed Central

    Musungu, Bryan; Bhatnagar, Deepak; Brown, Robert L.; Fakhoury, Ahmad M.; Geisler, Matt

    2015-01-01

    Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize. PMID:26089837
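
    The interolog rule described in this record (predict an interaction in maize when both partners of a known reference interaction have maize orthologs) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the data layout, and the protein identifiers are hypothetical and are not taken from PiZeaM.

```python
def predict_interologs(reference_interactions, orthologs):
    """Transfer interactions from a reference species via orthology.

    reference_interactions: iterable of (protein_a, protein_b) pairs that were
        experimentally determined in the reference species.
    orthologs: dict mapping a reference protein to a list of target-species
        orthologs (supports one-to-one and many-to-many mappings).
    Returns a set of unordered target-species pairs, stored as sorted tuples.
    """
    predicted = set()
    for a, b in reference_interactions:
        # Both partners need at least one ortholog, otherwise nothing is added.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted
```

    Storing each pair as a sorted tuple keeps the result free of duplicates such as (X, Y) and (Y, X), which matters when counting unique predicted interactions.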

  1. [Pathophysiological functions of follistatin related protein].

    PubMed

    Shen, Hua; Liu, Yu-Yang

    2009-10-01

    Follistatin related protein (FRP) is an extracellular glycoprotein involved in several pathological and physiological processes such as cell proliferation, migration, tissue remodeling, embryonic development, and cell-cell interaction. Recent research has shown that FRP possesses dual functions, including inhibiting cell apoptosis and inhibiting cell proliferation. In a myocardial ischemia model, FRP has been shown to protect myocardial cells and inhibit apoptosis. At the same time, FRP promotes endothelial cell proliferation. FRP is also synthesized by vascular smooth muscle cells (VSMCs) to regulate VSMC functions via a feedback mechanism. FRP can induce apoptosis in various cancer cell lines. In this review, we summarize up-to-date data on the structure, functions, mechanisms and regulation pathways of the protein. PMID:21417029

  2. Prediction of protein folding rates from simplified secondary structure alphabet.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-10-21

    Protein folding is a very complicated and highly cooperative dynamic process. However, the folding kinetics is likely to depend more on a few key structural features. Here we find that secondary structures can determine the folding rates only of large, multi-state folding proteins and fail to predict those of small, two-state proteins. The importance of secondary structures for protein folding is ordered as: extended β strand > α helix > bend > turn > undefined secondary structure > 3₁₀ helix > isolated β strand > π helix. Only the first three secondary structures, extended β strand, α helix and bend, can achieve a good correlation with folding rates. This suggests that the rate-limiting step of protein folding would depend upon the formation of regular secondary structures and the buckling of the chain. The reduced secondary structure alphabet provides a simplified description for machine learning applications in protein design. PMID:26247139

  3. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    DOE PAGESBeta

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.
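
    The AUC evaluation mentioned in this record can be sketched in plain Python. The snippet below computes AUC as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, which is one standard formulation; it does not reproduce PPCM's Random Forests merger, and the function name and inputs are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve from classifier scores and 0/1 labels.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    Applying the same function to the scores of each individual classifier and to the merged scores gives a like-for-like comparison on the same gold-standard pairs.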

  4. SAM-T08, HMM-based protein structure prediction

    PubMed Central

    Karplus, Kevin

    2009-01-01

    The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PMID:19483096

  5. Domain-mediated protein interaction prediction: From genome to network.

    PubMed

    Reimand, Jüri; Hui, Shirley; Jain, Shobhit; Law, Brian; Bader, Gary D

    2012-08-14

    Protein-protein interactions (PPIs), involved in many biological processes such as cellular signaling, are ultimately encoded in the genome. Solving the problem of predicting protein interactions from the genome sequence will lead to increased understanding of complex networks, evolution and human disease. We can learn the relationship between genomes and networks by focusing on an easily approachable subset of high-resolution protein interactions that are mediated by peptide recognition modules (PRMs) such as PDZ, WW and SH3 domains. This review focuses on computational prediction and analysis of PRM-mediated networks and discusses sequence- and structure-based interaction predictors, techniques and datasets for identifying physiologically relevant PPIs, and interpreting high-resolution interaction networks in the context of evolution and human disease. PMID:22561014

  6. DSP: a protein shape string and its profile prediction server

    PubMed Central

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-01-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/. PMID:22553364

  7. Protein short loop prediction in terms of a structural alphabet.

    PubMed

    Tyagi, Manoj; Bornot, Aurélie; Offmann, Bernard; de Brevern, Alexandre G

    2009-08-01

    Loops connect regular secondary structures. In many instances, they are known to play crucial biological roles. To bypass the limitation of secondary structure description, we previously defined a structural alphabet composed of 16 structural prototypes, called Protein Blocks (PBs). It leads to an accurate description of every region of 3D protein backbones and has been used in local structure prediction. In the present study, we used our structural alphabet to predict the loops connecting two repetitive structures. We showed the benefit of taking the flanking regions into account, which improved the prediction rate by up to 19.8%, but we also underline the sensitivity of such an approach. This research can be used to propose different structures for the loops and to probe and sample their flexibility. It is a useful tool for ab initio loop prediction and leads to insights into flexible docking approaches. PMID:19625218

  8. Improving Protein Expression Prediction Using Extra Features and Ensemble Averaging

    PubMed Central

    Fernandes, Armando; Vinga, Susana

    2016-01-01

    This article focuses on improving machine learning models capable of predicting protein expression levels based on their codon encoding. Support vector regression (SVR) and partial least squares (PLS) were used to create the models. SVR yields predictions that surpass those of PLS. It is shown that the models' predictive ability can be improved by using two more input features, codon identification number and codon count, besides the already used codon bias and minimum free energy. In addition, applying ensemble averaging to the SVR or PLS models improves the results even further. The present work motivates the testing of different ensembles and features with the aim of improving the prediction models, whose correlation coefficients are still far from perfect. These results are relevant for the optimization of codon usage and enhancement of protein expression levels in synthetic biology problems. PMID:26934190
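
    Ensemble averaging and the correlation coefficient used to judge such models can be illustrated with a short sketch. It assumes each model has already produced one prediction per protein; the function names are hypothetical, and the SVR and PLS models themselves are not reproduced here.

```python
import math


def ensemble_average(predictions):
    """Average the predictions of several models, sample by sample.

    predictions: list of equal-length lists, one list per model.
    """
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

    Comparing `pearson(ensemble_average(model_outputs), measured)` with the same score for each single model is one simple way to check whether averaging helps on a given data set.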

  9. Ice-Binding Proteins and Their Function.

    PubMed

    Bar Dolev, Maya; Braslavsky, Ido; Davies, Peter L

    2016-06-01

    Ice-binding proteins (IBPs) are a diverse class of proteins that assist organism survival in the presence of ice in cold climates. They have different origins in many organisms, including bacteria, fungi, algae, diatoms, plants, insects, and fish. This review covers the gamut of IBP structures and functions and the common features they use to bind ice. We discuss mechanisms by which IBPs adsorb to ice and interfere with its growth, evidence for their irreversible association with ice, and methods for enhancing the activity of IBPs. The applications of IBPs in the food industry, in cryopreservation, and in other technologies are vast, and we chart out some possibilities. PMID:27145844

  10. Insect Seminal Fluid Proteins: Identification and Function

    PubMed Central

    Avila, Frank W.; Sirot, Laura K.; LaFlamme, Brooke A.; Rubinstein, C. Dustin; Wolfner, Mariana F.

    2014-01-01

    Seminal fluid proteins (SFPs) produced in reproductive tract tissues of male insects and transferred to females during mating induce numerous physiological and behavioral post-mating changes in females. These changes include decreasing receptivity to re-mating, affecting sperm storage parameters, increasing egg production, modulating sperm competition, feeding behaviors, and mating plug formation. In addition, SFPs also have anti-microbial functions and induce expression of anti-microbial peptides in at least some insects. Here, we review recent identification of insect SFPs and discuss the multiple roles these proteins play in the post-mating processes of female insects. PMID:20868282

  11. PIPs: human protein–protein interaction prediction database

    PubMed Central

    McDowall, Mark D.; Scott, Michelle S.; Barton, Geoffrey J.

    2009-01-01

    The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein–protein interactions in human. It contains predictions of >37 000 high probability interactions of which >34 000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein–protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling. PMID:18988626
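
    The idea of combining several evidence sources in a Bayesian way can be sketched with likelihood ratios. The abstract does not give the exact model, so the snippet assumes a naive Bayes combination in which the sources are treated as independent; the function name and every number in the usage note are hypothetical.

```python
def posterior_probability(prior_odds, likelihood_ratios):
    """Combine independent evidence into a posterior interaction probability.

    prior_odds: odds that a random protein pair interacts, before evidence.
    likelihood_ratios: one ratio per evidence source (for example expression
        or orthology), each being P(evidence | interact) / P(evidence | not).
    """
    odds = prior_odds
    for ratio in likelihood_ratios:
        # Under the independence assumption, the ratios simply multiply.
        odds *= ratio
    return odds / (1.0 + odds)
```

    For example, with prior odds of 0.001 and three sources contributing ratios of 10, 50 and 4, the posterior odds are 2, which corresponds to a probability of about 0.67. Keeping the individual ratios also makes it possible to report each source's contribution, as the PIPs web interface is described as doing.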

  12. Benchmark data for identifying multi-functional types of membrane proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2016-09-01

    Identifying membrane proteins and their multi-functional types is an indispensable yet challenging topic in proteomics and bioinformatics. In this article, we provide data that are used for training and testing Mem-ADSVM (Wan et al., 2016. "Mem-ADSVM: a two-layer multi-label predictor for identifying multi-functional types of membrane proteins" [1]), a two-layer multi-label predictor for predicting multi-functional types of membrane proteins. PMID:27294176

  13. JPred4: a protein secondary structure prediction server.

    PubMed

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J

    2015-07-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141
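
    The three-state accuracy quoted in this record is a per-residue measure, often called Q3. A minimal sketch follows, assuming predicted and observed states are given as equal-length strings over H (α-helix), E (β-strand) and C (coil); the function name and the letter coding are assumptions and not part of the JPred4 API.

```python
def q3(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return correct / len(observed)
```

    A call such as `q3("HHHEEC", "HHEEEC")` compares the two strings position by position and returns the matching fraction.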

  14. JPred4: a protein secondary structure prediction server

    PubMed Central

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J.

    2015-01-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  15. Addressing the Role of Conformational Diversity in Protein Structure Prediction

    PubMed Central

    Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become standard practice for studying proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding experimentally solved target structures and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. For 70% of the proteins in our datasets, different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  16. Addressing the Role of Conformational Diversity in Protein Structure Prediction.

    PubMed

    Palopoli, Nicolas; Monzon, Alexander Miguel; Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become standard practice for studying proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding experimentally solved target structures and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. For 70% of the proteins in our datasets, different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  17. APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein.

    PubMed

    Saravanan, Vijayakumar; Lakshmi, P T V

    2013-12-01

    Apoptotic proteins play key roles in understanding the mechanism of programmed cell death. Knowledge about the subcellular localization of apoptotic proteins is helpful in understanding the mechanism of programmed cell death, determining the functional characterization of the protein, screening candidates in drug design, and selecting proteins for relevant studies. It has also been proposed that the information required for determining the subcellular localization of a protein resides in its amino acid sequence. In this work, a new biological feature, class pattern frequency of physiochemical descriptor, was effectively used together with the amino acid composition, protein similarity measure, CTD (composition, translation, and distribution) of physiochemical descriptors, and sequence similarity to predict the subcellular localization of apoptosis proteins. AdaBoost with Random Forest as the weak learner was designed for the five modules, and prediction is made based on a weighted voting system. A benchmark dataset of 317 apoptosis proteins was subjected to prediction by our system, and the accuracy was found to be 100.0%, 92.4%, and 90.1% for the self-consistency test, jack-knife test, and tenfold cross-validation test, respectively, which is 0.9% higher than that of other existing methods. In addition, prediction on the independent data sets (N151 and ZW98) resulted in accuracies of 90.7% and 87.7%, respectively. These results show that the protein feature represented by a combined feature vector, along with the AdaBoost algorithm, performs well in predicting the subcellular localization of apoptosis proteins. The user-friendly web interface "APSLAP" has been constructed and is freely available at http://apslap.bicpu.edu.in; it is anticipated that this tool will play a significant role in determining the specific role of apoptosis proteins with reliability. PMID:23982307

  18. Predicted vibrational spectra from anharmonic potential functions

    SciTech Connect

    Dunn, K.M.

    1986-01-01

    The dissertation develops a procedure for predicting vibrational spectra of polyatomic molecules from a combination of theoretical and experimental information. Ab initio quantum chemical calculations provide anharmonic force constants including cubics and diagonal quartics. A variational procedure analogous to configuration interaction is then used to compute eigenvalues of the pure vibrational Hamiltonian. The diagonal quadratic force constants are then adjusted until the calculated fundamental frequencies agree with experiment. The resulting theoretical-experimental force field may then be used to predict the energies of vibrationally excited states. The method is applied to three molecules: hydrogen cyanide, ammonia, and methyl fluoride. For hydrogen cyanide, the dissertation presents predicted energies for all of the vibrationally excited states with up to four quanta of excitation distributed among the four modes. The root-mean-square error is 8.7 cm⁻¹ for the states below 11,000 cm⁻¹. The force constants for ammonia are adjusted to reproduce the fundamental frequencies of ND₃. The force constants then predict the energies of states below 7000 cm⁻¹ with an rms error of 5.8 cm⁻¹ for ND₃ and 16.7 cm⁻¹ for NH₃. Finally, the adjusted force constants for methyl fluoride predict the energies of states below 4100 cm⁻¹ with an rms error of 4.3 cm⁻¹. These force constants are also used to predict the CH stretching overtone region of CH₃F and the first, second and third overtone regions of CD₂FH for which experimental information is not available.

  19. Prediction of allosteric sites on protein surfaces with an elastic-network-model-based thermodynamic method

    NASA Astrophysics Data System (ADS)

    Su, Ji Guo; Qi, Li Sheng; Li, Chun Hua; Zhu, Yan Ying; Du, Hui Jing; Hou, Yan Xue; Hao, Rui; Wang, Ji Hua

    2014-08-01

    Allostery is a rapid and efficient way in many biological processes to regulate protein functions, where binding of an effector at the allosteric site alters the activity and function at a distant active site. Allosteric regulation of protein biological functions provides a promising strategy for novel drug design. However, how to effectively identify the allosteric sites remains one of the major challenges for allosteric drug design. In the present work, a thermodynamic method based on the elastic network model was proposed to predict the allosteric sites on the protein surface. In our method, the thermodynamic coupling between the allosteric and active sites was considered, and then the allosteric sites were identified as those where the binding of an effector molecule induces a large change in the binding free energy of the protein with its ligand. Using the proposed method, two proteins, i.e., the 70 kD heat shock protein (Hsp70) and GluA2 alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA) receptor, were studied and the allosteric sites on the protein surface were successfully identified. The predicted results are consistent with the available experimental data, which indicates that our method is a simple yet effective approach for the identification of allosteric sites on proteins.

  20. Prediction of allosteric sites on protein surfaces with an elastic-network-model-based thermodynamic method.

    PubMed

    Su, Ji Guo; Qi, Li Sheng; Li, Chun Hua; Zhu, Yan Ying; Du, Hui Jing; Hou, Yan Xue; Hao, Rui; Wang, Ji Hua

    2014-08-01

    Allostery is a rapid and efficient way in many biological processes to regulate protein functions, where binding of an effector at the allosteric site alters the activity and function at a distant active site. Allosteric regulation of protein biological functions provides a promising strategy for novel drug design. However, how to effectively identify the allosteric sites remains one of the major challenges for allosteric drug design. In the present work, a thermodynamic method based on the elastic network model was proposed to predict the allosteric sites on the protein surface. In our method, the thermodynamic coupling between the allosteric and active sites was considered, and then the allosteric sites were identified as those where the binding of an effector molecule induces a large change in the binding free energy of the protein with its ligand. Using the proposed method, two proteins, i.e., the 70 kD heat shock protein (Hsp70) and GluA2 alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid (AMPA) receptor, were studied and the allosteric sites on the protein surface were successfully identified. The predicted results are consistent with the available experimental data, which indicates that our method is a simple yet effective approach for the identification of allosteric sites on proteins. PMID:25215770

  1. Relationships between predicted moonlighting proteins, human diseases, and comorbidities from a network perspective

    PubMed Central

    Zanzoni, Andreas; Chapple, Charles E.; Brun, Christine

    2015-01-01

    Moonlighting proteins are a subset of multifunctional proteins characterized by their multiple, independent, and unrelated biological functions. We recently set up a large-scale identification of moonlighting proteins using a protein-protein interaction (PPI) network approach. We established that 3% of the current human interactome is composed of predicted moonlighting proteins. We found that disease-related genes are over-represented among those candidates. Here, by comparing moonlighting candidates to non-candidates as groups, we further show that (i) they are significantly involved in more than one disease, (ii) they contribute to complex rather than monogenic diseases, (iii) the diseases in which they are involved are phenotypically different according to their annotations, finally, (iv) they are enriched for diseases pairs showing statistically significant comorbidity patterns based on Medicare records. Altogether, our results suggest that some observed comorbidities between phenotypically different diseases could be due to a shared protein involved in unrelated biological processes. PMID:26157390

  2. Folding funnels, binding funnels, and protein function.

    PubMed Central

    Tsai, C. J.; Kumar, S.; Ma, B.; Nussinov, R.

    1999-01-01

    Folding funnels have been the focus of considerable attention during the last few years. These have mostly been discussed in the general context of the theory of protein folding. Here we extend the utility of the concept of folding funnels, relating them to biological mechanisms and function. In particular, here we describe the shape of the funnels in light of protein synthesis and folding; flexibility, conformational diversity, and binding mechanisms; and the associated binding funnels, illustrating the multiple routes and the range of complexed conformers. Specifically, the walls of the folding funnels, their crevices, and bumps are related to the complexity of protein folding, and hence to sequential vs. nonsequential folding. Whereas the former is more frequently observed in eukaryotic proteins, where the rate of protein synthesis is slower, the latter is more frequent in prokaryotes, with faster translation rates. The bottoms of the funnels reflect the extent of the flexibility of the proteins. Rugged floors imply a range of conformational isomers, which may be close on the energy landscape. Rather than undergoing an induced fit binding mechanism, the conformational ensembles around the rugged bottoms argue that the conformers, which are most complementary to the ligand, will bind to it with the equilibrium shifting in their favor. Furthermore, depending on the extent of the ruggedness, or of the smoothness with only a few minima, we may infer nonspecific, broad range vs. specific binding. In particular, folding and binding are similar processes, with similar underlying principles. Hence, the shape of the folding funnel of the monomer enables making reasonable guesses regarding the shape of the corresponding binding funnel. Proteins having a broad range of binding, such as proteolytic enzymes or relatively nonspecific endonucleases, may be expected to have not only rugged floors in their folding funnels, but their binding funnels will also behave similarly

  3. Binding affinity prediction for protein-ligand complexes based on β contacts and B factor.

    PubMed

    Liu, Qian; Kwoh, Chee Keong; Li, Jinyan

    2013-11-25

    Accurate determination of protein-ligand binding affinity is a fundamental problem in biochemistry useful for many applications including drug design and protein-ligand docking. A number of scoring functions have been proposed for the prediction of protein-ligand binding affinity. However, accurate prediction is still a challenging problem because poor performance is often seen in the evaluation under the leave-one-cluster-out cross-validation (LCOCV). We introduce a new scoring function named B2BScore to improve the prediction performance. B2BScore integrates two physicochemical properties for protein-ligand binding affinity prediction. One is the property of β contacts. A β contact between two atoms requires no other atoms to interrupt the atomic contact and assumes that the two atoms should have enough direct contact area. The other is the property of B factor to capture the atomic mobility in the dynamic protein-ligand binding process. Tested on the PDBBind2009 data set, B2BScore shows superior prediction performance to existing methods on independent test data as well as under the LCOCV evaluation framework. In particular, B2BScore achieves a significant LCOCV improvement across 26 protein clusters: the averaged Pearson's correlation coefficient increases from 0.418 to 0.518, and the standard deviation of the coefficients decreases significantly from 0.352 to 0.196. We also identified several important and intuitive contact descriptors of protein-ligand binding through the random forest learning in B2BScore. Some of these descriptors are closely related to contacts between carbon atoms without covalent-bond oxygen/nitrogen, preferred contacts of metal ions, interfacial backbone atoms from proteins, or π rings. Some others are negative descriptors relating to those contacts with nitrogen atoms without covalent-bond hydrogens or nonpreferred contacts of metal ions. These descriptors can be directly used to guide protein-ligand docking. PMID:24191692
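
    The leave-one-cluster-out evaluation mentioned in this abstract can be sketched roughly as follows. This is only an illustration, not the B2BScore code: the function names are hypothetical, the scoring function itself is omitted, and the use of the sample standard deviation (n - 1) is an assumption, since the abstract does not say which form was used.

```python
import math

def leave_one_cluster_out(cluster_ids):
    """Yield (held_out_cluster, train_indices, test_indices), one split per cluster."""
    for held_out in sorted(set(cluster_ids)):
        train = [i for i, c in enumerate(cluster_ids) if c != held_out]
        test = [i for i, c in enumerate(cluster_ids) if c == held_out]
        yield held_out, train, test

def pearson(xs, ys):
    """Pearson correlation between predicted and measured affinities."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def summarize(per_cluster_r):
    """Mean and sample standard deviation (assumed form) of per-cluster correlations."""
    n = len(per_cluster_r)
    mean = sum(per_cluster_r) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in per_cluster_r) / (n - 1))
    return mean, sd
```

    In such a setup, a model would be trained on the `train` indices of each split, scored with `pearson` on the held-out cluster, and the per-cluster coefficients passed to `summarize`.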

  4. Prediction of Multi-Type Membrane Proteins in Human by an Integrated Approach

    PubMed Central

    Chen, Lei; Zhang, Ning; Huang, Tao; Cai, Yu-Dong

    2014-01-01

    Membrane proteins were found to be involved in various cellular processes performing various important functions, which are mainly associated with their types. However, it is very time-consuming and expensive for traditional biophysical methods to identify membrane protein types. Although some computational tools predicting membrane protein types have been developed, most of them can only recognize one type per protein. They are therefore of limited effectiveness, because one membrane protein can have several types at the same time. To our knowledge, few methods handling multiple types of membrane proteins have been reported. In this study, we proposed an integrated approach to predict multiple types of membrane proteins by employing sequence homology and protein-protein interaction network. As a result, the prediction accuracies reached 87.65%, 81.39% and 70.79%, respectively, by the leave-one-out test on three datasets. It outperformed the nearest neighbor algorithm adopting pseudo amino acid composition. The method is anticipated to be an alternative tool for identifying membrane protein types. New metrics for evaluating performances of methods dealing with multi-label problems were also presented. The program of the method is available upon request. PMID:24676214

  5. Intrinsic Disorder in Transmembrane Proteins: Roles in Signaling and Topology Prediction

    PubMed Central

    Bürgi, Jérôme; Xue, Bin; Uversky, Vladimir N.

    2016-01-01

    Intrinsically disordered regions (IDRs) are peculiar stretches of amino acids that lack stable conformations in solution. Intrinsic disorder-containing proteins (IDPs) are defined by the presence of at least one large IDR and have been linked to multiple cellular processes including cell signaling, DNA binding and cancer. Here we used computational analyses and publicly available databases to deepen insight into the prevalence and function of IDRs specifically in transmembrane proteins, which are somewhat neglected in most studies. We found that 50% of transmembrane proteins have at least one IDR of 30 amino acids or more. Interestingly, these domains preferentially localize to the cytoplasmic side, especially of multi-pass transmembrane proteins, suggesting that disorder prediction could increase the confidence of topology prediction algorithms. This was supported by the successful prediction of the topology of the uncharacterized multi-pass transmembrane protein TMEM117, as confirmed experimentally. Pathway analysis indicated that IDPs are enriched in cell projection and axons and appear to play an important role in cell adhesion, signaling and ion binding. In addition, we found that IDPs are enriched in phosphorylation sites, a crucial post-translational modification in signal transduction, when compared to fully ordered proteins, and are implicated in more protein-protein interaction events. Accordingly, IDPs were highly enriched in short protein binding regions called Molecular Recognition Features (MoRFs). Altogether our analyses strongly support the notion that the transmembrane IDPs act as hubs in cellular signal events. PMID:27391701

  6. CARDIO-PRED: an in silico tool for predicting cardiovascular-disorder associated proteins.

    PubMed

    Jain, Prerna; Thukral, Nitin; Gahlot, Lokesh Kumar; Hasija, Yasha

    2015-06-01

    Interactions between proteins largely govern cellular processes, and this has led to numerous efforts culminating in enormous information related to the proteins, their interactions and the function which is determined by their interactions. The main concern of the present study is to present interface analysis of cardiovascular-disorder (CVD) related proteins to shed light on details of interactions and to emphasize the importance of using structures in network studies. This study combines the network-centred approach with three-dimensional studies to comprehend the fundamentals of biology. Interface properties were used as descriptors to classify the CVD associated proteins and non-CVD associated proteins. A machine learning algorithm was used to generate a classifier based on the training set, which was then used to predict potential CVD related proteins from a set of polymorphic proteins which are not known to be involved in any disease. Among several classifying algorithms applied to generate models, the best performance was achieved using Random Forest with an accuracy of 69.5%. The tool named CARDIO-PRED, based on the prediction model, is available at http://www.genomeinformatics.dce.edu/CARDIO-PRED/. The predicted CVD related proteins may not be the causing factor of a particular disease but can be involved in pathways and reactions yet unknown to us, thus permitting a more rational analysis of disease mechanism. Study of their interactions with other proteins can significantly improve our understanding of the molecular mechanism of diseases. PMID:25972989

  7. CoinFold: a web server for protein contact prediction and contact-assisted protein folding.

    PubMed

    Wang, Sheng; Li, Wei; Zhang, Renyu; Liu, Shiwang; Xu, Jinbo

    2016-07-01

    CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it not only uses residue coevolution information in the target protein family, but also that in the related families which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of sequence profile, contact (distance) potential and other information. Finally, this server predicts tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on the CASP and CAMEO targets, this server shows significant advantages over existing ones of similar category in both contact and tertiary structure prediction. PMID:27112569

  8. Heterogeneity in Retroviral Nucleocapsid Protein Function

    NASA Astrophysics Data System (ADS)

    Landes, Christy

    2009-03-01

    Time-resolved single-molecule fluorescence spectroscopy was used to study the human T-cell lymphotropic virus type 1 (HTLV-1) nucleocapsid protein (NC) chaperone activity as compared to that of the HIV-1 NC protein. HTLV-1 NC contains two zinc fingers with each having a CCHC binding motif similar to HIV-1 NC. HIV-1 NC is required for recognition and packaging of the viral RNA and is also a nucleic acid chaperone protein that facilitates nucleic acid restructuring during reverse transcription. Because of similarities in structures between the two retroviruses, we have used single-molecule fluorescence energy transfer to investigate the chaperoning activity of HTLV-1 NC protein. The results indicate that HTLV-1 NC protein induces structural changes by opening the transactivation response (TAR)-DNA hairpin to an even greater extent than HIV-1 NC. However, unlike HIV-1 NC, HTLV-1 NC does not chaperone the strand-transfer reaction involving TAR-DNA. These results suggest that despite its effective destabilization capability, HTLV-1 NC is not as effective at overall chaperone function as is its HIV-1 counterpart.

  9. Prediction and redesign of protein–protein interactions

    PubMed Central

    Lua, Rhonald C.; Marciano, David C.; Katsonis, Panagiotis; Adikesavan, Anbu K.; Wilkins, Angela D.; Lichtarge, Olivier

    2014-01-01

    Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function – those mediated by protein–protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI. PMID:24878423

  10. Protein structure prediction enhanced with evolutionary diversity : SPEED.

    SciTech Connect

    DeBartolo, J.; Hocky, G.; Wilde, M.; Xu, J.; Freed, K. F.; Sosnick, T. R.; Univ. of Chicago; Toyota Technological Inst. at Chicago

    2010-03-01

    For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

  11. CSF protein biomarkers predicting longitudinal reduction of CSF β-amyloid42 in cognitively healthy elders

    PubMed Central

    Mattsson, N; Insel, P; Nosheny, R; Zetterberg, H; Trojanowski, J Q; Shaw, L M; Tosun, D; Weiner, M

    2013-01-01

    β-amyloid (Aβ) plaque accumulation is a hallmark of Alzheimer's disease (AD). It is believed to start many years prior to symptoms and is reflected by reduced cerebrospinal fluid (CSF) levels of the peptide Aβ1–42 (Aβ42). Here we tested the hypothesis that baseline levels of CSF proteins involved in microglia activity, synaptic function and Aβ metabolism predict the development of Aβ plaques, assessed by longitudinal CSF Aβ42 decrease in cognitively healthy people. Forty-six healthy people with three to four serial CSF samples were included (mean follow-up 3 years, range 2–4 years). There was an overall reduction in Aβ42 from a mean concentration of 211 to 195 pg ml−1 after 4 years. Linear mixed-effects models using longitudinal Aβ42 as the response variable, and baseline proteins as explanatory variables (n=69 proteins potentially relevant for Aβ metabolism, microglia or synaptic/neuronal function), identified 10 proteins with significant effects on longitudinal Aβ42. The most significant proteins were angiotensin-converting enzyme (ACE, P=0.009), Chromogranin A (CgA, P=0.009) and Axl receptor tyrosine kinase (AXL, P=0.009). Receiver-operating characteristic analysis identified 11 proteins with significant effects on longitudinal Aβ42 (largely overlapping with the proteins identified by linear mixed-effects models). Several proteins (including ACE, CgA and AXL) were associated with Aβ42 reduction only in subjects with normal baseline Aβ42, and not in subjects with reduced baseline Aβ42. We conclude that baseline CSF proteins related to Aβ metabolism, microglia activity or synapses predict longitudinal Aβ42 reduction in cognitively healthy elders. The finding that some proteins only predict Aβ42 reduction in subjects with normal baseline Aβ42 suggests that they predict future development of the brain Aβ pathology at the earliest stages of AD, prior to widespread development of Aβ plaques. PMID:23962923

  12. A multilayer evaluation approach for protein structure prediction and model quality assessment.

    PubMed

    Zhang, Jingfen; Wang, Qingguo; Vantasin, Kittinun; Zhang, Jiong; He, Zhiquan; Kosztin, Ioan; Shang, Yi; Xu, Dong

    2011-01-01

    Protein tertiary structures are essential for studying functions of proteins at the molecular level. An indispensable approach for protein structure solution is computational prediction. Most protein structure prediction methods generate candidate models first and select the best candidates by model quality assessment (QA). In many cases, good models can be produced, but the QA tools fail to select the best ones from the candidate model pool. Because of incomplete understanding of protein folding, each QA method only reflects partial facets of a structure model and thus has limited discerning power, with no one method consistently outperforming the others. In this article, we developed a set of new QA methods, including two QA methods for evaluating target/template alignments, a molecular dynamics (MD)-based QA method, and three consensus QA methods with selected references to reveal new facets of protein structures complementary to the existing methods. Moreover, the underlying relationships among different QA methods were analyzed and then integrated into a multilayer evaluation approach to guide the model generation and model selection in prediction. All methods are integrated and implemented into an innovative and improved prediction system hereafter referred to as MUFOLD. In CASP8 and CASP9, MUFOLD has demonstrated the proof of the principles in terms of both QA discerning power and structure prediction accuracy. PMID:21997706

  13. Prediction of Mutational Tolerance in HIV-1 Protease and Reverse Transcriptase Using Flexible Backbone Protein Design

    PubMed Central

    Varela, Rocco; Ó Conchúir, Shane; Kortemme, Tanja

    2012-01-01

    Predicting which mutations proteins tolerate while maintaining their structure and function has important applications for modeling fundamental properties of proteins and their evolution; it also drives progress in protein design. Here we develop a computational model to predict the tolerated sequence space of HIV-1 protease reachable by single mutations. We assess the model by comparison to the observed variability in more than 50,000 HIV-1 protease sequences, one of the most comprehensive datasets on tolerated sequence space. We then extend the model to a second protein, reverse transcriptase. The model integrates multiple structural and functional constraints acting on a protein and uses ensembles of protein conformations. We find the model correctly captures a considerable fraction of protease and reverse-transcriptase mutational tolerance and shows comparable accuracy using either experimentally determined or computationally generated structural ensembles. Predictions of tolerated sequence space afforded by the model provide insights into stability-function tradeoffs in the emergence of resistance mutations and into strengths and limitations of the computational model. PMID:22927804

  14. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, PGATS, which is based on the toy off-lattice model, is presented for three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are replaced by a kind of random linear method; finally, the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The method is validated on the current standard benchmark sequences, namely Fibonacci sequences and real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, and is thus an effective way to predict the structure of proteins. PMID:25069136

  15. A computational method to predict carbonylation sites in yeast proteins.

    PubMed

    Lv, H Q; Liu, J; Han, J Q; Zheng, J G; Liu, R L

    2016-01-01

    Several post-translational modifications (PTM) have been discussed in the literature. Among a variety of oxidative stress-induced PTM, protein carbonylation is considered a biomarker of oxidative stress. Only certain proteins can be carbonylated because only four amino acid residues, namely lysine (K), arginine (R), threonine (T) and proline (P), are susceptible to carbonylation. The yeast proteome is an excellent model to explore oxidative stress, especially protein carbonylation. Current experimental approaches in identifying carbonylation sites are expensive, time-consuming and limited in their abilities to process proteins. Furthermore, there is no bioinformatics method to predict carbonylation sites in yeast proteins. Therefore, we propose a computational method to predict yeast carbonylation sites. This method has total accuracies of 86.32, 85.89, 84.80, and 86.80% in predicting the carbonylation sites of K, R, T, and P, respectively. These results were confirmed by 10-fold cross-validation. The ability to identify carbonylation sites in different kinds of features was analyzed and the position-specific composition of the modification site-flanking residues was discussed. Additionally, a software tool has been developed to help with the calculations in this method. Datasets and the software are available at https://sourceforge.net/projects/hqlstudio/files/CarSpred.Y/. PMID:27420944
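
    Any site-level predictor of this kind first has to enumerate the candidate residues and their flanking sequence. A minimal sketch of that step is given below; the function name, the window size and the padding character are illustrative assumptions and are not taken from the paper or from the CarSpred.Y software.

```python
# Residues the abstract lists as susceptible to carbonylation.
CARBONYLATABLE = frozenset("KRTP")

def candidate_sites(sequence, flank=3, pad="-"):
    """Return (1-based position, residue, flanking window) for each K/R/T/P.

    `flank` residues are taken on each side; positions past either end of the
    sequence are filled with `pad` so every window has length 2 * flank + 1.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    sites = []
    for i, residue in enumerate(seq):
        if residue in CARBONYLATABLE:
            window = padded[i : i + 2 * flank + 1]
            sites.append((i + 1, residue, window))
    return sites
```

    The windows produced this way are the natural input for position-specific composition features of the kind the abstract mentions.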

  16. From Nonspecific DNA–Protein Encounter Complexes to the Prediction of DNA–Protein Interactions

    PubMed Central

    Gao, Mu; Skolnick, Jeffrey

    2009-01-01

    DNA–protein interactions are involved in many essential biological activities. Because there is no simple mapping code between DNA base pairs and protein amino acids, the prediction of DNA–protein interactions is a challenging problem. Here, we present a novel computational approach for predicting DNA-binding protein residues and DNA–protein interaction modes without knowing the protein's specific DNA target sequence. Given the structure of a DNA-binding protein, the method first generates an ensemble of complex structures obtained by rigid-body docking with a nonspecific canonical B-DNA. Representative models are subsequently selected through clustering and ranking by their DNA–protein interfacial energy. Analysis of these encounter complex models suggests that the recognition sites for specific DNA binding are usually favorable interaction sites for the nonspecific DNA probe and that nonspecific DNA–protein interaction modes exhibit some similarity to specific DNA–protein binding modes. Although the method requires as input the knowledge that the protein binds DNA, in benchmark tests, it achieves better performance in identifying DNA-binding sites than three previously established methods, which are based on sophisticated machine-learning techniques. We further apply our method to protein structures predicted through modeling and demonstrate that our method performs satisfactorily on protein models whose root-mean-square Cα deviation from their native structures is up to 5 Å. This study provides valuable structural insights into how a specific DNA-binding protein interacts with a nonspecific DNA sequence. The similarity between the specific DNA–protein interaction mode and nonspecific interaction modes may reflect an important sampling step in search of its specific DNA targets by a DNA-binding protein. PMID:19343221

  17. Choosing negative examples for the prediction of protein-protein interactions

    PubMed Central

    Ben-Hur, Asa; Noble, William Stafford

    2006-01-01

    The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples makes the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence based features used for predicting protein-protein interactions. PMID:16723005
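
    The two ways of choosing negative examples contrasted in this abstract can be sketched as below. This is a toy illustration under stated assumptions: the function names and the data layout (a dict mapping each protein to a set of localization labels) are hypothetical, and uniform random sampling is used here only as the obvious unconstrained alternative.

```python
import itertools
import random

def negative_candidates(proteins, positives):
    """All unordered protein pairs that are not known to interact."""
    known = {frozenset(pair) for pair in positives}
    return [pair for pair in itertools.combinations(sorted(proteins), 2)
            if frozenset(pair) not in known]

def localization_negatives(proteins, positives, localization):
    """Negatives restricted to pairs that share no localization label."""
    return [(a, b) for a, b in negative_candidates(proteins, positives)
            if localization[a].isdisjoint(localization[b])]

def random_negatives(proteins, positives, n, seed=0):
    """Negatives sampled uniformly from all pairs not known to interact."""
    rng = random.Random(seed)
    candidates = negative_candidates(proteins, positives)
    return rng.sample(candidates, min(n, len(candidates)))
```

    Because `localization_negatives` only returns pairs from different compartments, a classifier can separate them from positives partly by learning localization, which is the source of the optimistic accuracy estimates the abstract describes.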

  18. Predicting Gene-Regulation Functions: Lessons from Temperate Bacteriophages

    PubMed Central

    Teif, Vladimir B.

    2010-01-01

    Gene-regulation functions (GRFs) provide a unique characteristic of a cis-regulatory module (CRM), relating the concentrations of transcription factors (input) to the promoter activities (output). The challenge is to predict GRFs from the sequence. Here we systematically consider the lysogeny-lysis CRMs of different temperate bacteriophages such as the Lactobacillus casei phage A2, Escherichia coli phages λ and 186, and Lactococcal phage TP901-1. This study allowed explaining a recent experimental puzzle on the role of Cro protein in the lambda switch. Several general conclusions have been drawn: 1) long-range interactions, multilayer assembly and DNA looping may lead to complex GRFs that cannot be described by linear functions of binding site occupancies; 2) in general, GRFs cannot be described by the Boolean logic, whereas a three-state non-Boolean logic suffices for the studied examples; 3) studied CRMs of the intact phages seemed to have a similar GRF topology (the number of plateaus and peaks corresponding to different expression regimes); we hypothesize that functionally equivalent CRMs might have topologically equivalent GRFs for a larger class of genetic systems; and 4) within a given GRF class, a set of mechanistic-to-mathematical transformations has been identified, which allows shaping the GRF before carrying out a system-level analysis. PMID:20371324

  19. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

    PubMed

    Chatterjee, Piyali; Basu, Subhadip; Zubek, Julian; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz

    2016-04-01

    The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use. PMID:26969678
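
    The abstract does not define the n-star quality consensus. One plausible reading, assumed here purely for illustration, is that a residue is labeled positive when at least n of the classifiers agree on it. The function name and the 0/1 label encoding below are hypothetical and are not taken from the PDP-CON source code.

```python
def n_star_consensus(predictions, n):
    """Combine per-residue binary labels from several classifiers.

    `predictions` is a list with one label list per classifier (1 = linker,
    0 = domain, an assumed encoding). A residue is called 1 in the consensus
    when at least `n` classifiers voted 1 for it.
    """
    if not predictions:
        raise ValueError("at least one classifier output is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must label the same residues")
    votes = [sum(column) for column in zip(*predictions)]
    return [1 if v >= n else 0 for v in votes]
```

    Raising `n` trades coverage for agreement: with six classifiers, `n=6` keeps only unanimous calls, while `n=1` accepts any single vote.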

  20. DEPTH: a web server to compute depth and predict small-molecule binding cavities in proteins.

    PubMed

    Tan, Kuan Pern; Varadarajan, Raghavan; Madhusudhan, M S

    2011-07-01

    Depth measures the extent of atom/residue burial within a protein. It correlates with properties such as protein stability, hydrogen exchange rate, protein-protein interaction hot spots, post-translational modification sites and sequence variability. Our server, DEPTH, accurately computes depth and solvent-accessible surface area (SASA) values. We show that depth can be used to predict small molecule ligand binding cavities in proteins. Often, some of the residues lining a ligand binding cavity are both deep and solvent exposed. Using the depth-SASA pair values for a residue, its likelihood to form part of a small molecule binding cavity is estimated. The parameters of the method were calibrated over a training set of 900 high-resolution X-ray crystal structures of single-domain proteins bound to small molecules (molecular weight <1.5 kDa). The prediction accuracy of DEPTH is comparable to that of other geometry-based prediction methods including LIGSITE, SURFNET and Pocket-Finder (all with Matthews correlation coefficient of ∼0.4) over a testing set of 225 single and multi-chain protein structures. The input to the server is a protein 3D structure in PDB format. Users have the option of tuning the values of four parameters associated with the computation of residue depth and the prediction of binding cavities, for example to detect cavities of different sizes or geometrically flat binding sites. The computed depths, SASA and binding cavity predictions are displayed in 2D plots and mapped onto 3D representations of the protein structure using Jmol. Links are provided to download the outputs. Our server is useful for all structural analysis based on residue depth and SASA, such as guiding site-directed mutagenesis experiments and small molecule docking exercises, in the context of protein functional annotation and drug discovery. PMID:21576233
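
    For reference, the Matthews correlation coefficient quoted in this abstract is computed from the confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch follows; the function name is illustrative, and returning 0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: undefined cases (an empty row or column) are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

    The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so a coefficient near 0.4 indicates a moderate correlation between predicted and observed cavity residues.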

  1. Computational prediction of virus-human protein-protein interactions using embedding kernelized heterogeneous data.

    PubMed

    Nourani, Esmaeil; Khunjush, Farshad; Durmuş, Saliha

    2016-05-24

    Pathogenic microorganisms exploit host cellular mechanisms and evade host defense mechanisms through molecular pathogen-host interactions (PHIs). Therefore, comprehensive analysis of these PHI networks should be an initial step for developing effective therapeutics against infectious diseases. Computational prediction of PHI data is in increasing demand because of the scarcity of experimental data. Prediction of protein-protein interactions (PPIs) within PHI systems can be formulated as a classification problem, which requires the knowledge of non-interacting protein pairs. This is a restricting requirement since we lack datasets that report non-interacting protein pairs. In this study, we formulated the "computational prediction of PHI data" problem using kernel embedding of heterogeneous data. This eliminates the abovementioned requirement and enables us to predict new interactions without randomly labeling protein pairs as non-interacting. Domain-domain associations are used to filter the predicted results leading to 175 novel PHIs between 170 human proteins and 105 viral proteins. To compare our results with the state-of-the-art studies that use a binary classification formulation, we modified our settings to consider the same formulation. Detailed evaluations are conducted and our results provide more than 10 percent improvement in accuracy and AUC (area under the receiver operating characteristic curve) in comparison with state-of-the-art methods. PMID:27072625
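
    For the binary-classification comparison, AUC can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The sketch below computes it by direct pairwise comparison; the function name and inputs are hypothetical, and the abstract does not say how the authors computed AUC.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted interaction scores (higher = more likely).
    labels: 1 for interacting pairs, 0 for non-interacting pairs.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    This quadratic-time form is chosen for clarity; a rank-based version gives the same value on large datasets.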

  2. Nanoparticles-cell association predicted by protein corona fingerprints

    NASA Astrophysics Data System (ADS)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, identified PCFs have several receptors with high level of expression on the plasma membrane of HeLa cells.

  3. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles

    PubMed Central

    Brender, Jeffrey R.; Zhang, Yang

    2015-01-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the differences between the interface profile score of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through the random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533

  4. Nanoparticles-cell association predicted by protein corona fingerprints.

    PubMed

    Palchetti, S; Digiacomo, L; Pozzi, D; Peruzzi, G; Micarelli, E; Mahmoudi, M; Caracciolo, G

    2016-07-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) will bind proteins shaping a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority to predict the NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ≈ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, identified PCFs have several receptors with high level of expression on the plasma membrane of HeLa cells. PMID:27279572

  5. Functionalized nanoparticle probes for protein detection

    NASA Astrophysics Data System (ADS)

    Park, Do Hyun; Lee, Jae-Seung

    2015-05-01

    In this Review, we discuss representative studies of recent advances in the development of nanoparticle-based protein detection methods, with a focus on the properties and functionalization of nanoparticle probes, as well as their use in detection schemes. We have focused on functionalized nanoparticle probes because they offer a number of advantages over conventional assays and because their use for detecting protein targets for diagnostic purposes has been demonstrated. In this report, we discuss nanoparticle probes classified by material type (gold, silver, silica, semiconductor, carbon, and virus) and surface functionality (antibody, aptamer, and DNA), which play a critical role in enhancing the sensitivity, selectivity, and efficiency of the detection systems. In particular, the synergistic function of each component of the nanoparticle probe is emphasized in terms of specific chemical and physical properties. This research area is in its early stages with many milestones to reach before nanoparticle probes are successfully applied in the field; however, the substantial ongoing efforts of researchers underline the great promise offered by nanoparticle-based probes for future applications.

  6. Functional Characterization of the Alphavirus TF Protein

    PubMed Central

    Snyder, Jonathan E.; Kulcsar, Kirsten A.; Schultz, Kimberly L. W.; Riley, Catherine P.; Neary, Jacob T.; Marr, Scott; Jose, Joyce; Griffin, Diane E.

    2013-01-01

    Alphavirus dogma has long dictated the production of a discrete set of structural proteins during infection of a cell: capsid, pE2, 6K, and E1. However, bioinformatic analyses of alphavirus genomes (A. E. Firth, B. Y. Chung, M. N. Fleeton, and J. F. Atkins, Virol. J. 5:108, 2008) suggested that a ribosomal frameshifting event occurs during translation of the alphavirus structural polyprotein. Specifically, a frameshift event is suggested to occur during translation of the 6K gene, yielding production of a novel protein, termed transframe (TF), comprised of a C-terminal extension of the 6K protein in the −1 open reading frame (ORF). Here, we validate the findings of Firth and colleagues with respect to the production of the TF protein and begin to characterize the function of TF. Using a mass spectrometry-based approach, we identified TF in purified preparations of both Sindbis and Chikungunya virus particles. We next constructed a panel of Sindbis virus mutants with mutations which alter the production, size, or sequence of TF. We demonstrate that TF is not absolutely required in culture, although disrupting TF production leads to a decrease in virus particle release in both mammalian and insect cells. In a mouse neuropathogenesis model, mortality was <15% in animals infected with the TF mutants, whereas mortality was 95% in animals infected with the wild-type virus. Using a variety of additional assays, we demonstrate that TF retains ion-channel activity analogous to that of 6K and that lack of production of TF does not affect genome replication, particle infectivity, or envelope protein transit to the cell surface. The TF protein therefore represents a previously uncharacterized factor important for alphavirus assembly. PMID:23720714

  7. Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.

    PubMed

    Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G

    2016-01-01

    While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged; however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes. PMID:26183799
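
    For orientation, the generic textbook form of a Debye-Hückel (screened Coulomb) pair potential is given below. The symbols are assumptions introduced here: q_i and q_j are the charges, r_ij their separation, ε_r the relative dielectric constant and λ_D the Debye screening length. The abstract does not state the exact parameterization used within AWSEM.

```latex
V_{\mathrm{DH}}(r_{ij}) \;=\; \frac{q_i\, q_j}{4\pi \varepsilon_0 \varepsilon_r\, r_{ij}}\,
\exp\!\left(-\frac{r_{ij}}{\lambda_D}\right)
```

    As λ_D grows (low ionic strength) the expression approaches the unscreened Coulomb interaction; shorter screening lengths damp the long-range tail.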

  8. Ab initio prediction of protein structure with both all-atom and simplified force fields

    NASA Astrophysics Data System (ADS)

    Scheraga, Harold

    2004-03-01

    Using only a physics-based ab initio method, and both all-atom (ECEPP/3) and simplified united-residue (UNRES) force fields, global optimization of both potential functions with Monte Carlo-plus-Minimization (MCM) and Conformational Space Annealing (CSA), respectively, provides predicted structures of proteins without use of knowledge-based information. The all-atom approach has been applied to the 46-residue protein A, and the UNRES approach has been applied to larger CASP targets. The predicted structures will be described.

  9. Topological Predictions for Integral Membrane Channel and Carrier Proteins

    PubMed Central

    Abhinay, Reddy; Jaehoon, Cho; Sam, Ling; Vamsee, Reddy; Maksim, Shlykov; Milton, Saier

    2014-01-01

    We evaluated topological predictions for nine different programs, HMMTOP, TMHMM, SVMTOP, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT-SVM (hereinafter referred to as MEMSAT), and SPOCTOPUS. These programs were first evaluated using four large topologically well-defined families of secondary transporters, and the three best programs were further evaluated using topologically more diverse families of channels and carriers. In the initial studies, the order of accuracy was: SPOCTOPUS>MEMSAT>HMMTOP>TOPCONS>PHOBIUS>TMHMM>SVMTOP>DAS>SOSUI. Some families, such as the Sugar Porter family (2.A.1.1) of the Major Facilitator Superfamily (MFS; TC# 2.A.1) and the Amino acid/Polyamine/Organocation (APC) Family (TC# 2.A.3), were correctly predicted with high accuracy while others, such as the Mitochondrial Carrier (MC) (TC# 2.A.29) and the K+ transporter (Trk) families (TC# 2.A.38), were predicted with much lower accuracy. For small, topologically homogeneous families, SPOCTOPUS and MEMSAT were generally most reliable, while with large, more diverse superfamilies, HMMTOP often proved to have the greatest prediction accuracy. We next developed a novel program, TM-STATS, that tabulates HMMTOP, SPOCTOPUS or MEMSAT-based topological predictions for any subdivision (class, subclass, superfamily, family, subfamily, or any combination of these) of the Transporter Classification Database (TCDB; www.tcdb.org) and examined the following subclasses: α-type channel proteins (TC subclasses 1.A and 1.E), secreted pore-forming toxins (TC subclass 1.C) and secondary carriers (subclass 2.A). Histograms were generated for each of these subclasses, and the results were analyzed according to subclass, family and protein. The results provide an update of topological predictions for integral membrane transport proteins as well as guides for the development of more reliable topological prediction programs, taking family-specific characteristics into account. PMID:24992992

  10. Predicting and analyzing protein phosphorylation sites in plants using musite.

    PubMed

    Yao, Qiuming; Gao, Jianjiong; Bollinger, Curtis; Thelen, Jay J; Xu, Dong

    2012-01-01

    Although protein phosphorylation sites can be reliably identified with high-resolution mass spectrometry, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites and provide hypotheses in experimental design. Musite is a tool that we previously developed to predict phosphorylation sites based solely on protein sequence. However, it was not comprehensively applied to plants. In this study, the phosphorylation data from Arabidopsis thaliana, B. napus, G. max, M. truncatula, O. sativa, and Z. mays were collected for cross-species testing and the overall plant-specific prediction as well. The results show that the model for A. thaliana can be extended to other organisms, and the overall plant model from Musite outperforms the current plant-specific prediction tools, Plantphos, and PhosphAt, in prediction accuracy. Furthermore, a comparative study of predicted phosphorylation sites across orthologs among different plants was conducted to reveal potential evolutionary features. A bipolar distribution of isolated, non-conserved phosphorylation sites, and highly conserved ones in terms of the amino acid type was observed. It also shows that predicted phosphorylation sites conserved within orthologs do not necessarily share more sequence similarity in the flanking regions than the background, but they often inherit protein disorder, a property that does not necessitate high sequence conservation. Our analysis also suggests that the phosphorylation frequencies among serine, threonine, and tyrosine correlate with their relative proportion in disordered regions. Musite can be used as a web server (http://musite.net) or downloaded as an open-source standalone tool (http://musite.sourceforge.net/). PMID:22934099

  11. Unfolded protein ensembles, folding trajectories, and refolding rate prediction.

    PubMed

    Das, A; Sin, B K; Mohazab, A R; Plotkin, S S

    2013-09-28

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10^-7). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to

  12. Unfolded protein ensembles, folding trajectories, and refolding rate prediction

    NASA Astrophysics Data System (ADS)

    Das, A.; Sin, B. K.; Mohazab, A. R.; Plotkin, S. S.

    2013-09-01

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10^-7). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to

  13. Nanostructured functional films from engineered repeat proteins

    PubMed Central

    Grove, Tijana Z.; Regan, Lynne; Cortajarena, Aitziber L.

    2013-01-01

    Fundamental advances in biotechnology, medicine, environment, electronics and energy require methods for precise control of spatial organization at the nanoscale. Assemblies that rely on highly specific biomolecular interactions are an attractive approach to form materials that display novel and useful properties. Here, we report on assembly of films from the designed, rod-shaped, superhelical, consensus tetratricopeptide repeat protein (CTPR). We have designed three peptide-binding sites into the 18 repeat CTPR to allow for further specific and non-covalent functionalization of films through binding of fluorescein labelled peptides. The fluorescence signal from the peptide ligand bound to the protein in the solid film is anisotropic, demonstrating that CTPR films can impose order on otherwise isotropic moieties. Circular dichroism measurements show that the individual protein molecules retain their secondary structure in the film, and X-ray scattering, birefringence and atomic force microscopy experiments confirm macroscopic alignment of CTPR molecules within the film. This work opens the door to the generation of innovative biomaterials with tailored structure and function. PMID:23594813

  14. [Location and functions of secretagogin protein].

    PubMed

    Liu, Qin; Lai, Maode

    2016-01-01

    Secretagogin (SCGN) is a novel member of EF-hand Ca2+-binding proteins, which was identified in islet β cells by Wagner. SCGN is a six EF-hand Ca2+-binding protein, primarily expressed on the neuroendocrine axis and the central nervous system. The protein has abundant biological functions. A certain concentration of calcium ion can lead to conformation change of SCGN, resulting in the change of intracellular signal transduction. Preliminary studies showed that SCGN could be used to treat stress reactions, such as mental illness (depression), burns or post-traumatic stress disorder and chronic stress reaction caused by pain. In Alzheimer's disease, the expression of SCGN in the hippocampus can counteract neurodegeneration. In neuroendocrine tumors, SCGN presents a good consistency with neuroendocrine markers such as CgA, Syn, and NSE, with a higher overall sensitivity and specificity. In addition, SCGN is released into serum after neural damage in cerebral ischemic diseases, suggesting that SCGN can be used as a marker for brain trauma. In this article, we review the recent research progress of secretagogin, focusing on its distribution and functions in various tumorous diseases and non-tumorous diseases, such as Alzheimer's disease. PMID:27045242

  15. MASS FUNCTION PREDICTIONS BEYOND ΛCDM

    SciTech Connect

    Bhattacharya, Suman; Lukic, Zarija; Habib, Salman; Heitmann, Katrin; White, Martin; Wagner, Christian

    2011-05-10

    The statistics of dark matter halos is an essential component of precision cosmology. The mass distribution of halos, as specified by the halo mass function, is a key input for several cosmological probes. The sizes of N-body simulations are now such that, for the most part, results need no longer be statistics-limited, but are still subject to various systematic uncertainties. Discrepancies in the results of simulation campaigns for the halo mass function remain in excess of statistical uncertainties and of roughly the same size as the error limits set by near-future observations; we investigate and discuss some of the reasons for these differences. Quantifying error sources and compensating for them as appropriate, we carry out a high-statistics study of dark matter halos from 67 N-body simulations to investigate the mass function and its evolution for a reference ΛCDM cosmology and for a set of wCDM cosmologies. For the reference ΛCDM cosmology (close to WMAP5), we quantify the breaking of universality in the form of the mass function as a function of redshift, finding an evolution of as much as 10% away from the universal form between redshifts z = 0 and z = 2. For cosmologies very close to this reference we provide a fitting formula to our results for the (evolving) ΛCDM mass function over a mass range of 6 × 10^11-3 × 10^15 M_sun to an estimated accuracy of about 2%. The set of wCDM cosmologies is taken from the Coyote Universe simulation suite. The mass functions from this suite (which includes a ΛCDM cosmology and others with w ≈ -1) are described by the fitting formula for the reference ΛCDM case at an accuracy level of 10%, but with clear systematic deviations. We argue that, as a consequence, fitting formulae based on a universal form for the mass function may have limited utility in high-precision cosmological applications.

  16. Mass Function Predictions Beyond ΛCDM

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Suman; Heitmann, Katrin; White, Martin; Lukić, Zarija; Wagner, Christian; Habib, Salman

    2011-05-01

    The statistics of dark matter halos is an essential component of precision cosmology. The mass distribution of halos, as specified by the halo mass function, is a key input for several cosmological probes. The sizes of N-body simulations are now such that, for the most part, results need no longer be statistics-limited, but are still subject to various systematic uncertainties. Discrepancies in the results of simulation campaigns for the halo mass function remain in excess of statistical uncertainties and of roughly the same size as the error limits set by near-future observations; we investigate and discuss some of the reasons for these differences. Quantifying error sources and compensating for them as appropriate, we carry out a high-statistics study of dark matter halos from 67 N-body simulations to investigate the mass function and its evolution for a reference ΛCDM cosmology and for a set of wCDM cosmologies. For the reference ΛCDM cosmology (close to WMAP5), we quantify the breaking of universality in the form of the mass function as a function of redshift, finding an evolution of as much as 10% away from the universal form between redshifts z = 0 and z = 2. For cosmologies very close to this reference we provide a fitting formula to our results for the (evolving) ΛCDM mass function over a mass range of 6 × 10^11-3 × 10^15 M_sun to an estimated accuracy of about 2%. The set of wCDM cosmologies is taken from the Coyote Universe simulation suite. The mass functions from this suite (which includes a ΛCDM cosmology and others with w ≈ -1) are described by the fitting formula for the reference ΛCDM case at an accuracy level of 10%, but with clear systematic deviations. We argue that, as a consequence, fitting formulae based on a universal form for the mass function may have limited utility in high-precision cosmological applications.

  17. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role.

    PubMed

    Pellegrini, Marco

    2015-01-01

    Tandem repetitions in protein sequence and structure is a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview on the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR. PMID:26442257

  18. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

    PubMed Central

    Pellegrini, Marco

    2015-01-01

    Tandem repetitions in protein sequence and structure is a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview on the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR. PMID:26442257

  19. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices, based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  20. Neural network definitions of highly predictable protein secondary structure classes

    SciTech Connect

    Lapedes, A.; Steeg, E.; Farber, R.

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes: alpha helix, beta strand, and coil, from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand and coil.

  1. Prediction of change in protein unfolding rates upon point mutations in two state proteins.

    PubMed

    Chaudhary, Priyashree; Naganathan, Athi N; Gromiha, M Michael

    2016-09-01

    Studies on protein unfolding rates are limited and challenging due to the complexity of the unfolding mechanism and the larger dynamic range of the experimental data. Though attempts have been made to predict unfolding rates using protein sequence-structure information, there is no available method for predicting the unfolding rates of proteins upon specific point mutations. In this work, we have systematically analyzed a set of 790 single mutants and developed a robust method for predicting protein unfolding rates upon mutations (Δlnku) in two-state proteins by combining amino acid properties and knowledge-based classification of mutants with a multiple linear regression technique. We obtain a mean absolute error (MAE) of 0.79/s and a Pearson correlation coefficient (PCC) of 0.71 between predicted unfolding rates and experimental observations using a jack-knife test. We have developed a web server for predicting protein unfolding rates upon mutation and it is freely available at https://www.iitm.ac.in/bioinfo/proteinunfolding/unfoldingrace.html. Prominent features that determine unfolding kinetics as well as plausible reasons for the observed outliers are also discussed. PMID:27264959
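
    The evaluation protocol named in the abstract (jack-knife test, MAE, PCC) can be sketched as follows. This is a minimal illustration, not the authors' model: it uses a single-feature least-squares fit instead of multiple linear regression, and the feature values and targets are made-up placeholders.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature, for brevity)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def jackknife_predictions(xs, ys):
    """Leave each sample out in turn, fit on the rest, predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a * xs[i] + b)
    return preds


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


# Hypothetical data: one descriptor per mutant and a measured change in ln(ku).
feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
delta_ln_ku = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]

predicted = jackknife_predictions(feature, delta_ln_ku)
print("MAE:", mean_absolute_error(delta_ln_ku, predicted))
print("PCC:", pearson(delta_ln_ku, predicted))
```

    Because every prediction comes from a model that never saw the held-out mutant, the resulting MAE and PCC estimate performance on unseen mutations rather than goodness of fit.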

  2. Quantitative reactivity profiling predicts functional cysteines in proteomes

    PubMed Central

    Weerapana, Eranthie; Wang, Chu; Simon, Gabriel M.; Richter, Florian; Khare, Sagar; Dillon, Myles B.D.; Bachovchin, Daniel A.; Mowen, Kerri; Baker, David; Cravatt, Benjamin F.

    2010-01-01

    Cysteine is the most intrinsically nucleophilic amino acid in proteins, where its reactivity is tuned to perform diverse biochemical functions. The absence of a consensus sequence that defines functional cysteines in proteins has hindered their discovery and characterization. Here, we describe a proteomics method to quantitatively profile the intrinsic reactivity of cysteine residues en masse directly in native biological systems. Hyperreactivity was a rare feature among cysteines and found to specify a wide range of activities, including nucleophilic and reductive catalysis and sites of oxidative modification. Hyperreactive cysteines were identified in several proteins of uncharacterized function, including a residue conserved across eukaryotic phylogeny that we show is required for yeast viability and involved in iron-sulfur protein biogenesis. Finally, we demonstrate that quantitative reactivity profiling can also form the basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated catalytically active from inactive cysteine hydrolase designs. PMID:21085121

  3. Quantitative reactivity profiling predicts functional cysteines in proteomes.

    PubMed

    Weerapana, Eranthie; Wang, Chu; Simon, Gabriel M; Richter, Florian; Khare, Sagar; Dillon, Myles B D; Bachovchin, Daniel A; Mowen, Kerri; Baker, David; Cravatt, Benjamin F

    2010-12-01

    Cysteine is the most intrinsically nucleophilic amino acid in proteins, where its reactivity is tuned to perform diverse biochemical functions. The absence of a consensus sequence that defines functional cysteines in proteins has hindered their discovery and characterization. Here we describe a proteomics method to profile quantitatively the intrinsic reactivity of cysteine residues en masse directly in native biological systems. Hyper-reactivity was a rare feature among cysteines and it was found to specify a wide range of activities, including nucleophilic and reductive catalysis and sites of oxidative modification. Hyper-reactive cysteines were identified in several proteins of uncharacterized function, including a residue conserved across eukaryotic phylogeny that we show is required for yeast viability and is involved in iron-sulphur protein biogenesis. We also demonstrate that quantitative reactivity profiling can form the basis for screening and functional assignment of cysteines in computationally designed proteins, where it discriminated catalytically active from inactive cysteine hydrolase designs. PMID:21085121

  4. DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields

    PubMed Central

    Wang, Sheng; Weng, Shunyan; Ma, Jianzhu; Tang, Qingming

    2015-01-01

    Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors. PMID:26230689
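
    Two ideas from this abstract lend themselves to a short sketch: weighting labels to counter class imbalance, and scoring a predictor by AUC. The inverse-frequency weighting below is a common generic choice and is an assumption here; the abstract does not state which weights DeepCNF-D uses.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each label by n / (k * count) so rare labels count more.

    n is the number of samples and k the number of distinct labels.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample gets the higher score; ties count as half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = disordered, 0 = ordered) and scores.
labels = [0, 0, 0, 1]
print(inverse_frequency_weights(labels))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

    AUC is a natural metric for this task because, unlike plain accuracy, it is not inflated when a predictor simply favours the majority (ordered) label.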

  5. Prediction of Protein-Protein Interactions with Physicochemical Descriptors and Wavelet Transform via Random Forests.

    PubMed

    Jia, Jianhua; Xiao, Xuan; Liu, Bingxiang

    2016-06-01

    Protein-protein interactions (PPIs) provide valuable insight into the inner workings of cells, and it is significant to study the network of PPIs. It is vitally important to develop an automated, high-throughput method to predict PPIs in a timely manner. Based on the physicochemical descriptors, a protein was converted into several digital signals, and then wavelet transform was used to analyze them. With such a formulation frame to represent the samples of protein sequences, the random forests algorithm was adopted to conduct prediction. The results on a large-scale independent-test data set show that the proposed model can achieve a good performance with an accuracy value of about 0.86 and a geometric mean value of about 0.85. Therefore, it can be a useful supplementary tool for PPI prediction. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI_RF. PMID:25882187
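
    The encoding step described above (sequence to digital signal, then wavelet analysis) can be illustrated with a small sketch. The choice of the Kyte-Doolittle hydropathy scale and of a single-level Haar transform are assumptions made for illustration; the abstract does not name the descriptors or the wavelet family used.

```python
from math import sqrt

# Kyte-Doolittle hydropathy index, used here as one example descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(sequence):
    """Map each residue to its descriptor value, giving a numeric signal."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def haar_step(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail) coefficients. An odd-length signal is
    padded by repeating its last value.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    approx = [(signal[i] + signal[i + 1]) / sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical short peptide; real inputs would be full protein sequences.
signal = to_signal("MKVLAAGI")
approx, detail = haar_step(signal)
print(approx)
print(detail)
```

    Summary statistics of such coefficients give a fixed-length feature vector regardless of protein length, which is what a random forest classifier requires as input.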

  6. Prediction of human protein-protein interaction by a domain-based approach.

    PubMed

    Zhang, Xiaopan; Jiao, Xiong; Song, Jie; Chang, Shan

    2016-05-01

    Protein-protein interactions (PPIs) are vital to a number of biological processes. With computational methods, plenty of domain information can help us to predict and assess PPIs. In this study, we proposed a domain-based approach for the prediction of human PPIs based on the interactions between the proteins and the domains. In this method, an optimizing model was built with the information from InterDom, 3did, DOMINE and Pfam databases. With this model, for 147 proteins in the integrin adhesome PPI network, 736 probable PPIs have been predicted, and the corresponding confidence probabilities of these PPIs were also calculated. It provides an opportunity to visualize the PPIs by using network graphs, which were constructed with Cytoscape, so that possible underlying pathways can be indicated. PMID:26925814
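
    One common way to turn domain-domain interaction evidence into a confidence probability for a protein pair is a noisy-OR combination, sketched below. This is a generic technique swapped in for illustration; the abstract does not describe the authors' optimizing model, and the domain names and probabilities are hypothetical.

```python
def interaction_probability(domains_a, domains_b, ddi_prob):
    """Noisy-OR estimate that two proteins interact.

    The proteins are assumed to interact if at least one pair of their
    domains interacts, with domain pairs treated as independent.
    ddi_prob maps frozenset({domain_x, domain_y}) to a probability;
    pairs missing from the map are treated as non-interacting.
    """
    p_none = 1.0
    for da in domains_a:
        for db in domains_b:
            p = ddi_prob.get(frozenset((da, db)), 0.0)
            p_none *= 1.0 - p
    return 1.0 - p_none


# Hypothetical domain-domain interaction probabilities.
ddi = {
    frozenset(("D1", "D2")): 0.5,
    frozenset(("D1", "D3")): 0.2,
}
print(interaction_probability(["D1"], ["D2", "D3"], ddi))  # 1 - 0.5 * 0.8
```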

  7. [Functions of prion protein PrPc].

    PubMed

    Cazaubon, Sylvie; Viegas, Pedro; Couraud, Pierre-Olivier

    2007-01-01

    It is now well established that both normal and pathological (or scrapie) isoforms of prion protein, PrPc and PrPsc respectively, are involved in the development and progression of various forms of neurodegenerative diseases, including scrapie in sheep, bovine spongiform encephalopathy (or "mad cow disease") and Creutzfeldt-Jakob disease in humans, collectively known as prion diseases. The protein PrPc is highly expressed in the central nervous system in neurons and glial cells, and also present in non-brain cells, such as immune cells or epithelial and endothelial cells. Identification of the physiological functions of PrPc in these different cell types thus appears crucial for understanding the progression of prion diseases. Recent studies highlighted several major roles for PrPc that may be considered in two major domains: (1) cell survival (protection against oxidative stress and apoptosis) and (2) cell adhesion. In association with cell adhesion, distinct functions of PrPc were observed, depending on cell types: neuronal differentiation, epithelial and endothelial barrier integrity, transendothelial migration of monocytes, T cell activation. These observations suggest that PrPc functions may be particularly relevant to cellular stress, as well as inflammatory or infectious situations. PMID:17875293

  8. Early executive function predicts reasoning development.

    PubMed

    Richland, Lindsey E; Burchinal, Margaret R

    2013-01-01

    Analogical reasoning is a core cognitive skill that distinguishes humans from all other species and contributes to general fluid intelligence, creativity, and adaptive learning capacities. Yet its origins are not well understood. In the study reported here, we analyzed large-scale longitudinal data from the Study of Early Child Care and Youth Development to test predictors of growth in analogical-reasoning skill from third grade to adolescence. Our results suggest an integrative resolution to the theoretical debate regarding contributory factors arising from smaller-scale, cross-sectional experiments on analogy development. Children with greater executive-function skills (both composite and inhibitory control) and vocabulary knowledge in early elementary school displayed higher scores on a verbal analogies task at age 15 years, even after adjusting for key covariates. We posit that knowledge is a prerequisite to analogy performance, but strong executive-functioning resources during early childhood are related to long-term gains in fundamental reasoning skills. PMID:23184588

  9. Fast prediction and visualization of protein binding pockets with PASS.

    PubMed

    Brady, G P; Stouten, P F

    2000-05-01

    PASS (Putative Active Sites with Spheres) is a simple computational tool that uses geometry to characterize regions of buried volume in proteins and to identify positions likely to represent binding sites based upon the size, shape, and burial extent of these volumes. Its utility as a predictive tool for binding site identification is tested by predicting known binding sites of proteins in the PDB using both complexed macromolecules and their corresponding apoprotein structures. The results indicate that PASS can serve as a front-end to fast docking. The main utility of PASS lies in the fact that it can analyze a moderate-size protein (approximately 30 kDa) in under 20 s, which makes it suitable for interactive molecular modeling, protein database analysis, and aggressive virtual screening efforts. As a modeling tool, PASS (i) rapidly identifies favorable regions of the protein surface, (ii) simplifies visualization of residues modulating binding in these regions, and (iii) provides a means of directly visualizing buried volume, which is often inferred indirectly from curvature in a surface representation. PASS produces output in the form of standard PDB files, which are suitable for any modeling package, and provides script files to simplify visualization in Cerius2, InsightII, MOE, Quanta, RasMol, and Sybyl. PASS is freely available to all. PMID:10815774
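
    The notion of "burial extent" used by geometry-based pocket finders can be illustrated by counting how many protein atoms lie within a fixed radius of a candidate probe position. The sketch below is a generic illustration of that idea only; the radius and coordinates are hypothetical and are not taken from PASS.

```python
from math import dist


def burial_count(probe, atoms, radius=8.0):
    """Number of atoms whose centres lie within `radius` of the probe point.

    probe is an (x, y, z) tuple and atoms is a list of (x, y, z) tuples,
    all in the same length unit. A higher count suggests a more buried site.
    """
    return sum(1 for atom in atoms if dist(probe, atom) <= radius)


# Hypothetical atom coordinates and a probe placed at the origin.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(burial_count((0.0, 0.0, 0.0), atoms, radius=2.0))  # 2
```

    Ranking probe positions by such a count favours concave, enclosed regions over exposed convex surface, which is the geometric signature expected of a binding pocket.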

  10. A Historical Perspective and Overview of Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Wooley, John C.; Ye, Yuzhen

    Carrying on many different biological functions, proteins are all composed of one or more polypeptide chains, each containing from several to hundreds or even thousands of the 20 amino acids. During the 1950s at the dawn of modern biochemistry, an essential question for biochemists was to understand the structure and function of these polypeptide chains. The sequences of proteins, also referred to as their primary structures, determine the different chemical properties for different proteins, and thus continue to captivate much of the attention of biochemists. As an early step in characterizing protein chemistry, British biochemist Frederick Sanger designed an experimental method to identify the sequence of insulin (Sanger et al., 1955). He became the first person to obtain the primary structure of a protein and in 1958 won his first Nobel Prize in Chemistry. This important progress in sequencing did not answer the question of whether a single (individual) protein has a distinctive shape in three dimensions (3D), and if so, what factors determine its 3D architecture. However, during the period when Sanger was studying the primary structure of proteins, American biochemist Christian Anfinsen observed that the active polypeptide chain of a model protein, bovine pancreatic ribonuclease (RNase), could fold spontaneously into a unique 3D structure, which was later called the native conformation of the protein (Anfinsen et al., 1954). Anfinsen also studied the refolding of RNase enzyme and observed that an enzyme unfolded under an extreme chemical environment could refold spontaneously back into its native conformation upon changing the environment back to natural conditions (Anfinsen et al., 1961). By 1962, Anfinsen had developed his theory of protein folding (which was summarized in his 1972 Nobel acceptance speech): "The native conformation is determined by the totality of interatomic interactions and hence, by the amino acid sequence, in a given environment."

  11. A simple feature construction method for predicting upstream/downstream signal flow in human protein-protein interaction networks

    PubMed Central

    Mei, Suyu; Zhu, Hao

    2015-01-01

    Signaling pathways play important roles in understanding the underlying mechanism of cell growth, cell apoptosis, organismal development and pathways-aberrant diseases. Protein-protein interaction (PPI) networks are commonly used infrastructure to infer signaling pathways. However, PPI networks generally carry no information on the upstream/downstream relationship between interacting proteins, which hinders inference of the signal flow of signaling pathways. In this work, we propose a simple feature construction method to train an SVM (support vector machine) classifier to predict PPI upstream/downstream relations. The domain based asymmetric feature representation naturally embodies domain-domain upstream/downstream relations, providing an unconventional avenue to predict the directionality between two objects. Moreover, we propose a semantically interpretable decision function and a macro bag-level performance metric to satisfy the need of two-instance depiction of an interacting protein pair. Experimental results show that the proposed method achieves satisfactory cross validation performance and independent test performance. Lastly, we use the trained model to predict the PPIs in HPRD, Reactome and IntAct. Some predictions have been validated against recent literature. PMID:26648121

  12. Functions and possible provenance of primordial proteins.

    PubMed

    Sommer, Andrei P; Miyake, Norimune; Wickramasinghe, N Chandra; Narlikar, Jayant V; Al-Mufti, Shirwan

    2004-01-01

    Nanobacteria or living nanovesicles are of great interest to the scientific community because of their dual nature: on the one hand, they appear as primal biosystems originating life; on the other hand, they can cause severe diseases. Their survival as well as their pathogenic potential is apparently linked to a self-synthesized protein-based slime, rich in calcium and phosphate (when available). Here, we provide challenging evidence for the occurrence of nanobacteria in the stratosphere, reflecting a possibly primordial provenance of the slime. An analysis of the slime's biological functions may lead to novel strategies suitable to block adhesion modalities in modern bacterial populations. PMID:15595742

  13. Quantification of protein group coherence and pathway assignment using functional association

    PubMed Central

    2011-01-01

    Background Genomics and proteomics experiments produce a large amount of data that are awaiting functional elucidation. An important step in analyzing such data is to identify functional units, which consist of proteins that play coherent roles to carry out the function. Importantly, functional coherence is not identical with functional similarity. For example, proteins in the same pathway may not share the same Gene Ontology (GO) terms, but they work in a coordinated fashion so that the aimed function can be performed. Thus, simply applying existing functional similarity measures might not be the best solution to identify functional units in omics data. Results We have designed two scores for quantifying the functional coherence by considering association of GO terms observed in two biological contexts, co-occurrences in protein annotations and co-mentions in literature in the PubMed database. The counted co-occurrences of GO terms were normalized in a similar fashion as the statistical amino acid contact potential is computed in the protein structure prediction field. We demonstrate that the developed scores can identify functionally coherent protein sets, i.e. proteins in the same pathways, co-localized proteins, and protein complexes, with statistically significant score values showing a better accuracy than existing functional similarity scores. The scores are also capable of detecting protein pairs that interact with each other. It is further shown that the functional coherence scores can accurately assign proteins to their respective pathways. Conclusion We have developed two scores which quantify the functional coherence of sets of proteins. The scores reflect the actual associations of GO terms observed either in protein annotations or in literature. It has been shown that they have the ability to accurately distinguish biologically relevant groups of proteins from random ones as well as a good discriminative power for detecting interacting pairs of
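
    The normalization of GO term co-occurrence counts "in a similar fashion as the statistical amino acid contact potential" suggests a log-odds score: observed co-occurrence frequency divided by the frequency expected if the two terms were independent. The sketch below implements that generic log-odds form as an assumption; the exact formula used by the authors is not given in the abstract, and the term identifiers are placeholders.

```python
from collections import Counter
from itertools import combinations
from math import log


def cooccurrence_log_odds(annotations):
    """Log-odds association score for every co-occurring pair of terms.

    annotations is a list with one collection of terms per protein.
    score(a, b) = log( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated as fractions of proteins. Positive scores mean the pair
    co-occurs more often than expected by chance.
    """
    n = len(annotations)
    single = Counter()
    pair = Counter()
    for terms in annotations:
        terms = set(terms)
        single.update(terms)
        pair.update(frozenset(p) for p in combinations(sorted(terms), 2))
    scores = {}
    for p, c_ab in pair.items():
        a, b = tuple(p)
        scores[p] = log(c_ab * n / (single[a] * single[b]))
    return scores


# Placeholder term identifiers, one set per hypothetical protein.
proteins = [{"GO:A", "GO:B"}, {"GO:A", "GO:B"}, {"GO:C"}, {"GO:A"}]
for terms, score in cooccurrence_log_odds(proteins).items():
    print(sorted(terms), round(score, 3))
```

    A coherence score for a set of proteins could then be built by averaging such pair scores over the terms annotated to its members, which rewards terms that are associated even when they are not similar.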

  14. Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions.

    PubMed

    Fasolo, Joseph; Im, Hogune; Snyder, Michael P

    2015-01-01

    High-density functional protein microarrays containing ~4,200 recombinant yeast proteins are examined for kinase protein-protein interactions using an affinity purified yeast kinase fusion protein containing a V5-epitope tag for read-out. Purified kinase is obtained through culture of a yeast strain optimized for high copy protein production harboring a plasmid containing a Kinase-V5 fusion construct under a GAL inducible promoter. The yeast is grown in restrictive media with a neutral carbon source for 6 hr followed by induction with 2% galactose. Next, the culture is harvested and kinase is purified using standard affinity chromatographic techniques to obtain a highly purified protein kinase for use in the assay. The purified kinase is diluted with kinase buffer to an appropriate range for the assay and the protein microarrays are blocked prior to hybridization with the protein microarray. After the hybridization, the arrays are probed with monoclonal V5 antibody to identify proteins bound by the kinase-V5 protein. Finally, the arrays are scanned using a standard microarray scanner, and data is extracted for downstream informatics analysis to determine a high confidence set of protein interactions for downstream validation in vivo. PMID:26274875

  15. Functional Analysis of GLRX5 Mutants Reveals Distinct Functionalities of GLRX5 Protein.

    PubMed

    Liu, Gang; Wang, Yongwei; Anderson, Gregory J; Camaschella, Clara; Chang, Yanzhong; Nie, Guangjun

    2016-01-01

    Glutaredoxin 5 (GLRX5) is a 156 amino acid mitochondrial protein that plays an essential role in mitochondrial iron-sulfur cluster transfer. Mutations in this protein were reported to result in sideroblastic anemia and variant nonketotic hyperglycinemia in human. Recently, we have characterized a Chinese congenital sideroblastic anemia patient who has two compound heterozygous missense mutations (c. 301 A>C and c. 443 T>C) in his GLRX5 gene. Herein, we developed a GLRX5 knockout K562 cell line and studied the biochemical functions of the identified pathogenic mutations and other conserved amino acids with predicted essential functions. We observed that the K101Q mutation (due to c. 301 A>C mutation) may prevent the binding of [Fe-S] to GLRX5 protein, while L148S (due to c. 443 T>C mutation) may interfere with [Fe-S] transfer from GLRX5 to iron regulatory protein 1 (IRP1), mitochondrial aconitase (m-aconitase) and ferrochelatase. We also demonstrated that L148S is functionally complementary to the K51del mutant with respect to Fe/S-ferrochelatase, Fe/S-IRP1, Fe/S-succinate dehydrogenase, and Fe/S-m-aconitase biosynthesis and lipoylation of pyruvate dehydrogenase complex and α-ketoglutarate dehydrogenase complex. Furthermore, we demonstrated that the mutations of highly conserved amino acid residues in GLRX5 protein can have different effects on downstream Fe/S proteins. Collectively, our current work demonstrates that GLRX5 protein is multifunctional in [Fe-S] protein synthesis and maturation and defects of the different amino acids of the protein will lead to distinct effects on downstream Fe/S biosynthesis. PMID:26100117

  16. Metabolic Syndrome Biomarkers Predict Lung Function Impairment

    PubMed Central

    Naveed, Bushra; Weiden, Michael D.; Kwon, Sophia; Gracely, Edward J.; Comfort, Ashley L.; Ferrier, Natalia; Kasturiarachchi, Kusali J.; Cohen, Hillel W.; Aldrich, Thomas K.; Rom, William N.; Kelly, Kerry; Prezant, David J.

    2012-01-01

    Rationale: Cross-sectional studies demonstrate an association between metabolic syndrome and impaired lung function. Objectives: To define if metabolic syndrome biomarkers are risk factors for loss of lung function after irritant exposure. Methods: A nested case-control study of Fire Department of New York personnel with normal pre–September 11th FEV1 and who presented for subspecialty pulmonary evaluation before March 10, 2008. We correlated metabolic syndrome biomarkers obtained within 6 months of World Trade Center dust exposure with subsequent FEV1. FEV1 at subspecialty pulmonary evaluation within 6.5 years defined disease status; cases had FEV1 less than lower limit of normal, whereas control subjects had FEV1 greater than or equal to lower limit of normal. Measurements and Main Results: Clinical data and serum sampled at the first monitoring examination within 6 months of September 11, 2001, assessed body mass index, heart rate, serum glucose, triglycerides and high-density lipoprotein (HDL), leptin, pancreatic polypeptide, and amylin. Cases and control subjects had significant differences in HDL less than 40 mg/dl with triglycerides greater than or equal to 150 mg/dl, heart rate greater than or equal to 66 bpm, and leptin greater than or equal to 10,300 pg/ml. Each increased the odds of abnormal FEV1 at pulmonary evaluation by more than twofold, whereas amylin greater than or equal to 116 pg/ml decreased the odds by 84%, in a multibiomarker model adjusting for age, race, body mass index, and World Trade Center arrival time. This model had a sensitivity of 41%, a specificity of 86%, and a receiver operating characteristic area under the curve of 0.77. Conclusions: Abnormal triglycerides and HDL and elevated heart rate and leptin are independent risk factors of greater susceptibility to lung function impairment after September 11, 2001, whereas elevated amylin is protective. Metabolic biomarkers are predictors of lung disease, and may be useful for assessing
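
    The summary statistics quoted in this abstract follow simple definitions, sketched below. The confusion-matrix counts are hypothetical and chosen only so that they reproduce the reported 41% sensitivity and 86% specificity; they are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual cases that the model flags as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual controls that the model flags as controls."""
    return true_neg / (true_neg + false_pos)


def odds_ratio_from_percent_change(percent_change):
    """Convert 'odds changed by X%' into an odds ratio.

    A decrease of 84% corresponds to percent_change = -84 and an odds
    ratio of 0.16; 'more than twofold' corresponds to an odds ratio above 2.
    """
    return 1.0 + percent_change / 100.0


# Hypothetical counts per 100 cases and 100 controls.
print(sensitivity(true_pos=41, false_neg=59))
print(specificity(true_neg=86, false_pos=14))
print(odds_ratio_from_percent_change(-84))
```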

  17. Structure and Function of Microbial Metal-Reduction Proteins

    SciTech Connect

    Xu, Ying; Crawford, Oakly H.; Xu, Dong; Larimer, Frank W.; Uberbacher, Edward C.; Zhou, Jizhong

    2009-09-02

    In this project, we proposed (i) identification of metal-reduction genes, (ii) development of new threading techniques and (iii) fold recognition and structure prediction of metal-reduction proteins. However, due to the reduction of the budget, we revised our plan to focus on two specific aims of (i) developing a new threading-based protein structure prediction method, and (ii) developing an expert system for protein structure prediction.

  18. Physiological Functions of APP Family Proteins

    PubMed Central

    Müller, Ulrike C.; Zheng, Hui

    2012-01-01

    Biochemical and genetic evidence establishes a central role of the amyloid precursor protein (APP) in Alzheimer disease (AD) pathogenesis. Biochemically, deposition of the β-amyloid (Aβ) peptides produced from proteolytic processing of APP forms the defining pathological hallmark of AD; genetically, both point mutations and duplications of wild-type APP are linked to a subset of early onset of familial AD (FAD) and cerebral amyloid angiopathy. As such, the biological functions of APP and its processing products have been the subject of intense investigation, and the past 20+ years of research have met with both excitement and challenges. This article will review the current understanding of the physiological functions of APP in the context of APP family members. PMID:22355794

  19. Using viromes to predict novel immune proteins in non-model organisms

    PubMed Central

    Lim, Yan Wei; Silva, Genivaldo Gueiros Z.; Nelson, Craig E.; Haas, Andreas F.; Kelly, Linda Wegley; Edwards, Robert A.; Rohwer, Forest L.

    2016-01-01

    Immunity is mostly studied in a few model organisms, leaving the majority of immune systems on the planet unexplored. To characterize the immune systems of non-model organisms alternative approaches are required. Viruses manipulate host cell biology through the expression of proteins that modulate the immune response. We hypothesized that metagenomic sequencing of viral communities would be useful to identify both known and unknown host immune proteins. To test this hypothesis, a mock human virome was generated and compared to the human proteome using tBLASTn, resulting in 36 proteins known to be involved in immunity. This same pipeline was then applied to reef-building coral, a non-model organism that currently lacks traditional molecular tools like transgenic animals, gene-editing capabilities, and in vitro cell cultures. Viromes isolated from corals and compared with the predicted coral proteome resulted in 2503 coral proteins, including many proteins involved with pathogen sensing and apoptosis. There were also 159 coral proteins predicted to be involved with coral immunity but currently lacking any functional annotation. The pipeline described here provides a novel method to rapidly predict host immune components that can be applied to virtually any system with the potential to discover novel immune proteins. PMID:27581878
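
    The comparison step of such a pipeline typically ends with a table of alignment hits that must be filtered and reduced to a list of proteins. The sketch below assumes BLAST tabular output (format 6, with its default twelve columns) and assumes the proteome sequences were the queries, as tBLASTn takes protein queries against translated nucleotide data. The e-value cutoff and identifiers are hypothetical; the abstract does not state the thresholds used.

```python
def proteins_with_hits(lines, max_evalue=1e-5):
    """Collect query protein IDs that have at least one significant hit.

    Expects tab-separated BLAST format 6 rows with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. Blank lines, comments and short rows are skipped.
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        if float(fields[10]) <= max_evalue:  # column 11 is the e-value
            hits.add(fields[0])              # column 1 is the query ID
    return hits


# Hypothetical rows: one strong hit and one weak hit.
rows = [
    "P1\tread1\t90.0\t50\t5\t0\t1\t50\t1\t150\t1e-20\t100",
    "P2\tread2\t30.0\t20\t14\t0\t1\t20\t1\t60\t0.5\t20",
]
print(proteins_with_hits(rows))  # {'P1'}
```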

  20. Predicting Interaction Sites from the Energetics of Isolated Proteins: A New Approach to Epitope Mapping

    PubMed Central

    Scarabelli, Guido; Morra, Giulia; Colombo, Giorgio

    2010-01-01

    An increasing number of functional studies of proteins have shown that sequence and structural similarities alone may not be sufficient for reliable prediction of their interaction properties. This is particularly true for proteins recognizing specific antibodies, where the prediction of antibody-binding sites, called epitopes, has proven challenging. The antibody-binding properties of an antigen depend on its structure and related dynamics. Aiming to predict the antibody-binding regions of a protein, we investigate a new approach based on the integrated analysis of the dynamical and energetic properties of antigens, to identify nonoptimized, low-intensity energetic interaction networks in the protein structure isolated in solution. The method is based on the idea that recognition sites may correspond to localized regions with low-intensity energetic couplings with the rest of the protein, which allows them to undergo conformational changes, to be recognized by a binding partner, and to tolerate mutations with minimal energetic expense. Upon analyzing the results on isolated proteins and benchmarking against antibody complexes, it is found that the method successfully identifies binding sites located on the protein surface that are accessible to putative binding partners. The combination of dynamics and energetics can thus discriminate between epitopes and other substructures based only on physical properties. We discuss implications for vaccine design. PMID:20441761