Science.gov

Sample records for predicting protein function

  1. Predicting protein functions from PPI networks using functional aggregation.

    PubMed

    Hou, Jingyu; Chi, Xiaoxiao

    2012-11-01

    Predicting protein functions computationally from massive protein-protein interaction (PPI) data generated by high-throughput technology is one of the challenges and fundamental problems in the post-genomic era. Although there have been many approaches developed for computationally predicting protein functions, the mutual correlations among proteins in terms of protein functions have not been thoroughly investigated and incorporated into existing prediction methods, especially in voting based prediction methods. In this paper, we propose an innovative method to predict protein functions from PPI data by aggregating the functional correlations among relevant proteins using the Choquet-Integral in fuzzy theory. This functional aggregation measures the real impact of each relevant protein function on the final prediction results, and reduces the impact of repeated functional information on the prediction. Accordingly, a new protein similarity and a new iterative prediction algorithm are proposed in this paper. The experimental evaluations on real PPI datasets demonstrate the effectiveness of our method.
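
    The aggregation step named in this abstract can be illustrated with a minimal sketch of a discrete Choquet integral. The abstract does not give the fuzzy measure or the scores that the authors use, so the `measure` callables and the neighbor scores below are hypothetical placeholders; only the integral itself follows the standard definition.

```python
from typing import Callable, Dict, FrozenSet


def choquet_integral(scores: Dict[str, float],
                     measure: Callable[[FrozenSet[str]], float]) -> float:
    """Discrete Choquet integral of `scores` with respect to a fuzzy measure.

    `measure` maps a set of sources to a value in [0, 1]; it should be
    monotone, with measure(empty set) = 0 and measure(all sources) = 1.
    """
    # Sort sources by ascending score: f(x_(1)) <= ... <= f(x_(n)).
    ordered = sorted(scores, key=scores.get)
    total, previous = 0.0, 0.0
    for i, name in enumerate(ordered):
        # Set of all sources whose score is at least the current one.
        upper_set = frozenset(ordered[i:])
        total += (scores[name] - previous) * measure(upper_set)
        previous = scores[name]
    return total


# Hypothetical evidence for one candidate function from three neighbor proteins.
neighbor_scores = {"p1": 0.2, "p2": 0.6, "p3": 1.0}
n = len(neighbor_scores)

# An additive, uniform measure reduces the integral to the arithmetic mean.
as_mean = choquet_integral(neighbor_scores, lambda s: len(s) / n)
# A measure that is 1 for any non-empty set reduces it to the maximum,
# so repeated evidence adds nothing beyond the strongest source.
as_max = choquet_integral(neighbor_scores, lambda s: 1.0 if s else 0.0)
```

    The choice of measure controls how much redundant evidence counts: a non-additive measure can assign a set of correlated proteins less weight than the sum of their individual weights, which is the effect the abstract describes as reducing the impact of repeated functional information.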

  2. Protein Function Prediction: Problems and Pitfalls.

    PubMed

    Pearson, William R

    2015-01-01

    The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity," and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood. PMID:26334923

  4. Year 2 Report: Protein Function Prediction Platform

    SciTech Connect

    Zhou, C E

    2012-04-27

    Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.

  5. Integrating multiple networks for protein function prediction

    PubMed Central

    2015-01-01

    Background High throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, can achieve a performance that is superior (with respect to different evaluation criteria) to related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request. PMID:25707434
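
    The two-stage baseline that this abstract contrasts with MNet (weight each network first, then build the composite as a weighted sum) can be sketched as follows. This is not MNet, which optimizes the weights and the classifier loss jointly; the weighting rule (clipped, normalized kernel target alignment) and the toy networks are assumptions made only for illustration.

```python
import numpy as np


def kernel_target_alignment(K: np.ndarray, y: np.ndarray) -> float:
    """Alignment between a network/kernel K and the ideal target y y^T."""
    target = np.outer(y, y)
    # Frobenius inner product divided by the product of Frobenius norms.
    return float(np.sum(K * target) / (np.linalg.norm(K) * np.linalg.norm(target)))


def composite_network(networks, y):
    """Weight each network by its non-negative alignment and sum them."""
    scores = np.array([max(kernel_target_alignment(K, y), 0.0) for K in networks])
    if scores.sum() == 0.0:
        # No network aligns with the labels: fall back to uniform weights.
        weights = np.full(len(networks), 1.0 / len(networks))
    else:
        weights = scores / scores.sum()
    composite = sum(w * K for w, K in zip(weights, networks))
    return weights, composite


# Toy example: four proteins, labels +1/-1 for a single hypothetical GO term.
y = np.array([1.0, 1.0, -1.0, -1.0])
within_class = (np.outer(y, y) + 1.0) / 2.0   # links proteins sharing a label
across_class = 1.0 - within_class             # links proteins with different labels
weights, composite = composite_network([within_class, across_class], y)
```

    In this toy case the misleading network receives zero weight. Because the weights here are fixed before any classifier is trained, the composite is not guaranteed to suit the downstream predictor, which is the limitation the abstract sets out to address.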

  6. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-12-21

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.
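
    The quantities discussed in this abstract (miss rate per program and overlap between the correct predictions of different programs) reduce to simple set arithmetic. The sketch below uses made-up sequence identifiers, labels, and two anonymous programs; none of the values come from the study.

```python
def miss_rate(predictions, gold):
    """Fraction of gold-standard sequences the program got wrong or skipped."""
    missed = sum(1 for seq, label in gold.items() if predictions.get(seq) != label)
    return missed / len(gold)


def correct_set(predictions, gold):
    """Sequences for which the program's single best class matches the gold class."""
    return {seq for seq, label in gold.items() if predictions.get(seq) == label}


def jaccard(a, b):
    """Overlap between two sets of correctly classified sequences."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gold standard and outputs of two hypothetical programs.
gold = {"s1": "kinase", "s2": "protease", "s3": "transporter", "s4": "ligase"}
program_a = {"s1": "kinase", "s2": "protease", "s3": "kinase"}           # no call for s4
program_b = {"s1": "ligase", "s2": "protease", "s3": "transporter", "s4": "kinase"}

correct_a = correct_set(program_a, gold)
correct_b = correct_set(program_b, gold)
overlap = jaccard(correct_a, correct_b)
# Upper bound on what a perfect combiner of the two programs could recover.
combined_coverage = len(correct_a | correct_b) / len(gold)
```

    Low overlap together with a higher combined coverage is the pattern that motivates the abstract's suggestion to combine several sources and methods; the union used here is an optimistic bound, since a real combiner would not know which program is right.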

  7. Quantitative assessment of protein function prediction programs.

    PubMed

    Rodrigues, B N; Steffens, M B R; Raittz, R T; Santos-Weiss, I C R; Marchaukoski, J N

    2015-01-01

    Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate. PMID:26782400

  8. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  9. A new protein structure representation for efficient protein function prediction.

    PubMed

    Maghawry, Huda A; Mostafa, Mostafa G M; Gharib, Tarek F

    2014-12-01

    One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average.

  10. Predicting Protein Function Using Multiple Kernels.

    PubMed

    Yu, Guoxian; Rangwala, Huzefa; Domeniconi, Carlotta; Zhang, Guoji; Zhang, Zili

    2015-01-01

    High-throughput experimental techniques provide a wide variety of heterogeneous proteomic data sources. To exploit the information spread across multiple sources for protein function prediction, these data sources are transformed into kernels and then integrated into a composite kernel. Several methods first optimize the weights on these kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these approaches result in an optimal composite kernel, but not necessarily in an optimal classifier. On the other hand, some approaches optimize the loss of binary classifiers and learn weights for the different kernels iteratively. For multi-class or multi-label data, these methods have to solve the problem of optimizing weights on these kernels for each of the labels, which is computationally expensive and ignores the correlation among labels. In this paper, we propose a method called Predicting Protein Function using Multiple Kernels (ProMK). ProMK iteratively alternates between learning optimal kernel weights and reducing the empirical loss of the multi-label classifier for each of the labels simultaneously. ProMK can integrate kernels selectively and downgrade the weights on noisy kernels. We investigate the performance of ProMK on several publicly available protein function prediction benchmarks and synthetic datasets. We show that the proposed approach performs better than previously proposed protein function prediction approaches that integrate multiple data sources and multi-label multiple kernel learning methods. The code of our proposed method is available at https://sites.google.com/site/guoxian85/promk.

  11. Protein function prediction based on data fusion and functional interrelationship.

    PubMed

    Meng, Jun; Wekesa, Jael-Sanyanda; Shi, Guan-Li; Luan, Yu-Shi

    2016-04-01

    One of the challenging tasks of bioinformatics is to predict more accurate and confident protein functions from genomics and proteomics datasets. Computational approaches use a variety of high-throughput experimental data, such as protein-protein interaction (PPI), protein sequences and phylogenetic profiles, to predict protein functions. This paper presents a method that uses a transductive multi-label learning algorithm and integrates multiple data sources for classification. Multiple proteomics datasets are integrated to make inferences about functions of unknown proteins, and a directed bi-relational graph is used to assign labels to unannotated proteins. Our method, bi-relational graph based transductive multi-label function annotation (Bi-TMF), uses functional correlation and topological PPI network properties on both the training and testing datasets to predict protein functions through data fusion of the individual kernel results. The main purpose of our proposed method is to enhance the performance of classifier integration for protein function prediction algorithms. Experimental results demonstrate the effectiveness and efficiency of Bi-TMF on multi-source datasets in yeast, human and mouse benchmarks. Bi-TMF outperforms other recently proposed methods. PMID:26869536

  12. Consistent probabilistic outputs for protein function prediction

    PubMed Central

    Obozinski, Guillaume; Lanckriet, Gert; Grant, Charles; Jordan, Michael I; Noble, William Stafford

    2008-01-01

    In predicting hierarchical protein function annotations, such as terms in the Gene Ontology (GO), the simplest approach makes predictions for each term independently. However, this approach has the unfortunate consequence that the predictor may assign to a single protein a set of terms that are inconsistent with one another; for example, the predictor may assign a specific GO term to a given protein ('purine nucleotide binding') but not assign the parent term ('nucleotide binding'). Such predictions are difficult to interpret. In this work, we focus on methods for calibrating and combining independent predictions to obtain a set of probabilistic predictions that are consistent with the topology of the ontology. We call this procedure 'reconciliation'. We begin with a baseline method for predicting GO terms from a collection of data types using an ensemble of discriminative classifiers. We apply the method to a previously described benchmark data set, and we demonstrate that the resulting predictions are frequently inconsistent with the topology of the GO. We then consider 11 distinct reconciliation methods: three heuristic methods; four variants of a Bayesian network; an extension of logistic regression to the structured case; and three novel projection methods - isotonic regression and two variants of a Kullback-Leibler projection method. We evaluate each method in three different modes - per term, per protein and joint - corresponding to three types of prediction tasks. Although the principal goal of reconciliation is interpretability, it is important to assess whether interpretability comes at a cost in terms of precision and recall. Indeed, we find that many apparently reasonable reconciliation methods yield reconciled probabilities with significantly lower precision than the original, unreconciled estimates. On the other hand, we find that isotonic regression usually performs better than the underlying, unreconciled method, and almost never performs worse.
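
    The consistency requirement described in this abstract (a child term should not be more probable than its parent) can be illustrated on a single root-to-leaf path. The sketch below applies the pool-adjacent-violators algorithm, which gives the least-squares isotonic fit for a chain; the paper works on the full ontology graph, so restricting to one path is a simplifying assumption, and the probabilities are invented.

```python
def is_consistent(path_probs):
    """True if probabilities never increase from ancestor to descendant."""
    return all(a >= b for a, b in zip(path_probs, path_probs[1:]))


def reconcile_path(path_probs):
    """Least-squares projection onto non-increasing sequences (PAV).

    `path_probs` lists independent per-term probabilities ordered from the
    most general term to the most specific one.
    """
    blocks = []  # each block holds [sum of values, number of values]
    for p in path_probs:
        blocks.append([p, 1])
        # Merge while a more specific block has a higher mean than its ancestor block.
        while len(blocks) >= 2 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    reconciled = []
    for s, c in blocks:
        reconciled.extend([s / c] * c)
    return reconciled


# Hypothetical scores along: binding -> nucleotide binding -> purine nucleotide binding
raw = [0.9, 0.4, 0.8]          # the child (0.8) outranks its parent (0.4)
fixed = reconcile_path(raw)    # roughly [0.9, 0.6, 0.6]
```

    The violating pair is replaced by its average, which removes the inconsistency while moving the original estimates as little as possible in the squared-error sense.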

  13. Template-based prediction of protein function.

    PubMed

    Petrey, Donald; Chen, T Scott; Deng, Lei; Garzon, Jose Ignacio; Hwang, Howook; Lasso, Gorka; Lee, Hunjoong; Silkov, Antonina; Honig, Barry

    2015-06-01

    We discuss recent approaches for structure-based protein function annotation. We focus on template-based methods where the function of a query protein is deduced from that of a template for which both the structure and function are known. We describe the different ways of identifying a template. These are typically based on sequence analysis but new methods based on purely structural similarity are also being developed that allow function annotation based on structural relationships that cannot be recognized by sequence. The growing number of available structures of known function, improved homology modeling techniques and new developments in the use of structure allow template-based methods to be applied on a proteome-wide scale and in many different biological contexts. This progress significantly expands the range of applicability of structural information in function annotation to a level that previously was only achievable by sequence comparison.

  14. Functional prediction of hypothetical proteins in human adenoviruses.

    PubMed

    Dorden, Shane; Mahadevan, Padmanabhan

    2015-01-01

    Assigning functional information to hypothetical proteins in virus genomes is crucial for gaining insight into their proteomes. Human adenoviruses are medium sized viruses that cause a range of diseases. Their genomes possess proteins with uncharacterized function known as hypothetical proteins. Using a wide range of protein function prediction servers, functional information was obtained about these hypothetical proteins. A comparison of functional information obtained from these servers revealed that some of them produced functional information, while others provided little functional information about these human adenovirus hypothetical proteins. The PFP, ESG, PSIPRED, 3d2GO, and ProtFun servers produced the most functional information regarding these hypothetical proteins. PMID:26664031

  15. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    In the past, the advancement of high-throughput microarray technologies brought massive growth in the knowledge of previously unknown proteins. Protein function prediction is the most challenging problem in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein differed from previously characterized ones. Therefore, to alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data, applied in a wide range of areas such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who addressed these problems using computational intelligence techniques with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  16. Protein Structure and Function Prediction Using I-TASSER.

    PubMed

    Yang, Jianyi; Zhang, Yang

    2015-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets.

  17. Protein function prediction using neighbor relativity in protein-protein interaction network.

    PubMed

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some studies use protein interaction information to predict the functions of unannotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC), based on interaction network topology, which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features, including distance, common neighbors and the number of paths between them. In order to ascribe function to an unannotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein is annotated with the top-scoring transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results than previous protein function prediction approaches that use interaction networks.
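
    The general scheme in this abstract (weight each nearby annotated protein by a topology-based coefficient, transfer its annotations, keep the top-scoring functions) can be sketched as below. The abstract does not state the NRC formula, so the `relativity` function here, (1 + common neighbors) / distance, is a placeholder chosen for illustration and is not the published coefficient; the toy network and function labels are also hypothetical.

```python
from collections import defaultdict, deque


def shortest_distances(graph, source, max_depth):
    """Breadth-first search distances from `source`, up to `max_depth` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def relativity(graph, u, v, distance):
    """Placeholder coefficient: more shared neighbors and shorter distance score higher."""
    common = len(graph[u] & graph[v])
    return (1 + common) / distance


def predict_functions(graph, annotations, query, max_depth=2, top_k=2):
    """Rank functions by the summed weights of annotated proteins near `query`."""
    scores = defaultdict(float)
    for protein, d in shortest_distances(graph, query, max_depth).items():
        if protein == query or protein not in annotations:
            continue
        weight = relativity(graph, query, protein, d)
        for function in annotations[protein]:
            scores[function] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy undirected interaction network stored as adjacency sets.
graph = {"Q": {"A", "B"}, "A": {"Q", "B", "C"}, "B": {"Q", "A"}, "C": {"A"}}
annotations = {"A": {"kinase"}, "B": {"kinase", "transport"}, "C": {"transport"}}
ranked = predict_functions(graph, annotations, "Q")
```

    Here the unannotated protein Q receives "kinase" ahead of "transport" because both of its direct neighbors carry that label and share neighbors with it, while the more distant protein C contributes a smaller weight.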

  18. Protein function prediction using guilty by association from interaction networks.

    PubMed

    Piovesan, Damiano; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E

    2015-12-01

    Protein function prediction from sequence using the Gene Ontology (GO) classification is useful in many biological problems. It has recently attracted increasing interest, thanks in part to the Critical Assessment of Function Annotation (CAFA) challenge. In this paper, we introduce Guilty by Association on STRING (GAS), a tool to predict protein function exploiting protein-protein interaction networks without sequence similarity. The assumption is that whenever a protein interacts with other proteins, it is part of the same biological process and located in the same cellular compartment. GAS retrieves interaction partners of a query protein from the STRING database and measures enrichment of the associated functional annotations to generate a sorted list of putative functions. A performance evaluation based on CAFA metrics and a fair comparison with optimized BLAST similarity searches is provided. The consensus of GAS and BLAST is shown to improve overall performance. The PPI approach is shown to outperform similarity searches for biological process and cellular compartment GO predictions. Moreover, an analysis of the best practices to exploit protein-protein interaction networks is also provided.
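
    The enrichment step in this abstract can be sketched as follows. The abstract does not specify which statistic GAS uses, so the one-sided hypergeometric test below is an assumption, and the interaction partners are passed in directly rather than retrieved from STRING. Protein identifiers and terms are made up.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_functions(partners, annotations):
    """Sort terms seen among the partners by enrichment p-value (smallest first).

    `annotations` maps every background protein to its set of terms.
    """
    N, n = len(annotations), len(partners)
    candidate_terms = set().union(*(annotations[p] for p in partners))
    ranked = []
    for term in candidate_terms:
        K = sum(1 for terms in annotations.values() if term in terms)
        k = sum(1 for p in partners if term in annotations[p])
        ranked.append((term, hypergeom_tail(k, n, K, N)))
    return sorted(ranked, key=lambda item: (item[1], item[0]))


# Hypothetical background of ten annotated proteins.
annotations = {
    "p1": {"T1", "T2"}, "p2": {"T1"}, "p3": {"T1"}, "p4": {"T2"}, "p5": {"T2"},
    "p6": {"T2"}, "p7": set(), "p8": set(), "p9": set(), "p10": set(),
}
partners = ["p1", "p2", "p3"]   # interaction partners of the query protein
putative = rank_functions(partners, annotations)
```

    Term T1 ranks first because all three partners carry it although only three proteins in the background do, whereas T2 appears in a single partner and is common in the background.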

  19. Protein side chain conformation predictions with an MMGBSA energy function.

    PubMed

    Gaillard, Thomas; Panel, Nicolas; Simonson, Thomas

    2016-06-01

    The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc.

  20. Pattern recognition methods for protein functional site prediction.

    PubMed

    Yang, Zheng Rong; Wang, Lipo; Young, Natasha; Trudgian, Dave; Chou, Kuo-Chen

    2005-10-01

    Protein functional site prediction is closely related to drug design, hence to public health. In order to save the cost and the time spent on identifying the functional sites in sequenced proteins in the biology laboratory, computer programs have been widely used for decades. Many of them are implemented using state-of-the-art pattern recognition algorithms, including decision trees, neural networks and support vector machines. Although the success of this effort has been obvious, advanced and new algorithms are still under development for addressing some difficult issues. This review goes through the major stages in developing pattern recognition algorithms for protein functional site prediction and outlines future research directions in this important area. PMID:16248799

  1. PSCL: predicting protein subcellular localization based on optimal functional domains.

    PubMed

    Wang, Kai; Hu, Le-Le; Shi, Xiao-He; Dong, Ying-Song; Li, Hai-Peng; Wen, Tie-Qiao

    2012-01-01

    It is well known that the subcellular localizations of proteins are closely related to their functions. Although many computational methods and tools are available on the Internet, it is still necessary to develop new algorithms in this field to gain a better understanding of the complex mechanism of plant subcellular localization. Here, we provide a new web server named PSCL for plant protein subcellular localization prediction by employing optimized functional domains. After feature optimization, 848 optimal functional domains from InterPro were obtained to represent each protein. By calculating the distances to each of the seven categories, PSCL shows the possibilities of a protein being located in each of those categories in ascending order. On our dataset, PSCL achieved a first-order prediction accuracy of 75.7% by jackknife test. Gene Ontology enrichment analysis shows that catalytic activity, cellular process and metabolic process are strongly correlated with the localization of plant proteins. Finally, PSCL, a Linux operating system-based web interface for the predictor, was designed and is accessible for public use at http://pscl.biosino.org/.

  2. Cloud Prediction of Protein Structure and Function with PredictProtein for Debian

    PubMed Central

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032

  3. Cloud prediction of protein structure and function with PredictProtein for Debian.

    PubMed

    Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard

    2013-01-01

    We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.

  4. Graphlet kernels for prediction of functional residues in protein structures.

    PubMed

    Vacic, Vladimir; Iakoucheva, Lilia M; Lonardi, Stefano; Radivojac, Predrag

    2010-01-01

    We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest. A similarity measure between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, which is a well-studied problem suitable for benchmarking, and a much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods, a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin of difference was considerably higher on the problem of phosphorylation site prediction. While there is data that phosphorylation sites are preferentially positioned in intrinsically disordered regions, we provide evidence that for the sites that are located in structured regions, neither the surface accessibility alone nor the averaged measures calculated from the residue microenvironments utilized by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures via enumerating the patterns of local connectivity in the corresponding labeled graphs.
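
    The idea of representing a vertex by counts of labeled local patterns and comparing two vertices by an inner product can be sketched as follows. This is a deliberately simplified stand-in: it counts only labeled paths of one or two edges rooted at the vertex, not the full set of non-isomorphic graphlets used in the paper, and it does not distinguish a triangle from an open path. The function names, the toy contact graph, and the one-letter residue labels are assumptions for illustration.

```python
from collections import Counter


def rooted_path_counts(adj, labels, root):
    """Count labeled paths with one or two edges that start at `root`.

    adj    -- mapping: vertex -> list of neighboring vertices (contact graph)
    labels -- mapping: vertex -> residue label
    """
    counts = Counter()
    for u in adj[root]:
        counts[(labels[root], labels[u])] += 1
        for w in adj[u]:
            if w != root:  # do not step straight back to the root
                counts[(labels[root], labels[u], labels[w])] += 1
    return counts


def kernel(c1, c2):
    """Inner product of two sparse count vectors."""
    return sum(c1[key] * c2[key] for key in c1.keys() & c2.keys())


# Toy contact graph: residues 0, 1, 2 form a triangle; residue 3 touches 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
labels = {0: "H", 1: "D", 2: "S", 3: "D"}

c0 = rooted_path_counts(adj, labels, 0)
c1 = rooted_path_counts(adj, labels, 1)
c3 = rooted_path_counts(adj, labels, 3)
```

    Here residues 1 and 3 carry the same label and share part of their neighborhood, so `kernel(c1, c3)` is positive, while `kernel(c0, c3)` is zero because the root labels differ. In a supervised setting such a kernel would be passed to a kernel-based classifier.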

  5. High Precision Prediction of Functional Sites in Protein Structures

    PubMed Central

    Buturovic, Ljubomir; Wong, Mike; Tang, Grace W.; Altman, Russ B.; Petkovic, Dragutin

    2014-01-01

    We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared performance of our method with state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta. PMID:24632601
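
    Precision (positive predictive value) and recall follow the standard definitions from counts of true positives, false positives and false negatives. The short sketch below shows the arithmetic; the counts are invented for the example and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    precision = TP / (TP + FP): fraction of predicted sites that are real
    recall    = TP / (TP + FN): fraction of real sites that were predicted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for one functional-site model.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
```

    With these made-up counts, 90 of 100 predictions are correct and 90 of 120 true sites are found, which illustrates why a high-precision annotator can still miss sites.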

  6. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  7. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis.

    PubMed

    Saha, Sovan; Chatterjee, Piyali; Basu, Subhadip; Kundu, Mahantapas; Nasipuri, Mita

    2014-12-01

    Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins are still not annotated in most cases. The unknown function of such non-annotated proteins may be inferred or deduced from their neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 uses a combination of three simple-yet-effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach using the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. The overall accuracy achieved in FunPred 1.2 over 8 functional groups involving hetero-interactions in 650 yeast proteins is around 87%, which is higher than the accuracy with FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are now freely available at http://code.google.com/p/cmaterbioinfo/ . PMID:25424913
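
    The edge clustering coefficient mentioned in this abstract can be sketched as follows. The abstract does not give the formula, so this uses one common definition as an assumption: the number of triangles that contain the edge, divided by the largest number of triangles the edge could belong to, min(deg(u) - 1, deg(v) - 1). Some formulations add 1 to the numerator. The function name and the toy network are hypothetical.

```python
def edge_clustering_coefficient(adj, u, v):
    """Edge clustering coefficient of edge (u, v) in an undirected graph.

    adj -- mapping: protein -> set of interaction partners
    """
    triangles = len(adj[u] & adj[v])  # common neighbors close a triangle
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / possible if possible > 0 else 0.0


# Toy interaction network: A, B, C form a triangle; D hangs off A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
ecc_ab = edge_clustering_coefficient(adj, "A", "B")
ecc_ad = edge_clustering_coefficient(adj, "A", "D")
```

    Edges inside the densely connected triangle score high and the pendant edge scores zero, which is the property a heuristic can use to keep only densely connected neighborhood regions.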

  9. Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design.

    PubMed

    Cheng, Gong; Qian, Bin; Samudrala, Ram; Baker, David

    2005-01-01

    The prediction of functional sites in newly solved protein structures is a challenge for computational structural biology. Most methods for approaching this problem use evolutionary conservation as the primary indicator of the location of functional sites. However, sequence conservation reflects not only evolutionary selection at functional sites to maintain protein function, but also selection throughout the protein to maintain the stability of the folded state. To disentangle sequence conservation due to protein functional constraints from sequence conservation due to protein structural constraints, we use all atom computational protein design methodology to predict sequence profiles expected under solely structural constraints, and to compute the free energy difference between the naturally occurring amino acid and the lowest free energy amino acid at each position. We show that functional sites are more likely than non-functional sites to have computed sequence profiles which differ significantly from the naturally occurring sequence profiles and to have residues with sub-optimal free energies, and that incorporation of these two measures improves sequence based prediction of protein functional sites. The combined sequence and structure based functional site prediction method has been implemented in a publicly available web server.
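
    Comparing a design-derived sequence profile with the naturally occurring profile at one position requires some measure of how different two amino acid distributions are. The abstract does not name the measure used, so the sketch below takes the Jensen-Shannon divergence as an assumed stand-in; the function name and the toy profiles are hypothetical.

```python
from math import log2


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    p, q -- mappings: amino acid -> probability at one sequence position
    Returns 0.0 for identical profiles and 1.0 for non-overlapping ones.
    """
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy position: nature keeps Asp, while structure-only design prefers Leu.
natural = {"D": 1.0}
designed = {"L": 1.0}
divergence = js_divergence(natural, designed)
```

    Under this assumed measure, positions with high divergence are those where the observed conservation is not explained by structural constraints alone, which is the signal the abstract associates with functional sites.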

  10. UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

    PubMed

    Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

    2016-01-01

    The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/.

  11. Prediction of protein complexes using empirical free energy functions.

    PubMed Central

    Weng, Z.; Vajda, S.; Delisi, C.

    1996-01-01

    A long sought goal in the physical chemistry of macromolecular structure, and one directly relevant to understanding the molecular basis of biological recognition, is predicting the geometry of bimolecular complexes from the geometries of their free monomers. Even when the monomers remain relatively unchanged by complex formation, prediction has been difficult because the free energies of alternative conformations of the complex have been difficult to evaluate quickly and accurately. This has forced the use of incomplete target functions, which typically do no better than to provide tens of possible complexes with no way of choosing between them. Here we present a general framework for empirical free energy evaluation and report calculations, based on a relatively complete and easily executable free energy function, that indicate that the structures of complexes can be predicted accurately from the structures of monomers, including close sequence homologues. The calculations also suggest that the binding free energies themselves may be predicted with reasonable accuracy. The method is compared to an alternative formulation that has also been applied recently to the same data set. Both approaches promise to open new opportunities in macromolecular design and specificity modification. PMID:8845751

  12. Prediction of functional sites in proteins using conserved functional group analysis.

    PubMed

    Innis, C Axel; Anand, A Prem; Sowdhamini, R

    2004-04-01

    A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects. PMID:15033369

  13. Bayesian Markov Random Field analysis for protein function prediction based on network data.

    PubMed

    Kourmpetis, Yiannis A I; van Dijk, Aalt D J; Bink, Marco C A M; van Ham, Roeland C H J; ter Braak, Cajo J F

    2010-02-24

    Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.

  14. Dynamic circadian protein-protein interaction networks predict temporal organization of cellular functions.

    PubMed

    Wallach, Thomas; Schellenberg, Katja; Maier, Bert; Kalathur, Ravi Kiran Reddy; Porras, Pablo; Wanker, Erich E; Futschik, Matthias E; Kramer, Achim

    2013-03-01

    Essentially all biological processes depend on protein-protein interactions (PPIs). Timing of such interactions is crucial for regulatory function. Although circadian (~24-hour) clocks constitute fundamental cellular timing mechanisms regulating important physiological processes, PPI dynamics on this timescale are largely unknown. Here, we identified 109 novel PPIs among circadian clock proteins via a yeast-two-hybrid approach. Among them, the interaction of protein phosphatase 1 and CLOCK/BMAL1 was found to result in BMAL1 destabilization. We constructed a dynamic circadian PPI network predicting the PPI timing using circadian expression data. Systematic circadian phenotyping (RNAi and overexpression) suggests a crucial role for components involved in dynamic interactions. Systems analysis of a global dynamic network in liver revealed that interacting proteins are expressed at similar times likely to restrict regulatory interactions to specific phases. Moreover, we predict that circadian PPIs dynamically connect many important cellular processes (signal transduction, cell cycle, etc.) contributing to temporal organization of cellular physiology in an unprecedented manner. PMID:23555304

  15. A Protein Domain Co-Occurrence Network Approach for Predicting Protein Function and Inferring Species Phylogeny

    PubMed Central

    Wang, Zheng; Zhang, Xue-Cheng; Le, Mi Ha; Xu, Dong; Stacey, Gary; Cheng, Jianlin

    2011-01-01

    Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. Based on our experiment conducted on 66 randomly selected proteins, the best of top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions on average had a semantic similarity score of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference method correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as there is one EC number available in the neighboring domains, our SVM-based and neighbor-counting method correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCNs-alignment-based method achieved a 93.45% similarity score compared to the Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationship among species. PMID:21455299
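
    A domain co-occurrence network and a neighbor-counting prediction on it can be sketched in a few lines. The sketch assumes that two domains are linked whenever they appear in the same protein and that each neighbor casts one vote per annotation term; the paper's exact network construction and its aggregated scoring are not given in the abstract. The function names, domain names and annotation labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_dcn(protein_domains):
    """Link two domains whenever they occur in the same protein."""
    network = defaultdict(set)
    for domains in protein_domains.values():
        for a, b in combinations(sorted(set(domains)), 2):
            network[a].add(b)
            network[b].add(a)
    return network


def neighbor_counting(network, annotations, target, top_n=3):
    """Rank annotation terms by how many neighbors of `target` carry them."""
    votes = Counter()
    for neighbor in network.get(target, ()):
        votes.update(annotations.get(neighbor, ()))
    return [term for term, _ in votes.most_common(top_n)]


# Toy data, invented for this example only.
protein_domains = {
    "P1": ["kinase", "SH2"],
    "P2": ["kinase", "SH3"],
    "P3": ["kinase", "unknown"],
    "P4": ["unknown", "SH2"],
    "P5": ["unknown", "SH3"],
}
annotations = {
    "kinase": ["signaling", "phosphorylation"],
    "SH2": ["signaling"],
    "SH3": ["signaling", "binding"],
}
network = build_dcn(protein_domains)
top_term = neighbor_counting(network, annotations, "unknown", top_n=1)
```

    In this toy network the unannotated domain has three annotated neighbors, all of which carry the same term, so that term wins the vote. Ties between lower-ranked terms are broken arbitrarily in this sketch.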

  16. Correlated Protein Function Prediction via Maximization of Data-Knowledge Consistency.

    PubMed

    Wang, Hua; Huang, Heng; Ding, Chris

    2015-06-01

    Conventional computational approaches for protein function prediction usually predict one function at a time, fundamentally. As a result, the protein functions are treated as separate target classes. However, biological processes are highly correlated in reality, which makes multiple functions assigned to a protein not independent. Therefore, it would be beneficial to make use of function category correlations when predicting protein functions. In this article, we propose a novel Maximization of Data-Knowledge Consistency (MDKC) approach to exploit function category correlations for protein function prediction. Our approach banks on the assumption that two proteins are likely to have large overlap in their annotated functions if they are highly similar according to certain experimental data. We first establish a new pairwise protein similarity using protein annotations from knowledge perspective. Then by maximizing the consistency between the established knowledge similarity upon annotations and the data similarity upon biological experiments, putative functions are assigned to unannotated proteins. Most importantly, function category correlations are gracefully incorporated into our learning objective through the knowledge similarity. Comprehensive experimental evaluations on the Saccharomyces cerevisiae species have demonstrated promising results that validate the performance of our methods.

  17. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  19. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps.

    PubMed

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin

    2016-01-01

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼ 3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. PMID:26635392

  20. COMBREX-DB: an experiment centered database of protein function: knowledge, predictions and knowledge gaps

    PubMed Central

    Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P.; Kasif, Simon; Roberts, Richard J.; Steffen, Martin

    2016-01-01

    The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue. PMID:26635392

  1. Phagonaute: A web-based interface for phage synteny browsing and protein function prediction.

    PubMed

    Delattre, Hadrien; Souiai, Oussema; Fagoonee, Khema; Guerois, Raphaël; Petit, Marie-Agnès

    2016-09-01

    Distant homology search tools are of great help to predict viral protein functions. However, due to the lack of profile databases dedicated to viruses, they can lack sensitivity. We constructed HMM profiles for more than 80,000 proteins from both phages and archaeal viruses, and performed all pairwise comparisons with the HHsearch program. The whole resulting database can be explored through a user-friendly "Phagonaute" interface to help predict functions. Results are displayed together with their genetic context, to strengthen inferences based on remote homology. Beyond function prediction, this tool permits detection of co-occurrences, often indicative of proteins completing a task together, and observation of conserved patterns across large evolutionary distances. As a test, Herpes simplex virus I was added to Phagonaute, and 25% of its proteome matched to bacterial or archaeal viral protein counterparts. Phagonaute should therefore help virologists in their quest for protein functions and evolutionary relationships. PMID:27254594

  3. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases.

    PubMed

    Parasuram, Ramya; Mills, Caitlyn L; Wang, Zhouxi; Somasundaram, Saroja; Beuning, Penny J; Ondrechen, Mary Jo

    2016-01-15

    Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectin/glucanase (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein BACOVA_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacillus halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural

  4. FINDSITE: a combined evolution/structure-based approach to protein function prediction

    PubMed Central

    Brylinski, Michal

    2009-01-01

    A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the ∼50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionarily distant proteins identified by threading. Importantly, FINDSITE gives comparable results when high-resolution experimental structures as well as predicted protein models are used. PMID:19324930

  5. Functional prediction: identification of protein orthologs and paralogs.

    PubMed Central

    Chen, R.; Jeong, S. S.

    2000-01-01

    Orthologs typically retain the same function in the course of evolution. Using the beta-decarboxylating dehydrogenase family as a model, we demonstrate that orthologs can be confidently identified. The strategy is based on our recent findings that substitutions of only a few amino acid residues in these enzymes are sufficient to exchange substrate and coenzyme specificities. Hence, the few major specificity determinants can serve as reliable markers for determining orthologous or paralogous relationships. The power of this approach has been demonstrated by correcting similarity-based functional misassignment and discovering new genes and related pathways, and should be broadly applicable to other enzyme families. PMID:11206056

  6. Predicting protein functions from redundancies in large-scale protein interaction networks

    NASA Technical Reports Server (NTRS)

    Samanta, Manoj Pratim; Liang, Shoudan

    2003-01-01

    Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share a significantly larger number of common interaction partners than expected at random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (approximately 89%) of the original associations.
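
    The shared-partner idea in this abstract can be illustrated with a small sketch. It uses a standard hypergeometric tail probability as the significance measure; this is an assumption made for illustration and may differ from the statistic the authors actually used. The function names and the toy network are hypothetical.

```python
from math import comb

def common_partners(network, a, b):
    """Return the interaction partners that proteins a and b share.

    `network` maps each protein to the set of its interaction partners.
    """
    return network[a] & network[b]

def shared_partner_pvalue(n_total, n1, n2, m):
    """Probability of sharing at least m partners by chance.

    Assumption (not from the source): protein A has n1 partners and
    protein B has n2 partners, each drawn uniformly at random from
    n_total proteins, so the overlap is hypergeometrically distributed.
    """
    upper = min(n1, n2)
    denom = comb(n_total, n2)
    tail = sum(comb(n1, k) * comb(n_total - n1, n2 - k)
               for k in range(m, upper + 1))
    return tail / denom

# Toy network (hypothetical data): A and B share partners P1 and P2.
toy = {
    "A": {"P1", "P2", "P3"},
    "B": {"P1", "P2", "P4"},
}
shared = common_partners(toy, "A", "B")
p = shared_partner_pvalue(n_total=1000, n1=3, n2=3, m=len(shared))
```

    A small p-value suggests the overlap is unlikely to arise from random false positives, which is the intuition behind ranking protein pairs by shared partners.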

  7. A multi-label classifier for prediction membrane protein functional types in animal.

    PubMed

    Zou, Hong-Liang

    2014-11-01

    Membrane proteins are important components of the cell membrane. Identifying the type(s) of a membrane protein from its sequence is important because the type correlates closely with the protein's function. According to previous studies, membrane proteins can be divided into the following eight types: single-pass type I, single-pass type II, single-pass type III, single-pass type IV, multipass, lipid-anchor, GPI-anchor, and peripheral membrane protein. With the avalanche of newly found protein sequences in the post-genomic age, it is urgent to develop an automatic and effective computational method for rapid and reliable prediction of the types of membrane proteins. At present, most existing methods are based on the assumption that a membrane protein belongs to only one type. In fact, a membrane protein may belong to two or more functional types simultaneously. In this study, a new method that hybridizes the pseudo amino acid composition with a multi-label algorithm called LIFT (multi-label learning with label-specific features) is proposed to predict the functional types of both singleplex and multiplex animal membrane proteins. Experimental results on a stringent benchmark dataset of membrane proteins using the jackknife test show that the absolute-true rate obtained was 0.6342, indicating that our approach is quite promising. It may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins.
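
    The absolute-true figure quoted above is a strict multi-label metric. The sketch below assumes the usual definition, the fraction of proteins whose predicted set of types exactly matches the true set; the function name and sample labels are hypothetical and are not taken from the paper.

```python
def absolute_true(true_sets, predicted_sets):
    """Fraction of samples whose predicted label set equals the true set.

    A prediction with one extra or one missing label counts as a miss,
    which is why this metric is much stricter than per-label accuracy.
    """
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true and predicted lists must have equal length")
    if not true_sets:
        raise ValueError("at least one sample is required")
    exact = sum(1 for t, p in zip(true_sets, predicted_sets)
                if set(t) == set(p))
    return exact / len(true_sets)

# Hypothetical labels for three membrane proteins.
truth = [{"multipass"}, {"lipid-anchor", "GPI-anchor"}, {"peripheral"}]
preds = [{"multipass"}, {"lipid-anchor"}, {"peripheral"}]
score = absolute_true(truth, preds)  # second protein is only partly right
```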

  8. iPFPi: A System for Improving Protein Function Prediction through Cumulative Iterations.

    PubMed

    Taha, Kamal; Yoo, Paul D; Alzaabi, Mohammed

    2015-01-01

    We propose a classifier system called iPFPi that predicts the functions of un-annotated proteins. iPFPi assigns an un-annotated protein P the functions of GO annotation terms that are semantically similar to P. An un-annotated protein P and a GO annotation term T are represented by their characteristics. The characteristics of P are GO terms found within the abstracts of biomedical literature associated with P. The characteristics of T are GO terms found within the abstracts of biomedical literature associated with the proteins annotated with the function of T. Let F and F′ be the important (dominant) sets of characteristic terms representing T and P, respectively. iPFPi would annotate P with the function of T, if F and F′ are semantically similar. We constructed a novel semantic similarity measure that takes into consideration several factors, such as the dominance degree of each characteristic term t in set F based on its score, which is a value that reflects the dominance status of t relative to other characteristic terms, using pairwise beats and looses procedure. Every time a protein P is annotated with the function of T, iPFPi updates and optimizes the current scores of the characteristic terms for T based on the weights of the characteristic terms for P. Set F will be updated accordingly. Thus, the accuracy of predicting the function of T as the function of subsequent proteins improves. This prediction accuracy keeps improving over time iteratively through the cumulative weights of the characteristic terms representing proteins that are successively annotated with the function of T. We evaluated the quality of iPFPi by comparing it experimentally with two recent protein function prediction systems. Results showed marked improvement.

  9. Structure- and Sequence-Based Function Prediction for Non-Homologous Proteins

    PubMed Central

    Sael, Lee; Chitale, Meghana; Kihara, Daisuke

    2012-01-01

    The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent development of local structure-based methods and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, PFP and ESG, make use of weakly similar sequences that are conventionally discarded in homology based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of the protein structures. PMID:22270458

  10. Automated protein motif generation in the structure-based protein function prediction tool ProMOL.

    PubMed

    Osipovitch, Mikhail; Lambrecht, Mitchell; Baker, Cameron; Madha, Shariq; Mills, Jeffrey L; Craig, Paul A; Bernstein, Herbert J

    2015-12-01

    ProMOL, a plugin for the PyMOL molecular graphics system, is a structure-based protein function prediction tool. ProMOL includes a set of routines for building motif templates that are used for screening query structures for enzyme active sites. Previously, each motif template was generated manually and required supervision in the optimization of parameters for sensitivity and selectivity. We developed an algorithm and workflow for the automation of motif building and testing routines in ProMOL. The algorithm uses a set of empirically derived parameters for optimization and requires little user intervention. The automated motif generation algorithm was first tested in a performance comparison with a set of manually generated motifs based on identical active sites from the same 112 PDB entries. The two sets of motifs were equally effective in identifying alignments with homologs and in rejecting alignments with unrelated structures. A second set of 296 active site motifs were generated automatically, based on Catalytic Site Atlas entries with literature citations, as an expansion of the library of existing manually generated motif templates. The new motif templates exhibited comparable performance to the existing ones in terms of hit rates against native structures, homologs with the same EC and Pfam designations, and randomly selected unrelated structures with a different EC designation at the first EC digit, as well as in terms of RMSD values obtained from local structural alignments of motifs and query structures. This research is supported by NIH grant GM078077. PMID:26573864

  11. Application of Gap-Constraints Given Sequential Frequent Pattern Mining for Protein Function Prediction

    PubMed Central

    Park, Hyeon Ah; Kim, Taewook; Li, Meijing; Shon, Ho Sun; Park, Jeong Seok; Ryu, Keun Ho

    2015-01-01

    Objectives: Predicting protein function from the protein–protein interaction network is challenging because of the complexity and huge scale of the protein interaction process, along with its inconsistent patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probability of patterns inside the network. Although these methods have shown good prediction, difficulty still exists in searching for several functions that are exceptions to simple rules and patterns, as a result of not considering the inconsistent aspect of the interaction network. Methods: In this article, we propose a novel approach using the sequential pattern mining method with gap-constraints. To overcome the inconsistency problem, we suggest frequent functional patterns to include every possible functional sequence, including patterns for which search is limited by the structure of connection or level of neighborhood layer. We also constructed a tree-graph with the most crucial interaction information of the target protein, and generated candidate sets to assign by sequential pattern mining allowing gaps. Results: The parameters of pattern length, maximum gaps, and minimum support were given to find the best setting for the most accurate prediction. The highest accuracy rate was 0.972, which showed better results than the simple neighbor counting approach and link-based approach. Conclusion: Comparison of the results with other approaches confirmed that the proposed approach could reach more function candidates that previous methods could not obtain. PMID:25938021
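
    To make the notion of a gap-constrained sequential pattern concrete, here is a minimal sketch. It only checks whether a pattern occurs in a sequence with a bounded number of skipped items between consecutive elements, and computes the pattern's support. It is not the authors' mining algorithm; the function names and toy functional sequences are hypothetical.

```python
def matches_with_gaps(pattern, sequence, max_gap):
    """True if `pattern` occurs in `sequence` in order, with at most
    `max_gap` skipped items between consecutive pattern elements."""
    def search(p_idx, start, limit):
        # All pattern elements placed: the pattern matches.
        if p_idx == len(pattern):
            return True
        # Try every allowed position for pattern[p_idx], backtracking
        # if a later element cannot be placed.
        for pos in range(start, min(limit, len(sequence))):
            if sequence[pos] == pattern[p_idx]:
                if search(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first element may start anywhere in the sequence.
    return search(0, 0, len(sequence))

def support(pattern, sequences, max_gap):
    """Fraction of sequences that contain the pattern under the gap limit."""
    hits = sum(matches_with_gaps(pattern, s, max_gap) for s in sequences)
    return hits / len(sequences)

# Hypothetical functional sequences read along paths in a PPI tree-graph.
paths = [["A", "B", "C"], ["A", "C"], ["C", "A"]]
strict = support(["A", "C"], paths, max_gap=0)
relaxed = support(["A", "C"], paths, max_gap=1)
```

    Raising `max_gap` lets a pattern match paths where intermediate neighbors carry unrelated functions, which is one way to read the abstract's point about tolerating inconsistent patterns.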

  12. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGES

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  13. SIFTER search: a web server for accurate phylogeny-based protein function prediction.

    PubMed

    Sahraeian, Sayed M; Luo, Kevin R; Brenner, Steven E

    2015-07-01

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. The SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded. PMID:25979264

  14. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    SciTech Connect

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.

  15. An Accurate Method for Prediction of Protein-Ligand Binding Site on Protein Surface Using SVM and Statistical Depth Function

    PubMed Central

    Wang, Kui; Gao, Jianzhao; Shen, Shiyi; Tuszynski, Jack A.; Ruan, Jishou

    2013-01-01

    Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict the protein-ligand binding site. In this paper, we introduce a statistical depth function to define negative samples and propose an SVM-based method which integrates sequence and structural information to predict the binding site. The results show that the present method performs better than the existing ones. The accuracy, sensitivity, and specificity on the training set are 77.55%, 56.15%, and 87.96%, respectively; on the independent test set, the accuracy, sensitivity, and specificity are 80.36%, 53.53%, and 92.38%, respectively. PMID:24195070
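
    For readers comparing the three reported figures, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix using their standard definitions. The counts are made up for illustration and are not the paper's data; the function name is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp/fn count binding-site residues predicted correctly/missed;
    tn/fp count non-binding residues predicted correctly/wrongly flagged.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    return accuracy, sensitivity, specificity

# Illustrative counts only: negatives outnumber positives two to one.
acc, sens, spec = binary_metrics(tp=50, fp=20, tn=180, fn=50)
```

    Because accuracy is a weighted average of sensitivity and specificity, a data set dominated by non-binding residues can show high accuracy alongside modest sensitivity, the same pattern seen in the figures above.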

  16. Combining Phylogenetic Profiling-Based and Machine Learning-Based Techniques to Predict Functional Related Proteins

    PubMed Central

    Lin, Tzu-Wen; Wu, Jian-Wei; Chang, Darby Tien-Hao

    2013-01-01

    Annotating protein functions and linking proteins with similar functions are important in systems biology. The rapid growth rate of newly sequenced genomes calls for the development of computational methods to help experimental techniques. Phylogenetic profiling (PP) is a method that exploits the evolutionary co-occurrence pattern to identify functionally related proteins. However, PP-based methods have delivered satisfactory performance only on prokaryotes, not on eukaryotes. This study proposes a two-stage framework to predict protein functional linkages, which successfully enhances a PP-based method with machine learning. The experimental results show that the proposed two-stage framework achieved the best overall performance in comparison with three PP-based methods. PMID:24069454
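
    The co-occurrence idea behind phylogenetic profiling can be sketched as follows. Each protein is represented by a presence/absence vector across genomes, and two profiles are compared with the Jaccard index. The choice of Jaccard is an assumption for illustration (PP methods use a variety of similarity measures), and the profiles and names are hypothetical.

```python
def jaccard_profile_similarity(profile_a, profile_b):
    """Jaccard index of two binary phylogenetic profiles.

    Each profile lists 1 (homolog present) or 0 (absent) per genome,
    in the same genome order for both proteins.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two proteins absent from every genome carry no co-occurrence signal.
    return both / either if either else 0.0

# Hypothetical profiles over four genomes.
protein_x = [1, 1, 0, 1]
protein_y = [1, 0, 0, 1]
similarity = jaccard_profile_similarity(protein_x, protein_y)
```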

  17. Self-consistently optimized energy functions for protein structure prediction by molecular dynamics.

    PubMed

    Koretke, K K; Luthey-Schulten, Z; Wolynes, P G

    1998-03-17

    The protein energy landscape theory is used to obtain optimal energy functions for protein structure prediction via simulated annealing. The analysis here takes advantage of a more complete statistical characterization of the protein energy landscape and thereby improves on previous approximations. This schema partially takes into account correlations in the energy landscape. It also incorporates the relationships between folding dynamics and characteristic energy scales that control the collapse of the proteins and modulate rigidity of short-range interactions. Simulated annealing for the optimal energy functions, which are associative memory hamiltonians using a database of folding patterns, generally leads to quantitatively correct structures. In some cases the algorithm achieves "creativity," i.e., structures result that are better than any homolog in the database.

  18. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    NASA Astrophysics Data System (ADS)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and

  19. Multi-instance multi-label distance metric learning for genome-wide protein function prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao

    2016-08-01

    Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms.
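
    The contrast between Euclidean and Mahalanobis distance drawn above can be shown in a few lines. In this sketch the matrix M is supplied by hand; in the paper it is learned from data, and that learning step is not reproduced here. The function name and example vectors are hypothetical.

```python
from math import sqrt

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite M.

    With M equal to the identity matrix this reduces to the Euclidean
    distance; other choices of M rescale and rotate the feature space.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    # Matrix-vector product M d, then the dot product d . (M d).
    Md = [sum(M[i][j] * d[j] for j in range(n)) for i in range(n)]
    return sqrt(sum(d[i] * Md[i] for i in range(n)))

identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # hand-picked: weights feature 0 more

euclid = mahalanobis([3.0, 4.0], [0.0, 0.0], identity)
weighted = mahalanobis([1.0, 0.0], [0.0, 0.0], stretched)
```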

  20. Negative Example Selection for Protein Function Prediction: The NoGO Database

    PubMed Central

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-01-01

    Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  1. Negative example selection for protein function prediction: the NoGO database.

    PubMed

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-06-01

    Negative examples - genes that are known not to carry out a given protein function - are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).

  2. Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

    PubMed Central

    Xiong, Jianghui; Rayner, Simon; Luo, Kunyi; Li, Yinghui; Chen, Shanguang

    2006-01-01

    Background: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creation of an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to include almost any experimental data set related to protein function, which incorporates the Gene Ontology, to our evidence data in order to seek relationships between the different sets. Results: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously un-annotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. Conclusion: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight as to how certain biological processes are

  3. PTMcode: a database of known and predicted functional associations between post-translational modifications in proteins.

    PubMed

    Minguez, Pablo; Letunic, Ivica; Parca, Luca; Bork, Peer

    2013-01-01

    Post-translational modifications (PTMs) are involved in the regulation and structural stabilization of eukaryotic proteins. The combination of individual PTM states is a key to modulate cellular functions as became evident in a few well-studied proteins. This combinatorial setting, dubbed the PTM code, has been proposed to be extended to whole proteomes in eukaryotes. Although we are still far from deciphering such a complex language, thousands of protein PTM sites are being mapped by high-throughput technologies, thus providing sufficient data for comparative analysis. PTMcode (http://ptmcode.embl.de) aims to compile known and predicted PTM associations to provide a framework that would enable hypothesis-driven experimental or computational analysis of various scales. In its first release, PTMcode provides PTM functional associations of 13 different PTM types within proteins in 8 eukaryotes. They are based on five evidence channels: a literature survey, residue co-evolution, structural proximity, PTMs at the same residue and location within PTM highly enriched protein regions (hotspots). PTMcode is presented as a protein-based searchable database with an interactive web interface providing the context of the co-regulation of nearly 75 000 residues in >10 000 proteins.

  4. Incorporating significant amino acid pairs and protein domains to predict RNA splicing-related proteins with functional roles.

    PubMed

    Hsu, Justin Bo-Kai; Huang, Kai-Yao; Weng, Tzu-Ya; Huang, Chien-Hsun; Lee, Tzong-Yi

    2014-01-01

    Pre-mRNA splicing is carried out through the interaction of RNA sequence elements and a variety of RNA splicing-related proteins (SRPs) (e.g. spliceosome and splicing factors). Alternative splicing, which is an important post-transcriptional regulation mechanism in eukaryotes, gives rise to multiple mature mRNA isoforms, which encode proteins with functional diversity. However, the regulation of RNA splicing is not yet fully elucidated, partly because SRPs have not yet been exhaustively identified and the experimental identification is labor-intensive. Therefore, we are motivated to design a new method for identifying SRPs with their functional roles in the regulation of RNA splicing. The experimentally verified SRPs were manually curated from research articles. According to the functional annotation of Splicing Related Gene Database, the collected SRPs were further categorized into four functional groups including small nuclear Ribonucleoprotein, Splicing Factor, Splicing Regulation Factor and Novel Spliceosome Protein. The composition of amino acid pairs indicates that there are remarkable differences among the four functional groups of SRPs. Then, support vector machines (SVMs) were utilized to learn the predictive models for identifying SRPs as well as their functional roles. The cross-validation evaluation shows that the SVM models trained with significant amino acid pairs and functional domains could provide a better predictive performance. In addition, the independent testing demonstrates that the proposed method could accurately identify SRPs in mammals/plants as well as effectively distinguish between SRPs and RNA-binding proteins. This investigation provides a practical means of identifying potential SRPs and a perspective for exploring the regulation of RNA splicing.
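
    As an illustration of the amino acid pair feature mentioned above, the sketch below computes the relative frequency of adjacent residue pairs in a sequence. It assumes pairs means consecutive residues (dipeptides); the paper may define pairs or select the significant ones differently. The function name and the example peptide are hypothetical.

```python
from collections import Counter

def pair_composition(sequence):
    """Relative frequency of each adjacent amino acid pair in a sequence.

    Returns a dict mapping two-letter pairs to their share of all
    overlapping pairs; sequences shorter than two residues give {}.
    """
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}

# Made-up peptide; overlapping pairs are RS, SR, RS, SP.
features = pair_composition("RSRSP")
```

    Vectors like this, one entry per pair, are the kind of fixed-length input an SVM can be trained on, regardless of sequence length.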

  5. Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

    NASA Astrophysics Data System (ADS)

    Weigt, Martin

    Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNAs show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C

  6. Prediction of mitochondrial protein function by comparative physiology and phylogenetic profiling.

    PubMed

    Cheng, Yiming; Perocchi, Fabiana

    2015-01-01

    According to the endosymbiotic theory, mitochondria originate from a free-living alpha-proteobacterium that established an intracellular symbiosis with the ancestor of present-day eukaryotic cells. During the bacterium-to-organelle transformation, the proto-mitochondrial proteome has undergone a massive turnover, whereby less than 20% of modern mitochondrial proteomes can be traced back to the bacterial ancestor. Moreover, mitochondrial proteomes from several eukaryotic organisms, for example, yeast and human, show a rather modest overlap, reflecting differences in mitochondrial physiology. Those differences may result from the combination of differential gain and loss of genes and retargeting processes among lineages. Therefore, an evolutionary signature, also called a "phylogenetic profile", could be generated for every mitochondrial protein. Here, we present two evolutionary biology approaches to study mitochondrial physiology: the first strategy, which we refer to as "comparative physiology," allows the de novo identification of mitochondrial proteins involved in a physiological function; the second, known as "phylogenetic profiling," allows the prediction of protein functions and functional interactions by comparing phylogenetic profiles of uncharacterized and known components.
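
    As an illustration of comparing phylogenetic profiles, the following Python sketch represents each profile as a presence/absence vector across species and ranks known proteins by Jaccard similarity to an uncharacterized query. The similarity measure, the function names and the toy profiles are assumptions made for this example; the chapter itself may use a different metric.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def rank_by_profile(query, known):
    """Rank characterized proteins by profile similarity to the query."""
    scored = [(name, jaccard_similarity(query, prof))
              for name, prof in known.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical profiles over five species (1 = homolog present).
known = {
    "protA": [1, 1, 0, 1, 0],
    "protB": [0, 0, 1, 0, 1],
}
query = [1, 1, 0, 1, 1]

ranking = rank_by_profile(query, known)
print(ranking)  # protA shares more of the query's presence pattern
```

    A protein whose profile closely matches that of a characterized protein becomes a candidate for sharing its function or pathway.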

  7. The involvement of proline-rich protein Mus musculus predicted gene 4736 in ocular surface functions

    PubMed Central

    Qi, Xia; Ren, Sheng-Wei; Zhang, Feng; Wang, Yi-Qiang

    2016-01-01

    AIM: To study two homologous predicted proline-rich protein genes, Mus musculus predicted gene 4736 (MP4) and proline-rich protein BstNI subfamily 1 (Prb1), which were significantly upregulated in cultured corneal organs exposed to fungal pathogen preparations. This study aimed to confirm the expression and potential functions of these two genes in the ocular surface. METHODS: A Pseudomonas aeruginosa keratitis model was established in Balb/c mice. One day post infection, the mRNA level of MP4 was measured using real-time polymerase chain reaction (PCR), and MP4 protein was detected by immunohistochemistry (IHC) or Western blot using a customized polyclonal anti-MP4 antibody preparation. Lacrimal glands from normal mice were also subjected to IHC staining for MP4. An online bioinformatics program, BioGPS, was utilized to screen public data to determine other potential locations of MP4. RESULTS: One day after keratitis induction, MP4 was upregulated in the corneas at both the mRNA level, as measured using real-time PCR, and the protein level, as measured using Western blot and IHC. BioGPS analysis of public data suggested that the MP4 gene was most abundantly expressed in the lacrimal glands, and IHC revealed that normal murine lacrimal glands were positive for MP4 staining. CONCLUSION: MP4 and Prb1 are closely related to the physiology and pathological processes of the ocular surface. Considering the significance of ocular surface abnormalities like dry eye, we propose that MP4 and Prb1 contribute to homeostasis of the ocular surface, and deserve more extensive functional and disease correlation studies. PMID:27588265

  8. Enhancing protein function prediction with taxonomic constraints--The Argot2.5 web server.

    PubMed

    Lavezzo, Enrico; Falda, Marco; Fontana, Paolo; Bianco, Luca; Toppo, Stefano

    2016-01-15

    Argot2.5 (Annotation Retrieval of Gene Ontology Terms) is a web server designed to predict protein function. It is an updated version of the previous Argot2 enriched with new features in order to enhance its usability and its overall performance. The algorithmic strategy exploits the grouping of Gene Ontology terms by means of semantic similarity to infer protein function. The tool has been challenged over two independent benchmarks and compared to Argot2, PANNZER, and a baseline method relying on BLAST, proving to obtain a better performance thanks to the contribution of some key interventions in critical steps of the working pipeline. The most effective changes regard: (a) the selection of the input data from sequence similarity searches performed against a clustered version of UniProt databank and a remodeling of the weights given to Pfam hits, (b) the application of taxonomic constraints to filter out annotations that cannot be applied to proteins belonging to the species under investigation. The taxonomic rules are derived from our in-house developed tool, FunTaxIS, that extends those provided by the Gene Ontology consortium. The web server is free for academic users and is available online at http://www.medcomp.medicina.unipd.it/Argot2-5/.

  9. Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage

    PubMed Central

    Bogachev, Mikhail I.; Kayumov, Airat R.; Markelov, Oleg A.; Bunde, Armin

    2016-01-01

    Structural, localization and functional properties of unknown proteins are often predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and subsequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3–4 different cleavages. Additionally, in some cases the protein cellular localization (cytosolic or membrane-associated) and its host organism (Firmicute or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could also be predicted with high accuracy and particular robustness against missing cleavages. PMID:26924271
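
    The link between amino-acid-specific cleavage and inter-amino-acid intervals can be sketched in Python as follows. This is a simplified illustration, not the authors' simulation: it assumes cleavage occurs immediately after every occurrence of a chosen residue, and it uses fragment length as a stand-in for fragment mass. The residue set, function names and toy sequence are hypothetical.

```python
from collections import Counter


def cleave(sequence, residues):
    """Split a sequence after each occurrence of a residue in `residues`.

    Assumption: an idealized protease that cuts C-terminal to every
    listed residue, with no missed cleavages.
    """
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def fragment_length_histogram(sequence, residues):
    """Count fragments by length (a proxy for the mass distribution)."""
    return Counter(len(f) for f in cleave(sequence, residues))


# Hypothetical toy sequence cleaved after K or R.
fragments = cleave("AAKGGGKRAA", "KR")
histogram = fragment_length_histogram("AAKGGGKRAA", "KR")
print(fragments)
print(histogram)
```

    Histograms of this kind, computed for several cleavage specificities, are the sort of feature from which structural classes might be predicted.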

  10. Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage

    NASA Astrophysics Data System (ADS)

    Bogachev, Mikhail I.; Kayumov, Airat R.; Markelov, Oleg A.; Bunde, Armin

    2016-02-01

    Structural, localization and functional properties of unknown proteins are often predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and subsequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3–4 different cleavages. Additionally, in some cases the protein cellular localization (cytosolic or membrane-associated) and its host organism (Firmicute or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could also be predicted with high accuracy and particular robustness against missing cleavages.

  11. Sparse Markov chain-based semi-supervised multi-instance multi-label method for protein function prediction.

    PubMed

    Han, Chao; Chen, Jian; Wu, Qingyao; Mu, Shuai; Min, Huaqing

    2015-10-01

    Automated assignment of protein function has received considerable attention in recent years for genome-wide study. With the rapid accumulation of genome sequencing data produced by high-throughput experimental techniques, the process of manually predicting functional properties of proteins has become increasingly cumbersome. Such large genomics data sets can only be annotated computationally. However, automated assignment of functions to unknown proteins is challenging due to its inherent difficulty and complexity. Previous studies have revealed that solving problems involving complicated objects with multiple semantic meanings using the multi-instance multi-label (MIML) framework is effective. For protein function prediction problems, each protein object in nature may be associated with distinct structural units (instances) and multiple functional properties (class labels), where each unit is described by an instance and each functional property is considered as a class label. Thus, it is convenient and natural to tackle the protein function prediction problem by using the MIML framework. In this paper, we propose a sparse Markov chain-based semi-supervised MIML method, called Sparse-Markov. A sparse transductive probability graph is constructed to encode the affinity information of the data based on an ensemble of Hausdorff distance metrics. Our goal is to exploit the affinity between protein objects in the sparse transductive probability graph to seek a sparse steady state probability of the Markov chain model for protein function prediction, such that two proteins are given similar functional labels if they are close to each other in terms of an ensemble Hausdorff distance in the graph. Experimental results on seven real-world organism data sets covering three biological domains show that our proposed Sparse-Markov method is able to achieve better performance than four state-of-the-art MIML learning algorithms.
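
    To make the bag-level distance concrete, the sketch below computes the classic (maximal) Hausdorff distance between two bags of instance vectors in Python. It is only one member of the family the abstract refers to as an "ensemble of Hausdorff distance metrics"; the Euclidean base distance, the function names and the toy bags are assumptions for illustration, and the graph construction and Markov chain steps are not shown.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def directed_hausdorff(bag_a, bag_b):
    """Largest distance from an instance in bag_a to its nearest in bag_b."""
    return max(min(euclidean(a, b) for b in bag_b) for a in bag_a)


def hausdorff(bag_a, bag_b):
    """Symmetric (maximal) Hausdorff distance between two bags."""
    return max(directed_hausdorff(bag_a, bag_b),
               directed_hausdorff(bag_b, bag_a))


# Hypothetical bags: each protein is a set of 2-D instance vectors.
bag_x = [(0.0, 0.0), (1.0, 0.0)]
bag_y = [(0.0, 0.0), (4.0, 0.0)]

distance = hausdorff(bag_x, bag_y)
print(distance)
```

    Small distances between bags would translate into strong affinities in a protein graph of the kind the abstract describes.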

  12. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    PubMed

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the function of numerous well-sequenced known proteins whose function is still not known precisely. PFP is one of the special and complex problems in the machine learning domain, in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. One of the common learning methods proposed for solving this problem is decision trees, in which, because data are partitioned into sets with sharp boundaries, small changes in the attribute values of a new instance may cause an incorrect change in the predicted label of the instance and finally misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict functions of the proteins. This algorithm just fuzzifies the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It has the ability to assign multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses the label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising.
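
    The label variance reduction criterion can be sketched in Python as follows. This is a crisp (non-fuzzy) simplification for illustration only: it treats each protein's functions as a binary label vector, defines variance as the mean squared distance to the mean label vector, and picks the threshold with the largest reduction. The fuzzified boundaries and hierarchy handling of VR-BFDT are not modeled, and all names and data here are hypothetical.

```python
def label_variance(labels):
    """Mean squared distance of label vectors to their mean vector."""
    if not labels:
        return 0.0
    n, dim = len(labels), len(labels[0])
    mean = [sum(row[j] for row in labels) / n for j in range(dim)]
    return sum(
        sum((row[j] - mean[j]) ** 2 for j in range(dim)) for row in labels
    ) / n


def variance_reduction(values, labels, threshold):
    """Variance reduction from splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * label_variance(left) \
        + (len(right) / n) * label_variance(right)
    return label_variance(labels) - weighted


def best_threshold(values, labels):
    """Return the candidate threshold with the largest reduction."""
    candidates = sorted(set(values))[:-1]  # a split must leave both sides non-empty
    if not candidates:
        return None
    return max(candidates,
               key=lambda t: variance_reduction(values, labels, t))


# Hypothetical attribute values and two-function label vectors.
values = [0.1, 0.2, 0.8, 0.9]
labels = [[1, 0], [1, 0], [0, 1], [0, 1]]

split = best_threshold(values, labels)
print(split, variance_reduction(values, labels, split))
```

    In a fuzzy variant, instances near the chosen threshold would belong partially to both branches rather than being assigned to exactly one.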

  13. PINALOG: a novel approach to align protein interaction networks—implications for complex detection and function prediction

    PubMed Central

    Phan, Hang T. T.; Sternberg, Michael J. E.

    2012-01-01

    Motivation: Analysis of protein–protein interaction networks (PPINs) at the system level has become increasingly important in understanding biological processes. Comparison of the interactomes of different species not only provides a better understanding of species evolution but also helps with detecting conserved functional components and in function prediction. Method and Results: Here we report a PPIN alignment method, called PINALOG, which combines information from protein sequence, function and network topology. Alignment of human and yeast PPINs reveals several conserved subnetworks between them that participate in similar biological processes, notably the proteasome and transcription related processes. PINALOG has been tested for its power in protein complex prediction as well as function prediction. Comparison with PSI-BLAST in predicting protein function in the twilight zone also shows that PINALOG is valuable in predicting protein function. Availability and implementation: The PINALOG web-server is freely available from http://www.sbg.bio.ic.ac.uk/~pinalog. The PINALOG program and associated data are available from the Download section of the web-server. Contact: m.sternberg@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22419782

  14. Predicting protein fold pattern with functional domain and sequential evolution information.

    PubMed

    Shen, Hong-Bin; Chou, Kuo-Chen

    2009-02-01

    The fold pattern of a protein is one level deeper than its structural classification, and hence is more challenging and complicated to predict. Many efforts have been made in this regard, but so far all the reported success rates are still under 70%, indicating that it is extremely difficult to enhance the success rate even by 1% or 2%. To address this problem, a novel approach is proposed here that combines functional domain information and sequential evolution information through a fusion ensemble classifier. The predictor thus developed is called PFP-FunDSeqE. Tests were performed for identifying proteins among their 27 fold patterns. Compared with the existing predictors tested on the same stringent benchmark dataset, the new predictor can, for the first time, achieve a success rate of over 70%. The PFP-FunDSeqE predictor is freely available to the public as a web server at http://www.csbio.sjtu.edu.cn/bioinf/PFP-FunDSeqE/.

  15. In silico prediction of structure and functions for some proteins of male-specific region of the human Y chromosome.

    PubMed

    Saha, Chinmoy; Polash, Ahsan Habib; Islam, Md Tariqul; Shafrin, Farhana

    2013-12-01

    The male-specific region of the human Y chromosome (MSY) comprises 95% of the chromosome's length and is functionally active. This portion is inherited as a block from father to male offspring. Most of the genes in the MSY region are involved in male-specific functions, such as sex determination and spermatogenesis; the region also contains genes probably involved in other cellular functions. However, a detailed characterization of numerous MSY-encoded proteins still remains to be done. In this study, 12 uncharacterized proteins of MSY were analyzed with bioinformatics tools for structural and functional characterization. Within these 12 proteins, a total of 55 domains were found, with the DnaJ domain signature being the most frequent (11%), followed by both the FAD-dependent pyridine nucleotide reductase signature and the fumarate lyase superfamily signature (9%). The 3D structures of the selected proteins were built using homology modeling and protein threading approaches. Detailed stereochemical checks of these predicted structures indicated models of reasonably good quality. Furthermore, the predicted functions and the proteins with which they interact established their biological role and their mechanism of action at the molecular level. The results of these structure-functional annotations provide a comprehensive view of the proteins encoded by MSY, which sheds light on their biological functions and molecular mechanisms. The data presented in this study may assist in future prognosis of several human diseases such as Turner syndrome, gonadal sex reversal, spermatogenic failure, and gonadoblastoma.

  16. Predicting functional divergence in protein evolution by site-specific rate shifts

    NASA Technical Reports Server (NTRS)

    Gaucher, Eric A.; Gu, Xun; Miyamoto, Michael M.; Benner, Steven A.

    2002-01-01

    Most modern tools that analyze protein evolution allow individual sites to mutate at constant rates over the history of the protein family. However, Walter Fitch observed in the 1970s that, if a protein changes its function, the mutability of individual sites might also change. This observation is captured in the "non-homogeneous gamma model", which extracts functional information from gene families by examining the different rates at which individual sites evolve. This model has recently been coupled with structural and molecular biology to identify sites that are likely to be involved in changing function within the gene family. Applying this to multiple gene families highlights the widespread divergence of functional behavior among proteins to generate paralogs and orthologs.

  17. The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment.

    PubMed

    Eisenhaber, Birgit; Kuchibhatla, Durga; Sherman, Westley; Sirota, Fernanda L; Berezovsky, Igor N; Wong, Wing-Cheong; Eisenhaber, Frank

    2016-01-01

    As biomolecular sequencing is becoming the main technique in life sciences, functional interpretation of sequences in terms of biomolecular mechanisms with in silico approaches is getting increasingly significant. Function prediction tools are most powerful for protein-coding sequences; yet, the concepts and technologies used for this purpose are not well reflected in bioinformatics textbooks. Notably, protein sequences typically consist of globular domains and non-globular segments. The two types of regions require cardinally different approaches for function prediction. Whereas the former are classic targets for homology-inspired function transfer based on remnant, yet statistically significant sequence similarity to other, characterized sequences, the latter type of regions are characterized by compositional bias or simple, repetitive patterns and require lexical analysis and/or empirical sequence pattern-function correlations. The recipe for function prediction recommends first to find all types of non-globular segments and, then, to subject the remaining query sequence to sequence similarity searches. We provide an updated description of the ANNOTATOR software environment as an advanced example of a software platform that facilitates protein sequence-based function prediction. PMID:27115649

  19. A Random Forest Model for Predicting Allosteric and Functional Sites on Proteins.

    PubMed

    Chen, Ava S-Y; Westwood, Nicholas J; Brear, Paul; Rogers, Graeme W; Mavridis, Lazaros; Mitchell, John B O

    2016-04-01

    We created a computational method to identify allosteric sites using a machine learning method trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular or orthosteric, and hence to identify allosteric sites. 43 structural descriptors per complex were derived and were used to characterize individual protein-ligand binding sites belonging to the three classes, allosteric, regular and orthosteric. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino) ethane sulfonic acid (CHES). PMID:27491922

  20. Family of G protein alpha chains: amphipathic analysis and predicted structure of functional domains.

    PubMed

    Masters, S B; Stroud, R M; Bourne, H R

    1986-01-01

    The G proteins transduce hormonal and other signals into regulation of enzymes such as adenylyl cyclase and retinal cGMP phosphodiesterase. Each G protein contains an alpha subunit that binds and hydrolyzes guanine nucleotides and interacts with beta gamma subunits and specific receptor and effector proteins. Amphipathic and secondary structure analysis of the primary sequences of five different alpha chains (bovine alpha s, alpha t1 and alpha t2, mouse alpha i, and rat alpha o) predicted the secondary structure of a composite alpha chain (alpha avg). The alpha chains contain four short regions of sequence homologous to regions in the GDP binding domain of bacterial elongation factor Tu (EF-Tu). Similarities between the predicted secondary structures of these regions in alpha avg and the known secondary structure of EF-Tu allowed us to construct a three-dimensional model of the GDP binding domain of alpha avg. Identification of the GDP binding domain of alpha avg defined three additional domains in the composite polypeptide. The first includes the amino terminal 41 residues of alpha avg, with a predicted amphipathic alpha helical structure; this domain may control binding of the alpha chains to the beta gamma complex. The second domain, containing predicted beta strands and alpha helices, several of which are strongly amphipathic, probably contains sequences responsible for interaction of alpha chains with effector enzymes. The predicted structure of the third domain, containing the carboxy terminal 100 amino acids, is predominantly beta sheet with an amphipathic alpha helix at the carboxy terminus. We propose that this domain is responsible for receptor binding.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3148932

  1. A comparison of different functions for predicted protein model quality assessment.

    PubMed

    Li, Juan; Fang, Huisheng

    2016-07-01

    In protein structure prediction, a considerable number of models are usually produced by either the Template-Based Method (TBM) or the ab initio prediction. The purpose of this study is to find the critical parameter in assessing the quality of the predicted models. A non-redundant template library was developed and 138 target sequences were modeled. The target sequences were all distant from the proteins in the template library and were aligned with template library proteins on the basis of the transformation matrix. The quality of each model was first assessed with QMEAN and its six parameters, which are C_β interaction energy (C_beta), all-atom pairwise energy (PE), solvation energy (SE), torsion angle energy (TAE), secondary structure agreement (SSA), and solvent accessibility agreement (SAE). Finally, the alignment score (score) was also used to assess the quality of model. Hence, a total of eight parameters (i.e., QMEAN, C_beta, PE, SE, TAE, SSA, SAE, score) were independently used to assess the quality of each model. The results indicate that SSA is the best parameter to estimate the quality of the model. PMID:27488386

  2. Towards New Drug Targets? Function Prediction of Putative Proteins of Neisseria meningitidis MC58 and Their Virulence Characterization

    PubMed Central

    Shahbaaz, Mohd.; Bisetty, Krishna; Ahmad, Faizan

    2015-01-01

    Abstract Neisseria meningitidis is a Gram-negative aerobic diplococcus, responsible for a variety of meningococcal diseases. The genome of N. meningitidis MC58 is comprised of 2114 genes that are translated into 1953 proteins. The 698 genes (∼35%) encode hypothetical proteins (HPs), because no experimental evidence of their biological functions are available. Analyses of these proteins are important to understand their functions in the metabolic networks and may lead to the discovery of novel drug targets against the infections caused by N. meningitidis. This study aimed at the identification and categorization of each HP present in the genome of N. meningitidis MC58 using computational tools. Functions of 363 proteins were predicted with high accuracy among the annotated set of HPs investigated. The reliably predicted 363 HPs were further grouped into 41 different classes of proteins, based on their possible roles in cellular processes such as metabolism, transport, and replication. Our studies revealed that 22 HPs may be involved in the pathogenesis caused by this microorganism. The top two HPs with highest virulence scores were subjected to molecular dynamics (MD) simulations to better understand their conformational behavior in a water environment. We also compared the MD simulation results with other virulent proteins present in N. meningitidis. This study broadens our understanding of the mechanistic pathways of pathogenesis, drug resistance, tolerance, and adaptability for host immune responses to N. meningitidis. PMID:26076386

  3. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity.

    PubMed

    Li, Ying Hong; Xu, Jing Yu; Tao, Lin; Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins, including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi. PMID:27525735

  4. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity

    PubMed Central

    Li, Xiao Feng; Li, Shuang; Zeng, Xian; Chen, Shang Ying; Zhang, Peng; Qin, Chu; Zhang, Cheng; Chen, Zhe; Zhu, Feng; Chen, Yu Zong

    2016-01-01

    Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employed a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complemented similarity-based and other methods in predicting diverse classes of proteins, including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that were similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, 2 more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added for facilitating collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi. PMID:27525735

  5. Construction of polycythemia vera protein interaction network and prediction of related biological functions.

    PubMed

    Liu, L-J; Cao, X-J; Zhou, C; Sun, Y; Lv, Q-L; Feng, F-B; Zhang, Y-Y; Sun, C-G

    2016-01-01

    Here, polycythemia vera (PV)-related genes were screened using Online Mendelian Inheritance in Man (OMIM), literature pertaining to the identified genes was extracted, and a protein-protein interaction network was constructed using various Cytoscape plugins. Molecular complexes were detected using the ClusterViz plugin, and a gene ontology enrichment analysis of the biological pathways, molecular functions, and cellular components of the selected molecular complexes was performed using the BiNGO plugin. Fifty-four PV-related genes were identified in OMIM. The protein-protein interaction network contains 5 molecular complexes with correlation integral values >4. These complexes were involved in various biological processes (peptide tyrosinase acidification, cell metabolism, and macromolecular biosynthesis) and molecular functions (kinase activity, receptor binding, and cytokine activity), and their cellular components were mainly concentrated in the nucleus, intracellular membrane-bounded organelles, and extracellular region. These complexes were associated with the JAK-STAT signal transduction pathway, neurotrophic factor signaling pathway, and Wnt signaling pathway, which were correlated with chronic myeloid leukemia and acute myeloid leukemia. PMID:26909922

  6. PREFACE: Protein protein interactions: principles and predictions

    NASA Astrophysics Data System (ADS)

    Nussinov, Ruth; Tsai, Chung-Jung

    2005-06-01

    Proteins are the 'workhorses' of the cell. Their roles span functions as diverse as being molecular machines and signalling. They carry out catalytic reactions, transport, form viral capsids, traverse membranes and form regulated channels, transmit information from DNA to RNA, making possible the synthesis of new proteins, and they are responsible for the degradation of unnecessary proteins and nucleic acids. They are the vehicles of the immune response and are responsible for viral entry into the cell. Given their importance, considerable effort has been centered on the prediction of protein function. A prime way to do this is through identification of binding partners. If the function of at least one of the components with which the protein interacts is known, that should let us assign its function(s) and the pathway(s) in which it plays a role. This holds since the vast majority of their chores in the living cell involve protein-protein interactions. Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation. Their identification is at the heart of functional genomics; their prediction is crucial for drug discovery. Knowledge of the pathway, its topology, length, and dynamics may provide useful information for forecasting side effects. The goal of predicting protein-protein interactions is daunting. Some associations are obligatory, others are continuously forming and dissociating. In principle, from the physical standpoint, any two proteins can interact, but under what conditions and at which strength? The principles of protein-protein interactions are general: the non-covalent interactions of two proteins are largely the outcome of the hydrophobic effect, which drives the interactions. In addition, hydrogen bonds and electrostatic interactions play important roles. Thus, many of the interactions observed in vitro are the outcome of experimental overexpression.
Protein disorder

  7. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
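
    The percentages quoted in this abstract can be checked directly from the counts it reports. The short sketch below does only that arithmetic; the dictionary layout and the helper name `percent` are illustrative choices, not part of the original study.

```python
# Counts quoted in the abstract: (variants flagged as damaging, variants scored).
reported = {
    "SIFT": (226, 508),
    "PolyPhen": (165, 489),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


for tool, (flagged, scored) in reported.items():
    # Print each fraction with one decimal so the rounding is visible.
    print(f"{tool}: {flagged}/{scored} = {percent(flagged, scored):.1f}%")
```

    Rounded to whole numbers, the two fractions agree with the 44% and 34% stated in the abstract.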

  8. Predicting free energy contributions to the conformational stability of folded proteins from the residue sequence with radial basis function networks.

    PubMed

    Casadio, R; Compiani, M; Fariselli, P; Vivarelli, F

    1995-01-01

    Radial basis function neural networks are trained on a database comprising 38 globular proteins of well-resolved crystallographic structure and the corresponding free energy contributions to the overall protein stability (as computed partially from crystallographic analysis and partially with multiple regression from experimental thermodynamic data by Ponnuswamy and Gromiha (1994)). Starting from the residue sequence and using as input code the percentage of each residue and the total residue number of the protein, it is found with a cross-validation method that neural networks can optimally predict the free energy contributions due to hydrogen bonds, hydrophobic interactions and the unfolded state. Terms due to electrostatic and disulfide bonding free energies are poorly predicted. This is also the case when other input codes, including the percentage of secondary structure type of the protein and/or residue-pair information, are used. Furthermore, trained on the computed and/or experimental ΔG values of the database, neural networks predict a conformational stability ranging from about 10 to 20 kcal mol⁻¹ rather independently of the residue sequence, with an average error per protein of about 9 kcal mol⁻¹.
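
    The input code described in this abstract (the percentage of each residue plus the total residue number) is simple to build. The following is a minimal sketch of such an encoding, assuming the 20 standard one-letter amino acid codes; the function name, the ordering of the residues, and the example peptide are hypothetical and not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_features(sequence: str) -> list[float]:
    """Encode a sequence as 20 residue percentages followed by its length."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Percentage of each standard residue, in the fixed AMINO_ACIDS order.
    percentages = [100.0 * seq.count(aa) / n for aa in AMINO_ACIDS]
    # Append the total residue number as the final input value.
    return percentages + [float(n)]


# Hypothetical 10-residue peptide used only to show the shape of the vector.
features = composition_features("MKTAYIAKQR")
print(len(features), features[-1])
```

    A vector of this kind would then be fed to whatever regression model is being trained; the network itself is outside the scope of this sketch.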

  9. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

    PubMed Central

    2013-01-01

    Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
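
    Overall accuracy and a correlation coefficient such as those reported above are both derived from a binary confusion matrix. The sketch below assumes the correlation coefficient is the Matthews correlation coefficient, which the abstract does not state explicitly, and the counts used are invented for illustration rather than taken from the SwissVar evaluation.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a balanced set of 200 variants (not from the paper).
tp, tn, fp, fn = 80, 82, 18, 20
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.2f}")
```

    On a balanced dataset the two measures move together, which is why the abstract can quote both for the same experiment.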

  10. AUTO-MUTE 2.0: A Portable Framework with Enhanced Capabilities for Predicting Protein Functional Consequences upon Mutation.

    PubMed

    Masso, Majid; Vaisman, Iosif I

    2014-01-01

    The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.

  11. Information theory-based scoring function for the structure-based prediction of protein-ligand binding affinity.

    PubMed

    Kulharia, Mahesh; Goody, Roger S; Jackson, Richard M

    2008-10-01

    The development and validation of a new knowledge based scoring function (SIScoreJE) to predict binding energy between proteins and ligands is presented. SIScoreJE efficiently predicts the binding energy between a small molecule and its protein receptor. Protein-ligand atomic contact information was derived from a Non-Redundant Data set (NRD) of over 3000 X-ray crystal structures of protein-ligand complexes. This information was classified for individual "atom contact pairs" (ACP) which is used to calculate the atomic contact preferences. In addition to the two schemes generated in this study we have assessed a number of other common atom-type classification schemes. The preferences were calculated using an information theoretic relationship of joint entropy. Among 18 different atom-type classification schemes "ScoreJE Atom Type set2" (SATs2) was found to be the most suitable for our approach. To test the sensitivity of the method to the inclusion of solvent, Single-body Solvation Potentials (SSP) were also derived from the atomic contacts between the protein atom types and water molecules modeled using AQUARIUS2. Validation was carried out using an evaluation data set of 100 protein-ligand complexes with known binding energies to test the ability of the scoring functions to reproduce known binding affinities. In summary, it was found that a combined SSP/ScoreJE (SIScoreJE) performed significantly better than ScoreJE alone, and SIScoreJE and ScoreJE performed better than GOLD::GoldScore, GOLD::ChemScore, and XScore.

  12. Final report for LDRD project "A new approach to protein function and structure prediction"

    SciTech Connect

    Phillips, C.A.

    1997-03-01

    This report describes the research performed under the Laboratory-Directed Research and Development (LDRD) grant "A new approach to protein function and structure prediction", funded FY94-6. We describe the goals of the research, motivate and list our improvements to the state of the art in multiple sequence alignment and phylogeny (evolutionary tree) construction, but leave technical details to the six publications resulting from this work. At least three algorithms for phylogeny construction or tree consensus have been implemented and used by researchers outside of Sandia.

  13. Novel Urinary Protein Biomarkers Predicting the Development of Microalbuminuria and Renal Function Decline in Type 1 Diabetes

    PubMed Central

    Schlatzer, Daniela; Maahs, David M.; Chance, Mark R.; Dazard, Jean-Eudes; Li, Xiaolin; Hazlett, Fred; Rewers, Marian; Snell-Bergeon, Janet K.

    2012-01-01

    OBJECTIVE To define a panel of novel protein biomarkers of renal disease. RESEARCH DESIGN AND METHODS Adults with type 1 diabetes in the Coronary Artery Calcification in Type 1 Diabetes study who were initially free of renal complications (n = 465) were followed for development of micro- or macroalbuminuria (MA) and early renal function decline (ERFD, annual decline in estimated glomerular filtration rate of ≥3.3%). The label-free proteomic discovery phase was conducted in 13 patients who progressed to MA by the 6-year visit and 11 control subjects, and four proteins (Tamm-Horsfall glycoprotein, α-1 acid glycoprotein, clusterin, and progranulin) identified in the discovery phase were measured by enzyme-linked immunosorbent assay in 74 subjects: group A, normal renal function (n = 35); group B, ERFD without MA (n = 15); group C, MA without ERFD (n = 16); and group D, both ERFD and MA (n = 8). RESULTS In the label-free analysis, a model of progression to MA was built using 252 peptides, yielding an area under the curve (AUC) of 84.7 ± 5.3%. In the validation study, ordinal logistic regression was used to predict development of ERFD, MA, or both. A panel including Tamm-Horsfall glycoprotein (odds ratio 2.9, 95% CI 1.3–6.2, P = 0.008), progranulin (1.9, 0.8–4.5, P = 0.16), clusterin (0.6, 0.3–1.1, P = 0.09), and α-1 acid glycoprotein (1.6, 0.7–3.7, P = 0.27) improved the AUC from 0.841 to 0.889. CONCLUSIONS A panel of four novel protein biomarkers predicted early renal damage in type 1 diabetes. These findings require further validation in other populations for prediction of renal complications and treatment monitoring. PMID:22238279

  14. Predicting protein function with hierarchical phylogenetic profiles: the Gene3D Phylo-Tuner method applied to eukaryotic genomes.

    PubMed

    Ranea, Juan A G; Yeats, Corin; Grant, Alastair; Orengo, Christine A

    2007-11-01

    "Phylogenetic profiling" is based on the hypothesis that during evolution functionally or physically interacting genes are likely to be inherited or eliminated in a codependent manner. Creating presence-absence profiles of orthologous genes is now a common and powerful way of identifying functionally associated genes. In this approach, correctly determining orthology, as a means of identifying functional equivalence between two genes, is a critical and nontrivial step and largely explains why previous work in this area has mainly focused on using presence-absence profiles in prokaryotic species. Here, we demonstrate that eukaryotic genomes have a high proportion of multigene families whose phylogenetic profile distributions are poor in presence-absence information content. This feature makes them prone to orthology mis-assignment and unsuited to standard profile-based prediction methods. Using CATH structural domain assignments from the Gene3D database for 13 complete eukaryotic genomes, we have developed a novel modification of the phylogenetic profiling method that uses genome copy number of each domain superfamily to predict functional relationships. In our approach, superfamilies are subclustered at ten levels of sequence identity, from 30% to 100%, and phylogenetic profiles are built at each level. All the profiles are compared using normalised Euclidean distances to identify those with correlated changes in their domain copy number. We demonstrate that two protein families will "auto-tune" with strong co-evolutionary signals when their profiles are compared at the similarity levels that capture their functional relationship. Our method finds functional relationships that are not detectable by the conventional presence-absence profile comparisons, and it does not require a priori any fixed criteria to define orthologous genes. PMID:18052542
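
    Comparing copy-number profiles with a normalised Euclidean distance can be sketched as follows. The abstract does not say which normalisation is used, so this example assumes each profile is scaled to unit length before the distance is taken; the function names and the five-genome toy profiles are hypothetical.

```python
import math


def unit_normalise(profile: list[float]) -> list[float]:
    """Scale a copy-number profile to unit Euclidean length (an assumption)."""
    norm = math.sqrt(sum(x * x for x in profile))
    if norm == 0:
        raise ValueError("profile has no copies in any genome")
    return [x / norm for x in profile]


def profile_distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two unit-normalised profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    a, b = unit_normalise(p), unit_normalise(q)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy domain copy numbers across five genomes (invented for illustration).
family_a = [1, 2, 4, 8, 3]
family_b = [2, 4, 8, 16, 6]  # expands and contracts in step with family_a
family_c = [8, 4, 2, 1, 5]   # changes in the opposite direction

print(profile_distance(family_a, family_b))
print(profile_distance(family_a, family_c))
```

    Under this normalisation, families whose copy numbers rise and fall together across genomes score a distance near zero regardless of their absolute family size.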

  15. Prediction of Certain Well-Characterized Domains of Known Functions within the PE and PPE Proteins of Mycobacteria.

    PubMed

    Sultana, Rafiya; Tanneeru, Karunakar; Kumar, Ashwin B R; Guruprasad, Lalitha

    2016-01-01

    The PE and PPE protein family are unique to mycobacteria. Though the complete genome sequences for over 500 M. tuberculosis strains and mycobacterial species are available, few PE and PPE proteins have been structurally and functionally characterized. We have therefore used bioinformatics tools to characterize the structure and function of these proteins. We selected representative members of the PE and PPE protein family by phylogeny analysis and using structure-based sequence annotation identified ten well-characterized protein domains of known function. Some of these domains were observed to be common to all mycobacterial species and some were species specific.

  16. Prediction of Certain Well-Characterized Domains of Known Functions within the PE and PPE Proteins of Mycobacteria

    PubMed Central

    Sultana, Rafiya; Tanneeru, Karunakar; Kumar, Ashwin B. R.; Guruprasad, Lalitha

    2016-01-01

    The PE and PPE protein family are unique to mycobacteria. Though the complete genome sequences for over 500 M. tuberculosis strains and mycobacterial species are available, few PE and PPE proteins have been structurally and functionally characterized. We have therefore used bioinformatics tools to characterize the structure and function of these proteins. We selected representative members of the PE and PPE protein family by phylogeny analysis and using structure-based sequence annotation identified ten well-characterized protein domains of known function. Some of these domains were observed to be common to all mycobacterial species and some were species specific. PMID:26891364

  17. Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: An in silico approach for prioritizing the targets.

    PubMed

    Gazi, Md Amran; Kibria, Mohammad Golam; Mahfuz, Mustafa; Islam, Md Rezaul; Ghosh, Prakash; Afsar, Md Nure Alam; Khan, Md Arif; Ahmed, Tahmeed

    2016-10-15

    The global control of tuberculosis (TB) remains a great challenge from the standpoint of diagnosis, detection of drug resistance, and treatment. Major serodiagnostic limitations include low sensitivity and high cost in detecting TB. On the other hand, treatment measures are often hindered by low efficacies of commonly used drugs and resistance developed by the bacteria. Hence, there is a need to look into newer diagnostic and therapeutic targets. The proteome information available suggests that among the 3906 proteins in Mycobacterium tuberculosis H37Rv, about a quarter remain classified as a hypothetical, uncharacterized set. This study combines a number of bioinformatics tools to analyze those hypothetical proteins (HPs). An entire set of 999 proteins was first screened for protein sequences having conserved domains with high confidence using a combination of the latest versions of protein family databases. Subsequently, 98 such potential target proteins were extensively analyzed by means of physicochemical characteristics, protein-protein interaction, sub-cellular localization, structural similarity and functional classification. Next, we predicted antigenic proteins from the entire set and identified B and T cell epitopes of these proteins in M. tuberculosis H37Rv. We predicted that these HPs belong to various classes of proteins such as enzymes, transporters, receptors, structural proteins, transcription regulators and other proteins. Moreover, the structural similarity prediction of the annotated proteins substantiated the functional classification of those proteins. Consequently, based on higher antigenicity score and sub-cellular localization, we chose two (NP_216420.1, NP_216903.1) of the antigenic proteins to exemplify the B and T cell epitope prediction approach. Finally, we found 15 epitopes located partially or fully in the linear epitope region. We also found 21 conformational epitopes by using the ElliPro server. In

  19. A partial loss of function allele of Methyl-CpG-binding protein 2 predicts a human neurodevelopmental syndrome

    PubMed Central

    Samaco, Rodney C.; Fryer, John D.; Ren, Jun; Fyffe, Sharyl; Chao, Hsiao-Tuan; Sun, Yaling; Greer, John J.; Zoghbi, Huda Y.; Neul, Jeffrey L.

    2008-01-01

    Rett Syndrome, an X-linked dominant neurodevelopmental disorder characterized by regression of language and hand use, is primarily caused by mutations in methyl-CpG-binding protein 2 (MECP2). Loss of function mutations in MECP2 are also found in other neurodevelopmental disorders such as autism, Angelman-like syndrome and non-specific mental retardation. Furthermore, duplication of the MECP2 genomic region results in mental retardation with speech and social problems. The common features of human neurodevelopmental disorders caused by the loss or increase of MeCP2 function suggest that even modest alterations of MeCP2 protein levels result in neurodevelopmental problems. To determine whether a small reduction in MeCP2 level has phenotypic consequences, we characterized a conditional mouse allele of Mecp2 that expresses 50% of the wild-type level of MeCP2. Upon careful behavioral analysis, mice that harbor this allele display a spectrum of abnormalities such as learning and motor deficits, decreased anxiety, altered social behavior and nest building, decreased pain recognition and disrupted breathing patterns. These results indicate that precise control of MeCP2 is critical for normal behavior and predict that human neurodevelopmental disorders will result from a subtle reduction in MeCP2 expression. PMID:18321864

  20. A partial loss of function allele of methyl-CpG-binding protein 2 predicts a human neurodevelopmental syndrome.

    PubMed

    Samaco, Rodney C; Fryer, John D; Ren, Jun; Fyffe, Sharyl; Chao, Hsiao-Tuan; Sun, Yaling; Greer, John J; Zoghbi, Huda Y; Neul, Jeffrey L

    2008-06-15

    Rett Syndrome, an X-linked dominant neurodevelopmental disorder characterized by regression of language and hand use, is primarily caused by mutations in methyl-CpG-binding protein 2 (MECP2). Loss of function mutations in MECP2 are also found in other neurodevelopmental disorders such as autism, Angelman-like syndrome and non-specific mental retardation. Furthermore, duplication of the MECP2 genomic region results in mental retardation with speech and social problems. The common features of human neurodevelopmental disorders caused by the loss or increase of MeCP2 function suggest that even modest alterations of MeCP2 protein levels result in neurodevelopmental problems. To determine whether a small reduction in MeCP2 level has phenotypic consequences, we characterized a conditional mouse allele of Mecp2 that expresses 50% of the wild-type level of MeCP2. Upon careful behavioral analysis, mice that harbor this allele display a spectrum of abnormalities such as learning and motor deficits, decreased anxiety, altered social behavior and nest building, decreased pain recognition and disrupted breathing patterns. These results indicate that precise control of MeCP2 is critical for normal behavior and predict that human neurodevelopmental disorders will result from a subtle reduction in MeCP2 expression. PMID:18321864

  1. De Novo Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram

    An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.

  2. PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations

    PubMed Central

    Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610
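
    A jackknife test of the kind mentioned above holds out each sample in turn, trains on the rest, and scores the held-out prediction. The sketch below shows only that evaluation loop. It uses a one-nearest-neighbour classifier as a simple stand-in for the SVM described in the abstract, and the two-feature toy data and class names are invented for illustration.

```python
Sample = tuple[tuple[float, ...], str]


def nearest_neighbour_label(query: tuple[float, ...], train: list[Sample]) -> str:
    """Return the label of the training point closest to the query."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def jackknife_accuracy(samples: list[Sample]) -> float:
    """Leave each sample out once, predict it from the rest, and average."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        remaining = samples[:i] + samples[i + 1:]  # everything except sample i
        if nearest_neighbour_label(features, remaining) == label:
            correct += 1
    return correct / len(samples)


# Two well-separated toy classes (hypothetical feature values).
toy = [
    ((0.0, 0.1), "all-alpha"), ((0.1, 0.0), "all-alpha"), ((0.2, 0.1), "all-alpha"),
    ((1.0, 0.9), "all-beta"), ((0.9, 1.0), "all-beta"), ((1.1, 1.0), "all-beta"),
]
print(jackknife_accuracy(toy))
```

    Because every sample is tested exactly once and never seen during its own training round, the result does not depend on a random split, which is one reason this test is commonly used on small benchmark sets.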

  3. SPOT-Seq-RNA: predicting protein-RNA complex structure and RNA-binding function by fold recognition and binding affinity prediction.

    PubMed

    Yang, Yuedong; Zhao, Huiying; Wang, Jihua; Zhou, Yaoqi

    2014-01-01

    RNA-binding proteins (RBPs) play key roles in RNA metabolism and post-transcriptional regulation. Computational methods have been developed separately for prediction of RBPs and RNA-binding residues by machine-learning techniques and prediction of protein-RNA complex structures by rigid or semiflexible structure-to-structure docking. Here, we describe a template-based technique called SPOT-Seq-RNA that integrates prediction of RBPs, RNA-binding residues, and protein-RNA complex structures into a single package. This integration is achieved by combining template-based structure-prediction software, SPARKS X, with binding affinity prediction software, DRNA. This tool yields reasonable sensitivity (46 %) and high precision (84 %) for an independent test set of 215 RBPs and 5,766 non-RBPs. SPOT-Seq-RNA is computationally efficient for genome-scale prediction of RBPs and protein-RNA complex structures. Its application to human genome study has revealed a similar sensitivity and ability to uncover hundreds of novel RBPs beyond simple homology. The online server and downloadable version of SPOT-Seq-RNA are available at http://sparks-lab.org/server/SPOT-Seq-RNA/.
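
    Sensitivity and precision, the two figures quoted for the independent test set, answer different questions: how many true RBPs are recovered, and how many predicted RBPs are real. The sketch below defines both. The abstract gives only the set sizes and the percentages, so the confusion counts used here are hypothetical values chosen to be consistent with 215 RBPs, not figures from the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true RBPs that the method recovers (also called recall)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted RBPs that are true RBPs."""
    return tp / (tp + fp)


# Hypothetical counts: 99 + 116 = 215 true RBPs, with 19 false positives
# drawn from the much larger pool of non-RBPs.
tp, fn, fp = 99, 116, 19
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"precision   = {precision(tp, fp):.2f}")
```

    With a negative set far larger than the positive set, a method can miss more than half of the true RBPs and still keep precision high, provided it makes few false-positive calls.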

  4. Structure Prediction of Protein Complexes

    NASA Astrophysics Data System (ADS)

    Pierce, Brian; Weng, Zhiping

    Protein-protein interactions are critical for biological function. They directly and indirectly influence the biological systems of which they are a part. Antibodies bind with antigens to detect and stop viruses and other infectious agents. Cell signaling is performed in many cases through the interactions between proteins. Many diseases involve protein-protein interactions on some level, including cancer and prion diseases.

  5. Predicting the protein-protein interactions using primary structures with predicted protein surface

    PubMed Central

    2010-01-01

    Background Many biological functions involve various protein-protein interactions (PPIs). Elucidating such interactions is crucial for understanding general principles of cellular systems. Previous studies have shown the potential of predicting PPIs based on only sequence information. Compared to approaches that require other auxiliary information, these sequence-based approaches can be applied to a broader range of applications. Results This study presents a novel sequence-based method based on the assumption that protein-protein interactions are more related to amino acids at the surface than those at the core. The present method considers surface information and maintains the advantage of relying on only sequence data by including an accessible surface area (ASA) predictor recently proposed by the authors. This study also reports the experiments conducted to evaluate a) the performance of PPI prediction achieved by including the predicted surface and b) the quality of the predicted surface in comparison with the surface obtained from structures. The experimental results show that surface information helps to predict interacting protein pairs. Furthermore, the prediction performance achieved by using the surface estimated with the ASA predictor is close to that using the surface obtained from protein structures. Conclusion This work presents a sequence-based method that takes into account surface information for predicting PPIs. The proposed procedure of surface identification improves the prediction performance with an F-measure of 5.1%. The extracted surfaces are also valuable in other biomedical applications that require similar information. PMID:20122202

  6. Modeling Protein Domain Function

    ERIC Educational Resources Information Center

    Baker, William P.; Jones, Carleton "Buck"; Hull, Elizabeth

    2007-01-01

    This simple but effective laboratory exercise helps students understand the concept of protein domain function. They use foam beads, Styrofoam craft balls, and pipe cleaners to explore how domains within protein active sites interact to form a functional protein. The activity allows students to gain content mastery and an understanding of the…

  7. Prediction of CTL epitope, in silico modeling and functional analysis of cytolethal distending toxin (CDT) protein of Campylobacter jejuni

    PubMed Central

    2014-01-01

    Background Campylobacter jejuni is a potent bacterial pathogen responsible for the diarrheal disease campylobacteriosis. It is recognized as a major health issue owing to the unavailability of appropriate vaccines and clinical treatment options. Like other pathogens, C. jejuni exploits host cellular components of an infected individual to disseminate the disease. These host–pathogen interfaces during C. jejuni infection are complex and dynamic, and are involved in the nicking of the host cell environment, enzymes and pathways. Existing therapies rely on only a small number of drugs, most of which are insufficient because of severe host toxicity or drug resistance. To find therapeutic alternatives, the identification of new biotargets is highly anticipated. Understanding the molecules involved in pathogenesis has the potential to yield new and exciting strategies for therapeutic intervention. In this direction, advances in bioinformatics have opened up new possibilities for the rapid measurement of global changes during infection, and this could be exploited to understand the molecular interactions involved in campylobacteriosis. Methods In this study, homology modeling, epitope prediction and identification of ligand binding sites were explored. An attempt to generate a robust 3D model of the cytolethal distending toxin protein from C. jejuni is described for the first time. Results CDT protein isolated from C. jejuni was analyzed using various bioinformatics and immuno-informatics tools, including sequence and structure tools. A total of fifty-five antigenic determinants were predicted, and the CTL epitope predictions revealed five MHC ligands in CDT. Three potential binding pockets were found in the sequence that can be useful for drug design. Conclusions This model, we hope, will be of help in designing and predicting novel CDT inhibitors and vaccine candidates. PMID:24552167

  8. Functional Protein Microarray Technology

    PubMed Central

    Hu, Shaohui; Xie, Zhi; Qian, Jiang; Blackshaw, Seth; Zhu, Heng

    2010-01-01

    Functional protein microarrays are emerging as a promising new tool for large-scale and high-throughput studies. In this article, we will review their applications in basic proteomics research, where various types of assays have been developed to probe binding activities to other biomolecules, such as proteins, DNA, RNA, small molecules, and glycans. We will also report recent progress in using functional protein microarrays to profile protein posttranslational modifications, including phosphorylation, ubiquitylation, acetylation, and nitrosylation. Finally, we will discuss the potential of functional protein microarrays in biomarker identification and clinical diagnostics. We strongly believe that functional protein microarrays will soon become an indispensable and invaluable tool in proteomics research and systems biology. PMID:20872749

  9. Transmembrane beta-barrel protein structure prediction

    NASA Astrophysics Data System (ADS)

    Randall, Arlo; Baldi, Pierre

    Transmembrane β-barrel (TMB) proteins are embedded in the outer membranes of mitochondria, Gram-negative bacteria, and chloroplasts. These proteins perform critical functions, including active ion-transport and passive nutrient intake. Therefore, there is a need for accurate prediction of secondary and tertiary structures of TMB proteins. A variety of methods have been developed for predicting the secondary structure and these predictions are very useful for constructing a coarse topology of TMB structure; however, they do not provide enough information to construct a low-resolution tertiary structure for a TMB protein. In addition, while the overall structural architecture is well conserved among TMB proteins, the amino acid sequences are highly divergent. Thus, traditional homology modeling methods cannot be applied to many putative TMB proteins. Here, we describe the TMBpro: a pipeline of methods for predicting TMB secondary structure, β-residue contacts, and finally tertiary structure. The tertiary prediction method relies on the specific construction rules that TMB proteins adhere to and on the predicted β-residue contacts to dramatically reduce the search space for the model building procedure.

  10. IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks.

    PubMed

    Wong, Aaron K; Krishnan, Arjun; Yao, Victoria; Tadych, Alicja; Troyanskaya, Olga G

    2015-07-01

    IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers. IMP does not require any registration or installation and is freely available for use at http://imp.princeton.edu.

  11. Predicting protein-peptide interactions from scratch

    NASA Astrophysics Data System (ADS)

    Yan, Chengfei; Xu, Xianjin; Zou, Xiaoqin; Zou lab Team

    Protein-peptide interactions play an important role in many cellular processes. The ability to predict protein-peptide complex structures is valuable for mechanistic investigation and therapeutic development. Due to the high flexibility of peptides and the lack of templates for homology modeling, predicting protein-peptide complex structures is extremely challenging. Recently, we have developed a novel docking framework for protein-peptide structure prediction. Specifically, given the sequence of a peptide and a 3D structure of the protein, initial conformations of the peptide are built through protein threading. Then, the peptide is globally and flexibly docked onto the protein using a novel iterative approach. Finally, the sampled modes are scored and ranked by a statistical potential-based energy scoring function that was derived for protein-peptide interactions from statistical mechanics principles. Our docking methodology has been tested on the Peptidb database and compared with other protein-peptide docking methods. Systematic analysis shows significantly improved results compared to the performance of the existing methods. Our method is computationally efficient and suitable for large-scale applications. Supported by NSF CAREER Award 0953839 (XZ) and NIH R01GM109980 (XZ).

  12. Enzyme function prediction with interpretable models.

    PubMed

    Syed, Umar; Yona, Golan

    2009-01-01

    Enzymes play central roles in metabolic pathways, and the prediction of metabolic pathways in newly sequenced genomes usually starts with the assignment of genes to enzymatic reactions. However, genes with similar catalytic activity are not necessarily similar in sequence, and therefore the traditional sequence similarity-based approach often fails to identify the relevant enzymes, thus hindering efforts to map the metabolome of an organism. Here we study the direct relationship between basic protein properties and their function. Our goal is to develop a new tool for functional prediction (e.g., prediction of Enzyme Commission number), which can be used to complement and support other techniques based on sequence or structure information. In order to define this mapping we collected a set of 453 features and properties that characterize proteins and are believed to be related to structural and functional aspects of proteins. We introduce a mixture model of stochastic decision trees to learn the set of potentially complex relationships between features and function. To study these correlations, trees are created and tested on the Pfam classification of proteins, which is based on sequence, and the EC classification, which is based on enzymatic function. The model is very effective in learning highly diverged protein families or families that are not defined on the basis of sequence. The resulting tree structures highlight the properties that are strongly correlated with structural and functional aspects of protein families, and can be used to suggest a concise definition of a protein family.

  13. Predicting disease-related proteins based on clique backbone in protein-protein interaction network.

    PubMed

    Yang, Lei; Zhao, Xudong; Tang, Xianglong

    2014-01-01

    Network biology integrates different kinds of data, including physical or functional networks and disease gene sets, to interpret human disease. A clique (maximal complete subgraph) in a protein-protein interaction network is a topological module and possesses inherent biological significance. A disease-related clique is possibly associated with complex diseases. Fully identifying the disease components in a clique is conducive to uncovering disease mechanisms. This paper proposes an approach for predicting disease proteins based on cliques in a protein-protein interaction network. To tolerate false positive and false negative interactions in protein networks, clique extension and scoring of predicted disease proteins with gene ontology terms are introduced into the clique-based method. The precision of the predicted disease proteins is verified against disease phenotypes and remains steadily above 95%. The predicted disease proteins associated with cliques can partly complement the mapping between genotype and phenotype, and provide clues for understanding the pathogenesis of serious diseases.
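
    The clique definition used above (a maximal complete subgraph) can be illustrated with a short enumeration routine. The sketch below uses the classic Bron-Kerbosch algorithm with pivoting on a toy interaction list; the protein identifiers are hypothetical, and the clique extension and gene ontology scoring steps of the paper are not reproduced here.

```python
def build_adjacency(edges):
    """Undirected adjacency map {protein: set(neighbours)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-explored vertices
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy PPI network with hypothetical protein names.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("P3", "P4"),
         ("P4", "P5"), ("P4", "P6"), ("P5", "P6")]
cliques = set(maximal_cliques(build_adjacency(edges)))
for clique in sorted(cliques, key=lambda c: sorted(c)):
    print(sorted(clique))
```

    In a disease-prediction setting along the lines described, one would presumably inspect cliques that already contain known disease proteins, but that selection step is an assumption here rather than something the abstract specifies.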

  14. Semiempirical prediction of protein folds

    NASA Astrophysics Data System (ADS)

    Fernández, Ariel; Colubri, Andrés; Appignanesi, Gustavo

    2001-08-01

    We introduce a semiempirical approach to predict ab initio expeditious pathways and native backbone geometries of proteins that fold under in vitro renaturation conditions. The algorithm is engineered to incorporate a discrete codification of local steric hindrances that constrain the movements of the peptide backbone throughout the folding process. Thus, the torsional state of the chain is assumed to be conditioned by the fact that hopping from one basin of attraction to another in the Ramachandran map (local potential energy surface) of each residue is energetically more costly than the search for a specific (Φ, Ψ) torsional state within a single basin. A combinatorial procedure is introduced to evaluate coarsely defined torsional states of the chain defined "modulo basins" and translate them into meaningful patterns of long range interactions. Thus, an algorithm for structure prediction is designed based on the fact that local contributions to the potential energy may be subsumed into time-evolving conformational constraints defining sets of restricted backbone geometries whereupon the patterns of nonbonded interactions are constructed. The predictive power of the algorithm is assessed by (a) computing ab initio folding pathways for mammalian ubiquitin that ultimately yield a stable structural pattern reproducing all of its native features, (b) determining the nucleating event that triggers the hydrophobic collapse of the chain, and (c) comparing coarse predictions of the stable folds of moderately large proteins (N ≈ 100) with structural information extracted from the protein data bank.

  15. Prediction and integration of regulatory and protein-protein interactions

    SciTech Connect

    Wichadakul, Duangdao; McDermott, Jason E.; Samudrala, Ram

    2009-04-20

    Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast [1] and worm [2]. Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse [3]. We describe how to compile and handle various formats and identifiers of data sets from different sources, and how to predict the TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein sub-cellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.

  16. Predicting Thermodynamic Behaviors of Non-Protein Amino Acids as a Function of Temperature and pH

    NASA Astrophysics Data System (ADS)

    Kitadai, Norio

    2016-03-01

    Why does life use α-amino acids exclusively as building blocks of proteins? To address that fundamental question from an energetic perspective, this study estimated the standard molal thermodynamic data for three non-α-amino acids (β-alanine, γ-aminobutyric acid, and ε-aminocaproic acid) and α-amino-n-butyric acid in their zwitterionic, negative, and positive ionization states based on the corresponding experimental measurements reported in the literature. Temperature dependences of their heat capacities were described based on the revised Helgeson-Kirkham-Flowers (HKF) equations of state. The obtained dataset was then used to calculate the standard molal Gibbs energies (ΔG°) of the non-α-amino acids as a function of temperature and pH. Comparison of their ΔG° values with those of α-amino acids having the same molecular formula showed that the non-α-amino acids have similar ΔG° values to the corresponding α-amino acids in physiologically relevant conditions (neutral pH, <100 °C). In acidic and alkaline pH, the non-α-amino acids are thermodynamically more stable than the corresponding α-ones over a broad temperature range. These results suggest that the energetic cost of synthesis is not an important selection pressure to incorporate α-amino acids into biological systems.

  17. Predicting Thermodynamic Behaviors of Non-Protein Amino Acids as a Function of Temperature and pH.

    PubMed

    Kitadai, Norio

    2016-03-01

    Why does life use α-amino acids exclusively as building blocks of proteins? To address that fundamental question from an energetic perspective, this study estimated the standard molal thermodynamic data for three non-α-amino acids (β-alanine, γ-aminobutyric acid, and ε-aminocaproic acid) and α-amino-n-butyric acid in their zwitterionic, negative, and positive ionization states based on the corresponding experimental measurements reported in the literature. Temperature dependences of their heat capacities were described based on the revised Helgeson-Kirkham-Flowers (HKF) equations of state. The obtained dataset was then used to calculate the standard molal Gibbs energies (ΔG°) of the non-α-amino acids as a function of temperature and pH. Comparison of their ΔG° values with those of α-amino acids having the same molecular formula showed that the non-α-amino acids have similar ΔG° values to the corresponding α-amino acids in physiologically relevant conditions (neutral pH, <100 °C). In acidic and alkaline pH, the non-α-amino acids are thermodynamically more stable than the corresponding α-ones over a broad temperature range. These results suggest that the energetic cost of synthesis is not an important selection pressure to incorporate α-amino acids into biological systems.

  18. New protein functions in yeast chromosome VIII.

    PubMed Central

    Ouzounis, C.; Bork, P.; Casari, G.; Sander, C.

    1995-01-01

    The analysis of the 269 open reading frames of yeast chromosome VIII by computational methods has yielded 24 new significant sequence similarities to proteins of known function. The resulting predicted functions include three particularly interesting cases of translation-associated proteins: peptidyl-tRNA hydrolase, a ribosome recycling factor homologue, and a protein similar to cytochrome b translational activator CBS2. The methodological limits of the meaningful transfer of functional information between distant homologues are discussed. PMID:8563640

  19. Actin-interacting and flagellar proteins in Leishmania spp.: Bioinformatics predictions to functional assignments in phagosome formation

    PubMed Central

    2009-01-01

    Several motile processes are responsible for the movement of proteins into and within the flagellar membrane, but little is known about the process by which specific proteins (either actin-associated or not) are targeted to protozoan flagellar membranes. Actin is a major cytoskeleton protein, while polymerization and depolymerization of parasite actin and actin-interacting proteins (AIPs) during both processes of motility and host cell entry might be key events for successful infection. For a better understanding of eukaryotic flagellar dynamics, we have surveyed genomes, transcriptomes and proteomes of pathogenic Leishmania spp. to identify pertinent genes/proteins and to build in silico models to properly address their putative roles in trypanosomatid virulence. In a search for AIPs involved in flagellar activities, we applied computational biology and proteomic tools to infer the biological meaning of coronins and Arp2/3, two important elements in phagosome formation after parasite phagocytosis by macrophages. Results presented here provide the first report of Leishmania coronin and Arp2/3 as flagellar proteins that also might be involved in phagosome formation through actin polymerization within the flagellar environment. This is an issue worthy of further in vitro examination that remains now as a direct, positive bioinformatics-derived inference to be presented. PMID:21637533

  20. Actin-interacting and flagellar proteins in Leishmania spp.: Bioinformatics predictions to functional assignments in phagosome formation.

    PubMed

    Diniz, Michely C; Costa, Marcília P; Pacheco, Ana C L; Kamimura, Michel T; Silva, Samara C; Carneiro, Laura D G; Sousa, Ana P L; Soares, Carlos E A; Souza, Celeste S F; de Oliveira, Diana Magalhães

    2009-07-01

    Several motile processes are responsible for the movement of proteins into and within the flagellar membrane, but little is known about the process by which specific proteins (either actin-associated or not) are targeted to protozoan flagellar membranes. Actin is a major cytoskeleton protein, while polymerization and depolymerization of parasite actin and actin-interacting proteins (AIPs) during both processes of motility and host cell entry might be key events for successful infection. For a better understanding of eukaryotic flagellar dynamics, we have surveyed genomes, transcriptomes and proteomes of pathogenic Leishmania spp. to identify pertinent genes/proteins and to build in silico models to properly address their putative roles in trypanosomatid virulence. In a search for AIPs involved in flagellar activities, we applied computational biology and proteomic tools to infer the biological meaning of coronins and Arp2/3, two important elements in phagosome formation after parasite phagocytosis by macrophages. Results presented here provide the first report of Leishmania coronin and Arp2/3 as flagellar proteins that also might be involved in phagosome formation through actin polymerization within the flagellar environment. This is an issue worthy of further in vitro examination that remains now as a direct, positive bioinformatics-derived inference to be presented. PMID:21637533

  1. Predicting communities from functional traits.

    PubMed

    Cadotte, Marc W; Arnillas, Carlos A; Livingstone, Stuart W; Yasui, Simone-Louise E

    2015-09-01

    Species traits influence where species live and how they interact. While there have been many advances in describing the functional composition and diversity of communities, only recently do researchers have the ability to predict community composition and diversity. This predictive ability can offer fundamental insights into ecosystem resilience and restoration. PMID:26190136

  2. Predicting protein-ligand and protein-peptide interfaces

    NASA Astrophysics Data System (ADS)

    Bertolazzi, Paola; Guerra, Concettina; Liuzzi, Giampaolo

    2014-06-01

    The paper deals with the identification of binding sites and concentrates on interactions involving small interfaces. In particular, we focus our attention on two major interface types, namely protein-ligand and protein-peptide interfaces. For protein-ligand binding site prediction, we classify the most interesting methods and approaches into four main categories: (a) shape-based methods, (b) alignment-based methods, (c) graph-theoretic approaches and (d) machine learning methods. Class (a) encompasses those methods which employ, in some way, geometric information about the protein surface. Methods falling into class (b) address the prediction problem as an alignment problem, i.e. finding protein-ligand atom pairs that occupy spatially equivalent positions. Graph-theoretic approaches, class (c), are mainly based on the definition of a particular graph, known as the protein contact graph, and then apply methods from graph theory to discover subgraphs or score similarities for uncovering functional sites. The last class (d) contains those methods that are based on the learn-from-examples paradigm and that are able to take advantage of the large amount of data available on known protein-ligand pairs. As for protein-peptide interfaces, due to the often disordered nature of the regions involved in binding, shape similarity is no longer a determining factor. Then, in geometry-based methods, geometry is accounted for by providing the relative position of the atoms surrounding the peptide residues in known structures. Finally, also for protein-peptide interfaces, we present a classification of some successful machine learning methods. They can be categorized according to the way the learning examples are constructed. In particular, we distinguish three main methods: distance functions, structure and potentials, and structure alignment.

  3. Developing algorithms for predicting protein-protein interactions of homology modeled proteins.

    SciTech Connect

    Martin, Shawn Bryan; Sale, Kenneth L.; Faulon, Jean-Loup Michel; Roe, Diana C.

    2006-01-01

    The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to address these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

  4. Predicting Resistance Mutations Using Protein Design Algorithms

    SciTech Connect

    Frey, K.; Georgiev, I; Donald, B; Anderson, A

    2010-01-01

    Drug resistance resulting from mutations to the target is an unfortunate common phenomenon that limits the lifetime of many of the most successful drugs. In contrast to the investigation of mutations after clinical exposure, it would be powerful to be able to incorporate strategies early in the development process to predict and overcome the effects of possible resistance mutations. Here we present a unique prospective application of an ensemble-based protein design algorithm, K*, to predict potential resistance mutations in dihydrofolate reductase from Staphylococcus aureus using positive design to maintain catalytic function and negative design to interfere with binding of a lead inhibitor. Enzyme inhibition assays show that three of the four highly-ranked predicted mutants are active yet display lower affinity (18-, 9-, and 13-fold) for the inhibitor. A crystal structure of the top-ranked mutant enzyme validates the predicted conformations of the mutated residues and the structural basis of the loss of potency. The use of protein design algorithms to predict resistance mutations could be incorporated in a lead design strategy against any target that is susceptible to mutational resistance.
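
    The reported fold losses in affinity can be related to binding free energy changes through the standard relation ΔΔG = RT ln(fold change). The sketch below is a worked illustration only: it assumes the fold changes are ratios of inhibition constants and that a temperature of 298.15 K applies, neither of which is stated in the abstract, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Binding free energy penalty (kcal/mol) for a given fold loss in affinity.

    Uses ddG = R * T * ln(K_mutant / K_wild_type); the default temperature
    is an assumption, not a value taken from the study.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# Fold changes quoted in the abstract for the three active mutants.
for fold in (18, 9, 13):
    print(f"{fold}-fold weaker binding ~ {ddg_from_fold_change(fold):.2f} kcal/mol")
```

    Under these assumptions, each mutant costs the inhibitor roughly one to two kcal/mol of binding free energy, a modest energetic change that is nonetheless enough to reduce potency by an order of magnitude.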

  5. Scoring docking conformations using predicted protein interfaces

    PubMed Central

    2014-01-01

    Background Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations among the 3D models produced by docking software by scoring those models using binding interfaces predicted by the interface predictor, Template based Protein Interface Prediction (T-PIP). Results First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, a comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and the quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations. Conclusion Accurate identification of near-native conformations remains a challenging task. Although template-based methods such as T-PioDock will benefit from the availability of 3D complexes, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing the quality of complex conformations. PMID:24906633

  6. Functional significance of protein assemblies predicted by the crystal structure of the restriction endonuclease BsaWI.

    PubMed

    Tamulaitis, Gintautas; Rutkauskas, Marius; Zaremba, Mindaugas; Grazulis, Saulius; Tamulaitiene, Giedre; Siksnys, Virginijus

    2015-09-18

    Type II restriction endonuclease BsaWI recognizes the degenerate sequence 5'-W/CCGGW-3' (W stands for A or T, '/' denotes the cleavage site). It belongs to a large family of restriction enzymes that contain a conserved CCGG tetranucleotide in their target sites. These enzymes are arranged as dimers or tetramers, and require binding of one, two or three DNA targets for their optimal catalytic activity. Here, we present a crystal structure and biochemical characterization of the restriction endonuclease BsaWI. BsaWI is arranged as an 'open' configuration dimer and binds a single DNA copy through minor groove contacts. In the crystal, primary BsaWI dimers form an indefinite linear chain via C-terminal domain contacts, implying possible higher-order aggregates. We show that in solution BsaWI protein exists in a dimer-tetramer-oligomer equilibrium, but in the presence of specific DNA forms a tetramer bound to two target sites. Site-directed mutagenesis and kinetic experiments show that BsaWI is active as a tetramer and requires two target sites for optimal activity. We propose a BsaWI mechanism that shares common features with both dimeric Ecl18kI/SgrAI and bona fide tetrameric NgoMIV/SfiI enzymes. PMID:26240380
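
    The recognition sequence 5'-W/CCGGW-3' translates directly into a pattern search. The sketch below scans the top strand of a DNA string for BsaWI sites and reports the cut position implied by the '/' in the site definition; the function name and the example sequence are hypothetical, and only unambiguous A/C/G/T input is considered.

```python
import re

# W = A or T. The lookahead lets overlapping sites (sharing a W) be reported.
BSAWI_SITE = re.compile(r"(?=([AT]CCGG[AT]))")


def find_bsawi_sites(sequence):
    """Return (start, site, cut_index) for each W/CCGGW site on the top strand.

    start and cut_index are 0-based; the top strand is cut between
    sequence[cut_index - 1] and sequence[cut_index], i.e. after the first W.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1), m.start() + 1)
            for m in BSAWI_SITE.finditer(seq)]


# Hypothetical fragment carrying two target sites.
sites = find_bsawi_sites("GGACCGGTTTTCCGGAAC")
for start, site, cut in sites:
    print(start, site, cut)
```

    Because the abstract reports that BsaWI requires two target sites for optimal activity, a substrate check along these lines would look for at least two hits in the sequence.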

  7. Protein Residue Contacts and Prediction Methods

    PubMed Central

    Adhikari, Badri

    2016-01-01

    In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue–residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated. PMID:27115648

  8. Phospholipid liposomes functionalized by protein

    NASA Astrophysics Data System (ADS)

    Glukhova, O. E.; Savostyanov, G. V.; Grishina, O. A.

    2015-03-01

    Finding new ways to deliver neurotrophic drugs to the brain in newborns is one of the contemporary problems of medicine and the pharmaceutical industry. Recent research in this field indicates that supramolecular transport systems capable of overcoming the blood-brain barrier (BBB) are promising for targeted drug delivery to the brain. Solving this problem is therefore relevant not only for medicine but also for society as a whole, because it determines the health of future generations. Phospholipid liposomes, owing to their combination of lipophilic and hydrophilic properties, are considered the main future objects in medicine for drug delivery through the BBB as well as increasing their bioavailability and toxicity. Liposomes functionalized by various proteins have been used as transport systems for ease of liposome use. Oligosaccharide modification of the liposome surface has shown promise over the last decade because, through selection of the ligand, it enables delivery of liposomes to specific receptors of human cells; it is widely used in pharmacology for the treatment of several diseases. The purpose of this work is to create a coarse-grained model of the bilayer of phospholipid liposomes functionalized by proteins specific to the structural elements of the BBB, and to predict, by molecular docking, the most favorable orientation and position of the molecules in the generated complex for the formation of the structure. The activity of the ligand molecule toward the protein receptor of human cells was investigated by molecular dynamics.

  9. An Overview of Practical Applications of Protein Disorder Prediction and Drive for Faster, More Accurate Predictions

    PubMed Central

    Deng, Xin; Gumm, Jordan; Karki, Suman; Eickholt, Jesse; Cheng, Jianlin

    2015-01-01

    Protein disordered regions are segments of a protein chain that do not adopt a stable structure. Thus far, a variety of protein disorder prediction methods have been developed and have been widely used, not only in traditional bioinformatics domains, including protein structure prediction, protein structure determination and function annotation, but also in many other biomedical fields. The relationship between intrinsically-disordered proteins and some human diseases has played a significant role in disorder prediction in disease identification and epidemiological investigations. Disordered proteins can also serve as potential targets for drug discovery with an emphasis on the disordered-to-ordered transition in the disordered binding regions, and this has led to substantial research in drug discovery or design based on protein disordered region prediction. Furthermore, protein disorder prediction has also been applied to healthcare by predicting the disease risk of mutations in patients and studying the mechanistic basis of diseases. As the applications of disorder prediction increase, so too does the need to make quick and accurate predictions. To fill this need, we also present a new approach to predict protein residue disorder using wide sequence windows that is applicable on the genomic scale. PMID:26198229

  10. Membrane Topology and Predicted RNA-Binding Function of the ‘Early Responsive to Dehydration (ERD4)’ Plant Protein

    PubMed Central

    Rai, Archana; Suprasanna, Penna; D'Souza, Stanislaus F.; Kumar, Vinay

    2012-01-01

    Functional annotation of uncharacterized genes is the main focus of computational methods in the post genomic era. These tools search for similarity between proteins on the premise that those sharing sequence or structural motifs usually perform related functions, and are thus particularly useful for membrane proteins. Early responsive to dehydration (ERD) genes are rapidly induced in response to dehydration stress in a variety of plant species. In the present work we characterized the function of the Brassica juncea ERD4 gene using computational approaches. The ERD4 protein, of unknown function, possesses the ubiquitous DUF221 domain (residues 312–634) and is conserved in all plant species. We suggest that the protein is localized in the chloroplast membrane with at least nine transmembrane helices. We detected a globular domain of 165 amino acid residues (183–347) in plant ERD4 proteins and expect this to be located inside the chloroplast. The structural-functional annotation of the globular domain was arrived at using fold recognition methods, which suggested the presence in its sequence of two tandem RNA-recognition motif (RRM) domains, each folded into a βαββαβ topology. The structure based sequence alignment with the known RNA-binding proteins revealed conservation of two non-canonical ribonucleoprotein sub-motifs in both the putative RNA-recognition domains of the ERD4 protein. The function of the highly conserved ERD4 protein may thus be associated with its RNA-binding ability during the stress response. This is the first functional annotation of the ERD4 family of proteins and can be useful in designing experiments to unravel crucial aspects of the stress tolerance mechanism. PMID:22431979

  11. Prediction of functional phosphorylation sites by incorporating evolutionary information.

    PubMed

    Niu, Shen; Wang, Zhen; Ge, Dongya; Zhang, Guoqing; Li, Yixue

    2012-09-01

    Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods mainly focus on the prediction of phosphorylation sites, but few consider whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is quite necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved than non-functional phosphorylation sites in evolution. Thus, in our method, we developed a web server by integrating existing phosphorylation site prediction methods, as well as both absolute and relative evolutionary conservation scores, to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database for the predicted sites. By the analysis of overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By the analysis of protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method would help to characterize the most likely functional phosphorylation sites for further studies in this research area.

  12. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be.

  13. Protein structure prediction using hybrid AI methods

    SciTech Connect

    Guan, X.; Mural, R.J.; Uberbacher, E.C.

    1993-11-01

    This paper describes a new approach for predicting protein structures based on Artificial Intelligence methods and genetic algorithms. We combine nearest neighbor searching algorithms, neural networks, heuristic rules and genetic algorithms to form an integrated system to predict protein structures from their primary amino acid sequences. First we describe our methods and how they are integrated, and then apply our methods to several protein sequences. The results are very close to the real structures obtained by crystallography. Parallel genetic algorithms are also implemented.

  14. eF-seek: prediction of the functional sites of proteins by searching for similar electrostatic potential and molecular surface shape

    PubMed Central

    Kinoshita, Kengo; Murakami, Yoichi; Nakamura, Haruki

    2007-01-01

    We have developed a method to predict ligand-binding sites in a new protein structure by searching for similar binding sites in the Protein Data Bank (PDB). The similarities are measured according to the shapes of the molecular surfaces and their electrostatic potentials. A new web server, eF-seek, provides an interface to our search method. It simply requires a coordinate file in the PDB format, and generates a prediction result as a virtual complex structure, with the putative ligands in a PDB format file as the output. In addition, the predicted interacting interface is displayed to facilitate the examination of the virtual complex structure on our own applet viewer with the web browser (URL: http://eF-site.hgc.jp/eF-seek). PMID:17567616

  15. Predicting the Dynamics of Protein Abundance

    PubMed Central

    Mehdi, Ahmed M.; Patrick, Ralph; Bailey, Timothy L.; Bodén, Mikael

    2014-01-01

    Protein synthesis is finely regulated across all organisms, from bacteria to humans, and its integrity underpins many important processes. Emerging evidence suggests that the dynamic range of protein abundance is greater than that observed at the transcript level. Technological breakthroughs now mean that sequencing-based measurement of mRNA levels is routine, but protocols for measuring protein abundance remain both complex and expensive. This paper introduces a Bayesian network that integrates transcriptomic and proteomic data to predict protein abundance and to model the effects of its determinants. We aim to use this model to follow a molecular response over time, from condition-specific data, in order to understand adaptation during processes such as the cell cycle. With microarray data now available for many conditions, the general utility of a protein abundance predictor is broad. Whereas most quantitative proteomics studies have focused on higher organisms, we developed a predictive model of protein abundance for both Saccharomyces cerevisiae and Schizosaccharomyces pombe to explore the latitude at the protein level. Our predictor primarily relies on mRNA level, mRNA–protein interaction, mRNA folding energy and half-life, and tRNA adaptation. The combination of key features, allowing for the low certainty and uneven coverage of experimental observations, gives comparatively minor but robust prediction accuracy. The model substantially improved the analysis of protein regulation during the cell cycle: predicted protein abundance identified twice as many cell-cycle-associated proteins as experimental mRNA levels. Predicted protein abundance was more dynamic than observed mRNA expression, agreeing with experimental protein abundance from a human cell line. We illustrate how the same model can be used to predict the folding energy of mRNA when protein abundance is available, lending credence to the emerging view that mRNA folding affects translation

  16. Predicting protein dynamics from structural ensembles

    NASA Astrophysics Data System (ADS)

    Copperman, J.; Guenza, M. G.

    2015-12-01

    The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient ρ = 0.93 to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions and is consistent in the time scale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.

  17. Scoring function to predict solubility mutagenesis

    PubMed Central

    2010-01-01

    Background Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. Results We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. Availability Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html. PMID:20929563

  18. Functional annotation of hypothetical proteins - A review.

    PubMed

    Sivashankari, Selvarajan; Shanmughavel, Piramanayagam

    2006-12-29

    The complete human genome sequences in the public database provide ways to understand the blueprint of life. As of June 29, 2006, complete genomes of 27 archaea, 326 bacteria and 21 eukaryotes are available, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes is in progress. The traditional biochemical/molecular experiments can assign accurate functions for genes in these genomes. However, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand genomes. Thus, determination of protein function is one of the challenging problems of the post-genome era. This demands that bioinformatics predict functions of un-annotated protein sequences by developing efficient tools. Here, we discuss some of the recent and popular approaches developed in bioinformatics to predict functions for hypothetical proteins.

  19. The MULTICOM protein tertiary structure prediction system.

    PubMed

    Li, Jilong; Bhattacharya, Debswapna; Cao, Renzhi; Adhikari, Badri; Deng, Xin; Eickholt, Jesse; Cheng, Jianlin

    2014-01-01

    With the expansion of genomics and proteomics data aided by the rapid progress of next-generation sequencing technologies, computational prediction of protein three-dimensional structure is an essential part of modern structural genomics initiatives. Prediction of protein structure through understanding of the theories behind protein sequence-structure relationship, however, remains one of the most challenging problems in contemporary life sciences. Here, we describe MULTICOM, a multi-level combination technique, intended to predict moderate- to high-resolution structure of a protein through a novel approach of combining multiple sources of complementary information derived from the experimentally solved protein structures in the Protein Data Bank. The MULTICOM web server is freely available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  20. Structure prediction of magnetosome-associated proteins.

    PubMed

    Nudelman, Hila; Zarivach, Raz

    2014-01-01

    Magnetotactic bacteria (MTB) are Gram-negative bacteria that can navigate along geomagnetic fields. This ability is a result of a unique intracellular organelle, the magnetosome. These organelles are composed of membrane-enclosed magnetite (Fe3O4) or greigite (Fe3S4) crystals ordered into chains along the cell. Magnetosome formation, assembly, and magnetic nano-crystal biomineralization are controlled by magnetosome-associated proteins (MAPs). Most MAP-encoding genes are located in a conserved genomic region - the magnetosome island (MAI). The MAI appears to be conserved in all MTB that have been analyzed so far, although the MAI size and organization differ between species. It was shown that MAI deletion leads to a non-magnetic phenotype, further highlighting its important role in magnetosome formation. Today, about 28 proteins are known to be involved in magnetosome formation, but the structures and functions of most MAPs are unknown. To reveal the structure-function relationship of MAPs we used bioinformatics tools to build homology models as a way to understand their possible role in magnetosome formation. Here we present an overview of predicted 3D structural models for all known Magnetospirillum gryphiswaldense strain MSR-1 MAPs.

  1. Predicting Protein-Protein Interactions from the Molecular to the Proteome Level.

    PubMed

    Keskin, Ozlem; Tuncbag, Nurcan; Gursoy, Attila

    2016-04-27

    Identification of protein-protein interactions (PPIs) is at the center of molecular biology considering the unquestionable role of proteins in cells. Combinatorial interactions result in a repertoire of multiple functions; hence, knowledge of PPIs and binding regions naturally serves functional proteomics and drug discovery. Given experimental limitations to find all interactions in a proteome, computational prediction/modeling of protein interactions is a prerequisite to proceed on the way to complete interactions at the proteome level. This review aims to provide a background on PPIs and their types. Computational methods for PPI predictions can use a variety of biological data including sequence-, evolution-, expression-, and structure-based data. Physical and statistical modeling are commonly used to integrate these data and infer PPI predictions. We review and list the state-of-the-art methods, servers, databases, and tools for protein-protein interaction prediction. PMID:27074302

  2. Phosphorylation and glycosylation interplay: protein modifications at hydroxy amino acids and prediction of signaling functions of the human beta3 integrin family.

    PubMed

    Ahmad, Ishtiaq; Hoessli, Daniel C; Walker-Nasir, Evelyne; Choudhary, M Iqbal; Rafik, Saleem M; Shakoori, Abdul Rauf

    2006-10-15

    Protein functions are determined by their three-dimensional structures, and the folded 3-D structure is in turn governed by the primary structure and the post-translational modifications the protein undergoes during synthesis and transport. Defining protein functions in vivo in the cellular and extracellular environments is made very difficult by the presence of other molecules. However, the modifications taking place during and after protein folding are determined by the modification potential of amino acids and not by the primary structure or sequence. These post-translational modifications, like phosphorylation and O-linked N-acetylglucosamine (O-GlcNAc) modifications, are dynamic and result in temporary conformational changes that regulate many functions of the protein. Computer-assisted studies can help determine protein functions by assessing the modification potentials of a given protein. Integrins are important membrane receptors involved in bi-directional (outside-in and inside-out) signaling events. The beta3 integrin family, including alpha(IIb)beta3 and alpha(v)beta3, has been studied for its role in platelet aggregation during clot formation and clot retraction based on hydroxyl group modification by phosphate and GlcNAc on Ser, Thr, or Tyr and their interplay on Ser and Thr in the cytoplasmic domain of the beta3 subunit. An antagonistic role of phosphate and GlcNAc interplay at Thr758 for controlling both inside-out and outside-in signaling events is proposed. Additionally, interplay of GlcNAc and phosphate at Ser752 has been proposed to control activation and inactivation of integrin-associated Src kinases. This study describes the multifunctional behavior of integrins based on their modification potential at hydroxyl groups of amino acids as a source of interplay.

  3. Signature Product Code for Predicting Protein-Protein Interactions

    SciTech Connect

    Martin, Shawn B.; Brown, William M.

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.
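
    The tensor-product idea described in this record can be illustrated with a minimal sketch. It is not the SigProdV1.0 code: it assumes plain overlapping 3-mers as the trimer features, and the function names (trimer_counts, dot, pair_kernel) are hypothetical. It relies only on the identity that the inner product of two tensor products factorizes into a product of inner products, which gives a kernel that a support vector machine could consume.

```python
from collections import Counter


def trimer_counts(seq):
    """Count overlapping amino acid trimers (3-mers) in a sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def dot(u, v):
    """Inner product of two sparse trimer-count vectors."""
    return sum(count * v[key] for key, count in u.items() if key in v)


def pair_kernel(pair1, pair2):
    """Kernel between two protein pairs (a, b) and (c, d).

    The inner product of tensor products factorizes:
        <a (x) b, c (x) d> = <a, c> * <b, d>
    so the trimer-pair feature space is never built explicitly.
    Note: this simplified form depends on the order within each pair.
    """
    a, b = (trimer_counts(s) for s in pair1)
    c, d = (trimer_counts(s) for s in pair2)
    return dot(a, c) * dot(b, d)
```

    A matrix of such kernel values over known interacting and non-interacting pairs could be passed to any SVM implementation that accepts a precomputed kernel; that step is left out here.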

  4. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
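
    The phylogenetic-profile method in this record can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: homolog detection is assumed to have been done already, each genome is represented as a set of protein family names, and the function names and the max_diff threshold are hypothetical.

```python
def phylogenetic_profile(protein, genomes):
    """Return a tuple of 0/1 flags: is a homolog of `protein` present in each genome?"""
    return tuple(int(protein in genome) for genome in genomes)


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def linked_pairs(proteins, genomes, max_diff=0):
    """List protein pairs whose presence/absence profiles differ in at most
    `max_diff` genomes; such pairs are candidates for a functional link."""
    profiles = {p: phylogenetic_profile(p, genomes) for p in proteins}
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(profiles[a], profiles[b]) <= max_diff
    ]
```

    With three toy genomes {P1, P2, P3}, {P1, P2} and {P3}, proteins P1 and P2 share the profile (1, 1, 0) and are reported as a linked pair, while P3 has the profile (1, 0, 1) and is not linked to either.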

  5. Predicting the fission yeast protein interaction network.

    PubMed

    Pancaldi, Vera; Saraç, Omer S; Rallis, Charalampos; McLean, Janel R; Převorovský, Martin; Gould, Kathleen; Beyer, Andreas; Bähler, Jürg

    2012-04-01

    A systems-level understanding of biological processes and information flow requires the mapping of cellular component interactions, among which protein-protein interactions are particularly important. Fission yeast (Schizosaccharomyces pombe) is a valuable model organism for which no systematic protein-interaction data are available. We exploited gene and protein properties, global genome regulation datasets, and conservation of interactions between budding and fission yeast to predict fission yeast protein interactions in silico. We have extensively tested our method in three ways: first, by predicting with 70-80% accuracy a selected high-confidence test set; second, by recapitulating interactions between members of the well-characterized SAGA co-activator complex; and third, by verifying predicted interactions of the Cbf11 transcription factor using mass spectrometry of TAP-purified protein complexes. Given the importance of the pathway in cell physiology and human disease, we explore the predicted sub-networks centered on the Tor1/2 kinases. Moreover, we predict the histidine kinases Mak1/2/3 to be vital hubs in the fission yeast stress response network, and we suggest interactors of argonaute 1, the principal component of the siRNA-mediated gene silencing pathway, lost in budding yeast but preserved in S. pombe. Of the new high-quality interactions that were discovered after we started this work, 73% were found in our predictions. Even though any predicted interactome is imperfect, the protein network presented here can provide a valuable basis to explore biological processes and to guide wet-lab experiments in fission yeast and beyond. Our predicted protein interactions are freely available through PInt, an online resource on our website (www.bahlerlab.info/PInt).

  6. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function, and inform mutagenesis studies, and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute to 70% of the correct docking-based interface predictions overall. Our analysis proves that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent

  7. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

    PubMed

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-12-15

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.

  8. Characterization and Prediction of Protein Flexibility Based on Structural Alphabets

    PubMed Central

    Liu, Bin

    2016-01-01

    Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformation entropy and the protein flexibility. We then predict the protein flexibility from basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility. PMID:27660756
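
    The entropy calculation outlined in this record can be illustrated with a short sketch. It assumes the conformational entropy is the Shannon entropy of the structural-alphabet letters observed at each residue position across a set of decoys, which the abstract does not state explicitly; the function name positional_entropy and the toy 3-class strings are hypothetical.

```python
import math
from collections import Counter


def positional_entropy(encoded_decoys):
    """Shannon entropy (in bits) of the structural-alphabet letters at each position.

    `encoded_decoys` is a list of equal-length strings, one per decoy structure,
    where each character is a structural-alphabet letter for one residue.
    """
    length = len(encoded_decoys[0])
    if any(len(s) != length for s in encoded_decoys):
        raise ValueError("all encoded decoys must have the same length")
    entropies = []
    for column in zip(*encoded_decoys):
        total = len(column)
        counts = Counter(column)
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies
```

    A position where every decoy has the same letter gets zero entropy (rigid), while a position with many different letters gets high entropy (flexible). Larger alphabets raise the maximum attainable entropy, so values from different alphabets are not directly comparable without normalization.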

  10. Chemical shift prediction for denatured proteins.

    PubMed

    Prestegard, James H; Sahu, Sarata C; Nkari, Wendy K; Morris, Laura C; Live, David; Gruta, Christian

    2013-02-01

    While chemical shift prediction has played an important role in aspects of protein NMR that include identification of secondary structure, generation of torsion angle constraints for structure determination, and assignment of resonances in spectra of intrinsically disordered proteins, interest has arisen more recently in using it in alternate assignment strategies for crosspeaks in (1)H-(15)N HSQC spectra of sparsely labeled proteins. One such approach involves correlation of crosspeaks in the spectrum of the native protein with those observed in the spectrum of the denatured protein, followed by assignment of the peaks in the latter spectrum. As in the case of disordered proteins, predicted chemical shifts can aid in these assignments. Some previously developed empirical formulas for chemical shift prediction have depended on basis data sets of 20 pentapeptides. In each case the central residue was varied among the 20 common amino acids, with the flanking residues held constant throughout the given series. However, previous choices of solvent conditions and flanking residues make the parameters in these formulas less than ideal for general application to denatured proteins. Here, we report (1)H and (15)N shifts for a set of alanine-based pentapeptides under the low-pH urea denaturing conditions that are more appropriate for sparse label assignments. New parameters have been derived and a Perl script was created to facilitate comparison with other parameter sets. A small, but significant, improvement in shift predictions for denatured ubiquitin is demonstrated.

  11. GSAFold: a new application of GSA to protein structure prediction.

    PubMed

    Melo, Marcelo C R; Bernardi, Rafael C; Fernandes, Tácio V A; Pascutti, Pedro G

    2012-08-01

    The folding process defines three-dimensional protein structures from their amino acid chains. A protein's structure determines its activity and properties; thus knowing such conformation on an atomic level is essential for both basic and applied studies of protein function and dynamics. However, the acquisition of such structures by experimental methods is slow and expensive, and current computational methods mostly depend on previously known structures to determine new ones. Here we present a new software called GSAFold that applies the generalized simulated annealing (GSA) algorithm on ab initio protein structure prediction. The GSA is a stochastic search algorithm employed in energy minimization and used in global optimization problems, especially those that depend on long-range interactions, such as gravity models and conformation optimization of small molecules. This new implementation applies, for the first time in ab initio protein structure prediction, an analytical inverse for the Visitation function of GSA. It also employs the broadly used NAMD Molecular Dynamics package to carry out energy calculations, allowing the user to select different force fields and parameterizations. Moreover, the software also allows the execution of several simulations simultaneously. Applications that depend on protein structures include rational drug design and structure-based protein function prediction. Applying GSAFold in a test peptide, it was possible to predict the structure of mastoparan-X to a root mean square deviation of 3.00 Å. PMID:22622959

  12. Signature Product Code for Predicting Protein-Protein Interactions

    2004-09-25

    The SigProdV1.0 software consists of four programs which together allow the prediction of protein-protein interactions using only amino acid sequences and experimental data. The software is based on the use of tensor products of amino acid trimers coupled with classifiers known as support vector machines. Essentially the program looks for amino acid trimer pairs which occur more frequently in protein pairs which are known to interact. These trimer pairs are then used to make predictions about unknown protein pairs. A detailed description of the method can be found in the paper: S. Martin, D. Roe, J.L. Faulon. "Predicting protein-protein interactions using signature products," Bioinformatics, available online from Advance Access, Aug. 19, 2004.
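    The idea of comparing protein pairs through tensor products of trimer features can be sketched as follows. This is not the SigProd code: the helper names, the use of plain contiguous trimer counts, and the symmetrized form of the kernel are assumptions for illustration. The only fact relied on is that the inner product of two tensor products factorizes, so that the product of two per-protein dot products equals the dot product of the trimer-pair feature vectors.

```python
from collections import Counter


def trimer_counts(sequence):
    """Count every contiguous amino acid trimer in a sequence."""
    return Counter(sequence[i:i + 3] for i in range(len(sequence) - 2))


def dot(a, b):
    """Dot product of two sparse count vectors stored as Counters."""
    # A Counter returns 0 for missing keys, so only shared trimers contribute.
    return sum(count * b[trimer] for trimer, count in a.items())


def pair_kernel(pair1, pair2):
    """Similarity of two protein pairs via a symmetrized tensor-product kernel.

    <a (x) b, c (x) d> = <a, c> * <b, d>; the second term makes the result
    independent of the order of the proteins within each pair (an assumption
    of this sketch).
    """
    a, b = (trimer_counts(s) for s in pair1)
    c, d = (trimer_counts(s) for s in pair2)
    return dot(a, c) * dot(b, d) + dot(a, d) * dot(b, c)


# Made-up toy sequences; real inputs would be full protein sequences.
similarity = pair_kernel(("ACDE", "ACDF"), ("ACDE", "ACDF"))
```

    A kernel of this kind could be passed to a support vector machine as a precomputed similarity matrix between known interacting and non-interacting pairs.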

  13. Genome-wide protein-protein interactions and protein function exploration in cyanobacteria.

    PubMed

    Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu

    2015-10-22

    Genome-wide network analysis is well implemented to study proteins of unknown function. Here, we effectively explored protein functions and the biological mechanism based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of the literature for proven direct/indirect evidence, and "interologs" in conservation. Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins covering over 90% of the genome) to generate the PPI network. Based on the PPI network, terms in Gene Ontology (GO) were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides a new insight into posterior functional analysis of PPIs in cyanobacteria.

  14. Genome-wide protein-protein interactions and protein function exploration in cyanobacteria

    PubMed Central

    Lv, Qi; Ma, Weimin; Liu, Hui; Li, Jiang; Wang, Huan; Lu, Fang; Zhao, Chen; Shi, Tieliu

    2015-01-01

    Genome-wide network analysis is well implemented to study proteins of unknown function. Here, we effectively explored protein functions and the biological mechanism based on an inferred high-confidence protein-protein interaction (PPI) network in cyanobacteria. We integrated data from seven different sources and predicted 1,997 PPIs, which were evaluated by experiments in molecular mechanism, text mining of the literature for proven direct/indirect evidence, and “interologs” in conservation. Combining the predicted PPIs with known PPIs, we obtained 4,715 non-redundant PPIs (involving 3,231 proteins covering over 90% of the genome) to generate the PPI network. Based on the PPI network, terms in Gene Ontology (GO) were assigned to function-unknown proteins. Functional modules were identified by dissecting the PPI network into sub-networks and analyzing pathway enrichment, with which we investigated novel functions of underlying proteins in protein complexes and pathways. Examples of photosynthesis and DNA repair indicate that the network approach is a powerful tool in protein function analysis. Overall, this systems biology approach provides a new insight into posterior functional analysis of PPIs in cyanobacteria. PMID:26490033

  15. Reduced alphabet for protein folding prediction.

    PubMed

    Huang, Jitao T; Wang, Titi; Huang, Shanran R; Li, Xin

    2015-04-01

    What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding the protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design.

  16. Using protein binding site prediction to improve protein docking.

    PubMed

    Huang, Bingding; Schroeder, Michael

    2008-10-01

    Predicting protein interaction interfaces and protein complexes are two important related problems. For interface prediction, there are a number of tools, such as PPI-Pred, PPISP, PINUP, Promate, and SPPIDER, which predict enzyme-inhibitor interfaces with success rates of 23% to 55% and other interfaces with 10% to 28% on a benchmark dataset of 62 complexes. Here, we develop metaPPI, a meta server for interface prediction. It significantly improves prediction success rates to 70% for enzyme-inhibitor and 44% for other interfaces. As shown with Promate, predicted interfaces can be used to improve protein docking. Here, we follow this idea using the meta server instead of individual predictions. We confirm that filtering with predicted interfaces significantly improves candidate generation in rigid-body docking based on shape complementarity. Finally, we show that the initial ranking of candidate solutions in rigid-body docking can be further improved for the class of enzyme-inhibitor complexes by a geometrical scoring which rewards deep pockets. A web server of metaPPI is available at scoppi.tu-dresden.de/metappi. The source code of our docking algorithm BDOCK is also available at www.biotec.tu-dresden.de/~bhuang/bdock.

  17. [Study of decision tree in the application of predicting protein-protein interactions].

    PubMed

    Guo, Xiaolong; Jiang, Yan; Qui, Lu

    2013-10-01

    Proteins are the ultimate executors of cell viability and function. Protein-protein interactions determine the complexity of the organism. Research on protein interactions can help us understand protein function at the molecular level, learn about cell growth, development, differentiation, and apoptosis, and understand biological regulation mechanisms and other activities. Such interactions are essential for understanding the pathologies of diseases and helpful in the prevention and treatment of diseases, as well as in the development of new drugs. In this paper, we employ a single decision-tree classification model to predict protein-protein interactions in yeast. The original data came from the existing literature. Using the Clementine software, this paper analyzes how the predictor attributes affect the accuracy of the model by adjusting them. The results show that a single decision tree is a good classification model and that it has higher accuracy than the models in previous studies.

  18. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges.

    PubMed

    Sonah, Humira; Deshmukh, Rupesh K; Bélanger, Richard R

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant-pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  19. Computational Prediction of Effector Proteins in Fungi: Opportunities and Challenges

    PubMed Central

    Sonah, Humira; Deshmukh, Rupesh K.; Bélanger, Richard R.

    2016-01-01

    Effector proteins are mostly secretory proteins that stimulate plant infection by manipulating the host response. Identifying fungal effector proteins and understanding their function is of great importance in efforts to curb losses to plant diseases. Recent advances in high-throughput sequencing technologies have facilitated the availability of several fungal genomes and 1000s of transcriptomes. As a result, the growing amount of genomic information has provided great opportunities to identify putative effector proteins in different fungal species. There is little consensus over the annotation and functionality of effector proteins, and mostly small secretory proteins are considered as effector proteins, a concept that tends to overestimate the number of proteins involved in a plant–pathogen interaction. With the characterization of Avr genes, criteria for computational prediction of effector proteins are becoming more efficient. There are 100s of tools available for the identification of conserved motifs, signature sequences and structural features in the proteins. Many pipelines and online servers, which combine several tools, are made available to perform genome-wide identification of effector proteins. In this review, available tools and pipelines, their strength and limitations for effective identification of fungal effector proteins are discussed. We also present an exhaustive list of classically secreted proteins along with their key conserved motifs found in 12 common plant pathogens (11 fungi and one oomycete) through an analytical pipeline. PMID:26904083

  20. 3D-Fun: predicting enzyme function from structure.

    PubMed

    von Grotthuss, Marcin; Plewczynski, Dariusz; Vriend, Gert; Rychlewski, Leszek

    2008-07-01

    The 'omics' revolution is causing a flurry of data that all needs to be annotated for it to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function. This form of annotation is typically performed with BLAST or similar software. Structural genomics is nowadays also bringing us three dimensional structures of proteins with unknown function. We present here software that can be used when sequence comparisons fail to determine the function of a protein with known structure but unknown function. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available for everybody at all these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found with proteins with known function, these are listed together with their function and some vital comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. Additionally, the superposition results are displayed using interactive graphics facilities. Currently, the 3D-Fun system only predicts enzyme function but an expanded version with Gene Ontology predictions will be available soon. The server can be accessed at http://3dfun.bioinfo.pl/ or at http://3dfun.cmbi.ru.nl/.

  1. Functions of S100 Proteins

    PubMed Central

    Donato, R.; Cannon, B.R.; Sorci, G.; Riuzzi, F.; Hsu, K.; Weber, D.J.; Geczy, C.L.

    2013-01-01

    The S100 protein family consists of 24 members functionally distributed into three main subgroups: those that only exert intracellular regulatory effects, those with intracellular and extracellular functions and those which mainly exert extracellular regulatory effects. S100 proteins are only expressed in vertebrates and show cell-specific expression patterns. In some instances, a particular S100 protein can be induced in pathological circumstances in a cell type that does not express it in normal physiological conditions. Within cells, S100 proteins are involved in aspects of regulation of proliferation, differentiation, apoptosis, Ca2+ homeostasis, energy metabolism, inflammation and migration/invasion through interactions with a variety of target proteins including enzymes, cytoskeletal subunits, receptors, transcription factors and nucleic acids. Some S100 proteins are secreted or released and regulate cell functions in an autocrine and paracrine manner via activation of surface receptors (e.g. the receptor for advanced glycation end-products and toll-like receptor 4), G-protein-coupled receptors, scavenger receptors, or heparan sulfate proteoglycans and N-glycans. Extracellular S100A4 and S100B also interact with epidermal growth factor and basic fibroblast growth factor, respectively, thereby enhancing the activity of the corresponding receptors. Thus, extracellular S100 proteins exert regulatory activities on monocytes/macrophages/microglia, neutrophils, lymphocytes, mast cells, articular chondrocytes, endothelial and vascular smooth muscle cells, neurons, astrocytes, Schwann cells, epithelial cells, myoblasts and cardiomyocytes, thereby participating in innate and adaptive immune responses, cell migration and chemotaxis, tissue development and repair, and leukocyte and tumor cell invasion. PMID:22834835

  2. Progress and challenges in predicting protein interfaces

    PubMed Central

    Krawczyk, Konrad; Knapp, Bernhard; Nebel, Jean-Christophe; Deane, Charlotte M.

    2016-01-01

    The majority of biological processes are mediated via protein–protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field. PMID:25971595

  3. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  4. Computational Prediction of Protein–Protein Interaction Networks: Algorithms and Resources

    PubMed Central

    Zahiri, Javad; Bozorgmehr, Joseph Hannon; Masoudi-Nejad, Ali

    2013-01-01

    Protein interactions play an important role in the discovery of protein functions and pathways in biological processes. This is especially true in the case of diseases caused by the loss of specific protein-protein interactions in the organism. The accuracy of experimental results in finding protein-protein interactions, however, is rather dubious, and high-throughput experimental results have shown high rates of both false positives and false negatives for protein interactions. Computational methods have attracted tremendous attention among biologists because of their ability to predict protein-protein interactions and validate the obtained experimental results. In this study, we have reviewed several computational methods for protein-protein interaction prediction, and we describe the major databases, which store both predicted and detected protein-protein interactions, and the tools used for analyzing protein interaction networks and improving protein-protein interaction reliability. PMID:24396273

  5. Functional Classification of Immune Regulatory Proteins

    SciTech Connect

    Rubinstein, Rotem; Ramagopal, Udupi A.; Nathenson, Stanley G.; Almo, Steven C.; Fiser, Andras

    2013-05-01

    Members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. We were guided by the Brotherhood approach and present the high-resolution structural characterization of a homophilic interaction involving the class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.

  6. How special is the biochemical function of native proteins?

    PubMed

    Skolnick, Jeffrey; Gao, Mu; Zhou, Hongyi

    2016-01-01

    Native proteins perform an amazing variety of biochemical functions, including enzymatic catalysis, and can engage in protein-protein and protein-DNA interactions that are essential for life. A key question is how special are these functional properties of proteins. Are they extremely rare, or are they an intrinsic feature? Comparison to the properties of compact conformations of artificially generated compact protein structures selected for thermodynamic stability but not any type of function, the artificial (ART) protein library, demonstrates that a remarkable number of the properties of native-like proteins are recapitulated. These include the complete set of small molecule ligand-binding pockets and most protein-protein interfaces. ART structures are predicted to be capable of weakly binding metabolites and cover a significant fraction of metabolic pathways, with the most enriched pathways including ancient ones such as glycolysis. Native-like active sites are also found in ART proteins. A small fraction of ART proteins are predicted to have strong protein-protein and protein-DNA interactions. Overall, it appears that biochemical function is an intrinsic feature of proteins which nature has significantly optimized during evolution. These studies raise questions as to the relative roles of specificity and promiscuity in the biochemical function and control of cells that need investigation.

  7. How special is the biochemical function of native proteins?

    PubMed Central

    Skolnick, Jeffrey; Gao, Mu; Zhou, Hongyi

    2016-01-01

    Native proteins perform an amazing variety of biochemical functions, including enzymatic catalysis, and can engage in protein-protein and protein-DNA interactions that are essential for life. A key question is how special are these functional properties of proteins. Are they extremely rare, or are they an intrinsic feature? Comparison to the properties of compact conformations of artificially generated compact protein structures selected for thermodynamic stability but not any type of function, the artificial (ART) protein library, demonstrates that a remarkable number of the properties of native-like proteins are recapitulated. These include the complete set of small molecule ligand-binding pockets and most protein-protein interfaces. ART structures are predicted to be capable of weakly binding metabolites and cover a significant fraction of metabolic pathways, with the most enriched pathways including ancient ones such as glycolysis. Native-like active sites are also found in ART proteins. A small fraction of ART proteins are predicted to have strong protein-protein and protein-DNA interactions. Overall, it appears that biochemical function is an intrinsic feature of proteins which nature has significantly optimized during evolution. These studies raise questions as to the relative roles of specificity and promiscuity in the biochemical function and control of cells that need investigation. PMID:26962440

  8. PREDICTION OF NONLINEAR SPATIAL FUNCTIONALS. (R827257)

    EPA Science Inventory

    Spatial statistical methodology can be useful in the arena of environmental regulation. Some regulatory questions may be addressed by predicting linear functionals of the underlying signal, but other questions may require the prediction of nonlinear functionals of the signal. ...

  9. Learning Protein Folding Energy Functions

    PubMed Central

    Guan, Wei; Ozakin, Arkadas; Gray, Alexander; Borreguero, Jose; Pandit, Shashi; Jagielska, Anna; Wroblewska, Liliana; Skolnick, Jeffrey

    2014-01-01

    A critical open problem in ab initio protein folding is protein energy function design, which pertains to defining the energy of protein conformations in a way that makes folding most efficient and reliable. In this paper, we address this issue as a weight optimization problem and utilize a machine learning approach, learning-to-rank, to solve this problem. We investigate the ranking-via-classification approach, especially the RankingSVM method and compare it with the state-of-the-art approach to the problem using the MINUIT optimization package. To maintain the physicality of the results, we impose non-negativity constraints on the weights. For this we develop two efficient non-negative support vector machine (NNSVM) methods, derived from L2-norm SVM and L1-norm SVMs, respectively. We demonstrate an energy function which maintains the correct ordering with respect to structure dissimilarity to the native state more often, is more efficient and reliable for learning on large protein sets, and is qualitatively superior to the current state-of-the-art energy function. PMID:25311546

  10. Protein Structure Prediction with Evolutionary Algorithms

    SciTech Connect

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.; Smith, J.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further, we empirically evaluated the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
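    For readers unfamiliar with the HP model mentioned in this abstract, the following minimal sketch shows one common way to score a conformation on a 2D square lattice: each pair of hydrophobic (H) residues that are lattice neighbours but not adjacent in the chain lowers the energy by one. The move-string representation, the function names, and the fixed per-overlap penalty for infeasible (self-intersecting) conformations are assumptions for illustration, not the specific choices evaluated in the paper.

```python
# Lattice steps for a move-string encoding of a conformation (assumed encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves):
    """Turn a move string into lattice coordinates, starting at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_energy(sequence, moves, overlap_penalty=10):
    """Energy of an HP-model conformation on the 2D square lattice.

    Each non-consecutive H-H lattice contact contributes -1. Residues that
    share a lattice site make the conformation infeasible; here each extra
    occupant of a site adds a fixed penalty (a hypothetical scheme).
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = fold(moves)

    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    contacts += 1

    overlaps = len(coords) - len(set(coords))
    return -contacts + overlap_penalty * overlaps


# A four-residue chain folded into a square brings its two H ends together.
square_energy = hp_energy("HPPH", "RUL")
# A chain that walks back onto itself is penalized instead.
clash_energy = hp_energy("HHHH", "RLR")
```

    A genetic algorithm or other heuristic would search over move strings for the lowest value of such an energy function; whether infeasible conformations are penalized, repaired, or rejected is one of the design factors the abstract refers to.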

  11. MUFOLD: A new solution for protein 3D structure prediction

    PubMed Central

    Zhang, Jingfen; Wang, Qingguo; Barz, Bogdan; He, Zhiquan; Kosztin, Ioan; Shang, Yi; Xu, Dong

    2010-01-01

    There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse-grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full-atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root-mean-square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community-wide experiment for protein structure prediction CASP8. PMID:19927325

  12. Myeloperoxidase levels predict executive function.

    PubMed

    Haslacher, H; Perkmann, T; Lukas, I; Barth, A; Ponocny-Seliger, E; Michlmayr, M; Scheichenberger, V; Wagner, O; Winker, R

    2012-12-01

    The main purpose of the study was to investigate whether baseline myeloperoxidase (MPO) levels are associated with executive cognitive function in individuals with high physical activity. Baseline serum MPO levels of 56 elderly marathon runners and 58 controls were assessed by ELISA. Standardized tests were applied to survey domain-specific cognitive functions. Changes in brain morphology were visualized by magnetic resonance imaging (MRI). High baseline serum MPO levels correlated with worse outcome in tests assessing executive cognitive function in athletes but not in the control group (NAI maze test p<0.05, Trail Making Test ratio p<0.01). In control participants, subcortical white matter hyperintensities were associated with higher scores on the Geriatric Depression Scale (p<0.05), whereas athletes seem to be protected from this effect. During strenuous exercising, MPO as well as its educts may be elevated due to increased oxygen intake and excretion of pro-inflammatory mediators inducing host tissue damage via oxidative stress. This outweighs the potential benefits of physical activity on cognitive function.

  13. Myeloperoxidase levels predict executive function.

    PubMed

    Haslacher, H; Perkmann, T; Lukas, I; Barth, A; Ponocny-Seliger, E; Michlmayr, M; Scheichenberger, V; Wagner, O; Winker, R

    2012-12-01

    The main purpose of the study was to investigate whether baseline myeloperoxidase (MPO) levels are associated with executive cognitive function in individuals with high physical activity. Baseline serum MPO levels of 56 elderly marathon runners and 58 controls were assessed by ELISA. Standardized tests were applied to survey domain-specific cognitive functions. Changes in brain morphology were visualized by magnetic resonance imaging (MRI). High baseline serum MPO levels correlated with worse outcome in tests assessing executive cognitive function in athletes but not in the control group (NAI maze test p<0.05, Trail Making Test ratio p<0.01). In control participants, subcortical white matter hyperintensities were associated with higher scores on the Geriatric Depression Scale (p<0.05), whereas athletes seem to be protected from this effect. During strenuous exercising, MPO as well as its educts may be elevated due to increased oxygen intake and excretion of pro-inflammatory mediators inducing host tissue damage via oxidative stress. This outweighs the potential benefits of physical activity on cognitive function. PMID:22855218

  14. Improving structure-based function prediction using molecular dynamics

    PubMed Central

    Glazer, Dariya S.; Radmer, Randall J.; Altman, Russ B.

    2009-01-01

    Summary The number of molecules with solved three-dimensional structure but unknown function is increasing rapidly. Particularly problematic are novel folds with little detectable similarity to molecules of known function. Experimental assays can determine the functions of such molecules, but are time-consuming and expensive. Computational approaches can identify potential functional sites; however, these approaches generally rely on single static structures and do not use information about dynamics. In fact, structural dynamics can enhance function prediction: we coupled molecular dynamics simulations with structure-based function prediction algorithms that identify Ca2+ binding sites. When applied to 11 challenging proteins, both methods showed substantial improvement in performance, revealing 22 more sites in one case and 12 more in the other, with a modest increase in apparent false positives. Thus, we show that treating molecules as dynamic entities improves the performance of structure-based function prediction methods. PMID:19604472

  15. Bioinformatics pipeline for functional identification and characterization of proteins

    NASA Astrophysics Data System (ADS)

    Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Krzywkowski, Tomasz; Świerkula, Katarzyna; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    New sequencing methods, collectively called Next Generation Sequencing, make it possible to obtain vast amounts of data in a short time. These data require structural and functional annotation. Functional identification and characterization of predicted proteins can be done by in silico approaches, thanks to the numerous computational tools now available. However, the results of protein function prediction need to be confirmed, either by comparing the results of different programs or experimentally. Here we present a bioinformatics pipeline for structural and functional annotation of proteins.

  16. Protein Markers Predict Survival in Glioma Patients.

    PubMed

    Stetson, Lindsay C; Dazard, Jean-Eudes; Barnholtz-Sloan, Jill S

    2016-07-01

    Glioblastoma multiforme (GBM) is a genomically complex and aggressive primary adult brain tumor, with a median survival time of 12-14 months. The heterogeneous nature of this disease has made the identification and validation of prognostic biomarkers difficult. Using reverse phase protein array data from 203 primary untreated GBM patients, we have identified a set of 13 proteins with prognostic significance. Our protein signature predictive of glioblastoma (PROTGLIO) patient survival model was constructed and validated on independent data sets and was shown to significantly predict survival in GBM patients (log-rank test: p = 0.0009). Using a multivariate Cox proportional hazards, we have shown that our PROTGLIO model is distinct from other known GBM prognostic factors (age at diagnosis, extent of surgical resection, postoperative Karnofsky performance score (KPS), treatment with temozolomide (TMZ) chemoradiation, and methylation of the MGMT gene). Tenfold cross-validation repetition of our model generation procedure confirmed validation of PROTGLIO. The model was further validated on an independent set of isocitrate dehydrogenase wild-type (IDHwt) lower grade gliomas (LGG)-a portion of these tumors progress rapidly to GBM. The PROTGLIO model contains proteins, such as Cox-2 and Annexin 1, involved in inflammatory response, pointing to potential therapeutic interventions. The PROTGLIO model is a simple and effective predictor of overall survival in glioblastoma patients, making it potentially useful in clinical practice of glioblastoma multiforme. PMID:27143410

  17. Proteins with Novel Structure, Function and Dynamics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2014-01-01

    Recently, a small enzyme that ligates two RNA fragments at a rate 10^6 above background was evolved in vitro (Seelig and Szostak, Nature 448:828-831, 2007). This enzyme does not resemble any contemporary protein (Chao et al., Nature Chem. Biol. 9:81-83, 2013). It consists of a dynamic, catalytic loop, a small, rigid core containing two zinc ions coordinated by neighboring amino acids, and two highly flexible tails that might be unimportant for protein function. In contrast to other proteins, this enzyme does not contain ordered secondary structure elements, such as alpha-helix or beta-sheet. The loop is kept together by just two interactions of a charged residue and a histidine with a zinc ion, which they coordinate on the opposite side of the loop. Such a structure appears to be very fragile. Surprisingly, computer simulations indicate otherwise. When the coordinating charged residue is mutated to alanine, another, nearby charged residue takes its place, thus keeping the structure nearly intact. If this residue is also substituted by alanine, a salt bridge involving two other charged residues on the opposite sides of the loop keeps the loop in place. These adjustments are facilitated by high flexibility of the protein. Computational predictions have been confirmed experimentally, as both mutants retain full activity and overall structure. These results challenge our notions about what is required for protein activity and about the relationship between protein dynamics, stability and robustness. We hypothesize that small, highly dynamic proteins could be both active and fault tolerant in ways that many other proteins are not, i.e. they can adjust to retain their structure and activity even if subjected to mutations in structurally critical regions. This opens the door to designing proteins with novel functions, structures and dynamics that have not yet been considered.

  18. Prediction and Annotation of Plant Protein Interaction Networks

    SciTech Connect

    McDermott, Jason E.; Wang, Jun; Yu, Jun; Wong, Gane Ka-Shu; Samudrala, Ram

    2009-02-01

    Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  19. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).

  20. Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

    PubMed

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

    2015-01-01

    Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  1. Using support vector machine for improving protein-protein interaction prediction utilizing domain interactions

    SciTech Connect

    Singhal, Mudita; Shah, Anuj R.; Brown, Roslyn N.; Adkins, Joshua N.

    2010-10-02

    Understanding protein interactions is essential to gain insights into the biological processes at the whole cell level. The high-throughput experimental techniques for determining protein-protein interactions (PPI) are error prone and expensive, with low overlap amongst them. Although several computational methods have been proposed for predicting protein interactions, there is definite room for improvement. Here we present DomainSVM, a predictive method for PPI that uses computationally inferred domain-domain interaction values in a Support Vector Machine framework to predict protein interactions. The DomainSVM method utilizes evidence of multiple interacting domains to predict a protein interaction. It outperforms existing methods of PPI prediction by achieving very high explanation ratios, precision, specificity, sensitivity and F-measure values in a 10-fold cross-validation study conducted on the positive and negative PPIs in yeast. A functional comparison study using GO annotations on the positive and the negative test sets is presented in addition to discussing novel PPI predictions in Salmonella Typhimurium.

  2. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

    PubMed

    Cirillo, Davide; Agostini, Federico; Klus, Petr; Marchese, Domenica; Rodriguez, Silvia; Bolognesi, Benedetta; Tartaglia, Gian Gaetano

    2013-02-01

    Increasing evidence indicates that RNA plays an active role in a number of neurodegenerative diseases. We recently introduced a theoretical framework, catRAPID, to predict the binding ability of protein and RNA molecules. Here, we use catRAPID to investigate ribonucleoprotein interactions linked to inherited intellectual disability, amyotrophic lateral sclerosis, Creutzfeldt-Jakob, Alzheimer's, and Parkinson's diseases. We specifically focus on (1) RNA interactions with fragile X mental retardation protein FMRP; (2) protein sequestration caused by CGG repeats; (3) noncoding transcripts regulated by TAR DNA-binding protein 43 TDP-43; (4) autogenous regulation of TDP-43 and FMRP; (5) iron-mediated expression of amyloid precursor protein APP and α-synuclein; (6) interactions between prions and RNA aptamers. Our results are in striking agreement with experimental evidence and provide new insights in processes associated with neuronal function and misfunction.

  3. Transferring network topological knowledge for predicting protein-protein interactions.

    PubMed

    Xu, Qian; Xiang, Evan Wei; Yang, Qiang

    2011-10-01

    Protein-protein interactions (PPIs) play an important role in cellular processes within a cell. An important task is to determine the existence of interactions among proteins. Unfortunately, the existing biological experimental techniques are expensive, time-consuming and labor-intensive. The network structures of many such networks are sparse, incomplete and noisy. Thus, state-of-the-art methods for link prediction in these networks often cannot give satisfactory prediction results, especially when some networks are extremely sparse. Noticing that we typically have more than one PPI network available, we naturally wonder whether it is possible to 'transfer' the linkage knowledge from some existing, relatively dense networks to a sparse network, to improve the prediction performance. Noticing that a network structure can be modeled using a matrix model, we introduce the well-known collective matrix factorization technique to 'transfer' usable linkage knowledge from relatively dense interaction network to a sparse target network. Our approach is to establish a correspondence between a source network and a target network via network-wide similarities. We test this method on two real PPI networks, Helicobacter pylori (as a target network) and human (as a source network). Our experimental results show that our method can achieve higher performance as compared with some baseline methods. PMID:21770035

  4. Identifying functional sites based on prediction of charged group behavior.

    PubMed

    Ondrechen, Mary Jo

    2004-09-01

    This protocol describes the implementation and interpretation of THEMATICS, a simple computational predictor of functional information for proteins from the three-dimensional structure. This method is based on the computation of the electrical potential function for the protein and the calculation of the predicted titration curves for each of the titratable groups in the protein. While most of the titratable residues in a protein have predicted titration behavior that fits the Henderson-Hasselbalch equation, the ionizable residues in the active site generally deviate dramatically from the typical behavior. From the calculated titration curves, one identifies those residues that deviate significantly from Henderson-Hasselbalch behavior. A cluster of two or more of such deviant titratable residues in physical proximity is a reliable predictor of active-site location.
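
    The Henderson-Hasselbalch behavior that the abstract uses as its reference can be written down directly. The sketch below, in Python, computes the ideal protonated fraction of a titratable group and a simple largest-gap measure between a given titration curve and that ideal curve. The gap measure, the function names, and the made-up "perturbed" curve are assumptions for illustration only; they are not the statistic THEMATICS itself uses.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def max_deviation(curve, pka):
    """Largest absolute gap between a titration curve and the ideal
    Henderson-Hasselbalch curve with the given pKa.

    `curve` is a list of (pH, protonated fraction) pairs.
    Illustrative measure only, not the THEMATICS criterion.
    """
    return max(abs(frac - protonated_fraction(ph, pka)) for ph, frac in curve)

ph_values = [float(ph) for ph in range(15)]

# A group that titrates ideally with pKa 7.
ideal = [(ph, protonated_fraction(ph, 7.0)) for ph in ph_values]

# Made-up perturbed curve: an even mix of pKa 5 and pKa 9 behavior. It still
# crosses 0.5 at pH 7 but titrates over a much wider pH range.
perturbed = [
    (ph, 0.5 * protonated_fraction(ph, 5.0) + 0.5 * protonated_fraction(ph, 9.0))
    for ph in ph_values
]

print(max_deviation(ideal, 7.0))      # no gap for the ideal curve
print(max_deviation(perturbed, 7.0))  # clearly non-zero for the stretched curve
```

    A residue whose computed curve resembles the perturbed one would be a candidate for the "deviant" set described above; the choice of threshold is left open here.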

  5. Predicting protein-protein interactions in the post synaptic density.

    PubMed

    Bar-shira, Ossnat; Chechik, Gal

    2013-09-01

    The post synaptic density (PSD) is a specialization of the cytoskeleton at the synaptic junction, composed of hundreds of different proteins. Characterizing the protein components of the PSD and their interactions can help elucidate the mechanism of long-term changes in synaptic plasticity, which underlie learning and memory. Unfortunately, our knowledge of the proteome and interactome of the PSD is still partial and noisy. In this study we describe a computational framework to improve the reconstruction of the PSD network. The approach is based on learning the characteristics of PSD protein interactions from a set of trusted interactions, expanding this set with data collected from large scale repositories, and then predicting novel interactions with proteins that are suspected to reside in the PSD. Using this method we obtained thirty predicted interactions, more than half of which have supporting evidence in the literature. We discuss in detail two of these new interactions, Lrrtm1 with PSD-95 and Src with Capg. The first may take part in a mechanism underlying glutamatergic dysfunction in schizophrenia. The second suggests an alternative mechanism to regulate dendritic spine maturation.

  6. CATH FunFHMMer web server: protein functional annotations using functional family assignments.

    PubMed

    Das, Sayoni; Sillitoe, Ian; Lee, David; Lees, Jonathan G; Dawson, Natalie L; Ward, John; Orengo, Christine A

    2015-07-01

    The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced presents new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence-structure-function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.

  7. CATH FunFHMMer web server: protein functional annotations using functional family assignments

    PubMed Central

    Das, Sayoni; Sillitoe, Ian; Lee, David; Lees, Jonathan G.; Dawson, Natalie L.; Ward, John; Orengo, Christine A.

    2015-01-01

    The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced presents new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence–structure–function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer. PMID:25964299

  8. Systematic Prediction of Scaffold Proteins Reveals New Design Principles in Scaffold-Mediated Signal Transduction

    PubMed Central

    Hu, Jianfei; Neiswinger, Johnathan; Zhang, Jin; Zhu, Heng; Qian, Jiang

    2015-01-01

    Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through phosphorylation process. PMID:26393507

  9. Systematic Prediction of Scaffold Proteins Reveals New Design Principles in Scaffold-Mediated Signal Transduction.

    PubMed

    Hu, Jianfei; Neiswinger, Johnathan; Zhang, Jin; Zhu, Heng; Qian, Jiang

    2015-01-01

    Scaffold proteins play a crucial role in facilitating signal transduction in eukaryotes by bringing together multiple signaling components. In this study, we performed a systematic analysis of scaffold proteins in signal transduction by integrating protein-protein interaction and kinase-substrate relationship networks. We predicted 212 scaffold proteins that are involved in 605 distinct signaling pathways. The computational prediction was validated using a protein microarray-based approach. The predicted scaffold proteins showed several interesting characteristics, as we expected from the functionality of scaffold proteins. We found that the scaffold proteins are likely to interact with each other, which is consistent with previous finding that scaffold proteins tend to form homodimers and heterodimers. Interestingly, a single scaffold protein can be involved in multiple signaling pathways by interacting with other scaffold protein partners. Furthermore, we propose two possible regulatory mechanisms by which the activity of scaffold proteins is coordinated with their associated pathways through phosphorylation process.

  10. Sequence-only evolutionary and predicted structural features for the prediction of stability changes in protein mutants

    PubMed Central

    2013-01-01

    Background Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a protein are of importance. While evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of protein sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. Also, the use of predicted structural features in situations when a protein structure is not available has not been explored. Results We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them capture the determinants of protein stability the best. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, we found that the best performance improvement can be achieved by the combination of the evolutionary features mutation likelihood and SIFTscore in conjunction with the predicted structural feature secondary structure. The same two evolutionary features in the combination with the predicted structural feature accessible surface area achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance. Conclusion Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. Even the predicted structural features, which did not perform well on their own, turn out to be beneficial

  11. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
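
    The phylogenetic profile idea in this record can be illustrated with a few lines of Python. The sketch below builds a presence/absence vector per protein and reports pairs whose profiles agree in all but a chosen number of genomes. The protein and genome identifiers, the function names, and the mismatch threshold are hypothetical; the patent text does not specify how profiles are compared.

```python
from itertools import combinations

def phylogenetic_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one per genome, in `genomes` order."""
    return {
        protein: tuple(int(genome in found) for genome in genomes)
        for protein, found in presence.items()
    }

def linked_pairs(profiles, max_mismatches=0):
    """Pairs of proteins whose profiles differ in at most `max_mismatches` genomes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs

# Hypothetical input: for each protein, the genomes in which a homolog was found.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3"},
}

profiles = phylogenetic_profiles(presence, genomes)
print(linked_pairs(profiles))                    # identical profiles only
print(linked_pairs(profiles, max_mismatches=1))  # allow one differing genome
```

    In this toy data only P1 and P2 share an identical profile, so they are the only pair proposed as functionally linked under the strict setting.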

  12. MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading.

    PubMed

    Lu, Long; Lu, Hui; Skolnick, Jeffrey

    2002-11-15

    In this postgenomic era, the ability to identify protein-protein interactions on a genomic scale is very important to assist in the assignment of physiological function. Because of the increasing number of solved structures involving protein complexes, the time is ripe to extend threading to the prediction of quaternary structure. In this spirit, a multimeric threading approach has been developed. The approach is comprised of two phases. In the first phase, traditional threading on a single chain is applied to generate a set of potential structures for the query sequences. In particular, we use our recently developed threading algorithm, PROSPECTOR. Then, for those proteins whose template structures are part of a known complex, we rethread on both partners in the complex and now include a protein-protein interfacial energy. To perform this analysis, a database of multimeric protein structures has been constructed, the necessary interfacial pairwise potentials have been derived, and a set of empirical indicators to identify true multimers based on the threading Z-score and the magnitude of the interfacial energy have been established. The algorithm has been tested on a benchmark set comprised of 40 homodimers, 15 heterodimers, and 69 monomers that were scanned against a protein library of 2478 structures that comprise a representative set of structures in the Protein Data Bank. Of these, the method correctly recognized and assigned 36 homodimers, 15 heterodimers, and 65 monomers. This protocol was applied to identify partners and assign quaternary structures of proteins found in the yeast database of interacting proteins. Our multimeric threading algorithm correctly predicts 144 interacting proteins, compared to the 56 (26) cases assigned by PSI-BLAST using a (less) permissive E-value of 1 (0.01). Next, all possible pairs of yeast proteins have been examined. Predictions (n = 2865) of protein-protein interactions are made; 1138 of these 2865 interactions have

  13. Protein flexibility predictions using graph theory.

    PubMed

    Jacobs, D J; Rader, A J; Kuhn, L A; Thorpe, M F

    2001-08-01

    Techniques from graph theory are applied to analyze the bond networks in proteins and identify the flexible and rigid regions. The bond network consists of distance constraints defined by the covalent and hydrogen bonds and salt bridges in the protein, identified by geometric and energetic criteria. We use an algorithm that counts the degrees of freedom within this constraint network and that identifies all the rigid and flexible substructures in the protein, including overconstrained regions (with more crosslinking bonds than are needed to rigidify the region) and underconstrained or flexible regions, in which dihedral bond rotations can occur. The number of extra constraints or remaining degrees of bond-rotational freedom within a substructure quantifies its relative rigidity/flexibility and provides a flexibility index for each bond in the structure. This novel computational procedure, first used in the analysis of glassy materials, is approximately a million times faster than molecular dynamics simulations and captures the essential conformational flexibility of the protein main and side-chains from analysis of a single, static three-dimensional structure. This approach is demonstrated by comparison with experimental measures of flexibility for three proteins in which hinge and loop motion are essential for biological function: HIV protease, adenylate kinase, and dihydrofolate reductase.
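
    The counting of degrees of freedom mentioned above can be shown in its simplest global form, often called Maxwell counting: each site contributes as many degrees of freedom as there are spatial dimensions, rigid-body motions are subtracted, and each distance constraint removes one. The Python sketch below is this simplified whole-network count, swapped in for illustration; it assumes all constraints are independent and, unlike the algorithm in the abstract, it does not locate which substructures are rigid or flexible.

```python
def maxwell_count(n_sites, n_constraints, dim=3):
    """Global estimate of internal degrees of freedom of a constraint network.

    Positive: underconstrained (flexible); zero: just rigid (isostatic);
    negative: overconstrained (more constraints than needed).
    Assumption: every constraint is independent of the others.
    """
    rigid_body_motions = dim * (dim + 1) // 2  # 3 in 2D, 6 in 3D
    return dim * n_sites - rigid_body_motions - n_constraints

def classify(count):
    """Turn a Maxwell count into a coarse label."""
    if count > 0:
        return "flexible"
    if count == 0:
        return "isostatic"
    return "overconstrained"

# Textbook examples with bars as distance constraints.
examples = {
    "triangle (2D)": maxwell_count(3, 3, dim=2),
    "square (2D)": maxwell_count(4, 4, dim=2),
    "square with both diagonals (2D)": maxwell_count(4, 6, dim=2),
    "tetrahedron (3D)": maxwell_count(4, 6, dim=3),
}
for name, count in examples.items():
    print(name, count, classify(count))
```

    The square with four bars has one remaining degree of freedom (it can shear), while adding both diagonals leaves one redundant constraint, mirroring the underconstrained and overconstrained regions the abstract distinguishes.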

  14. Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression

    PubMed Central

    De Bodt, Stefanie; Proost, Sebastian; Vandepoele, Klaas; Rouzé, Pierre; Van de Peer, Yves

    2009-01-01

    Background: Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently gained great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results: In this study, we have developed a computational method that exploits the existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other hand to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach making use of existing protein-protein interaction data from yeast, human, C. elegans and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis and trichome development in yeast, human and Arabidopsis, respectively. Conclusion: We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a high number of novel protein-protein interactions with diverse biological roles is discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses. PMID:19563678

  15. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as the input feature, the best current predictors obtain ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationships through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility. PMID:26752681
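
    The Q3 figure quoted above is a per-residue accuracy over three secondary-structure states. A minimal sketch of how such a score is usually computed follows, assuming predictions and reference labels are given as equal-length strings over helix (H), strand (E) and coil (C); the function name is hypothetical and the example strings are made up, not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("label strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Illustrative labels only: 6 of the 8 positions agree.
print(q3_accuracy("HHHECCCC", "HHHHCCCE"))  # 0.75
```

    The same function would give a Q8-style score if both strings used an eight-state alphabet, since it only compares labels position by position.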

  16. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only sequence (profile) information is used as the input feature, the best current predictors obtain ~80% Q3 accuracy, a figure that has not improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationships through a deep hierarchical architecture, but also the interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  17. BPROMPT: A consensus server for membrane protein prediction.

    PubMed

    Taylor, Paul D; Attwood, Teresa K; Flower, Darren R

    2003-07-01

    Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT. PMID:12824397

  18. Predicting the Binding Patterns of Hub Proteins: A Study Using Yeast Protein Interaction Networks

    PubMed Central

    Andorf, Carson M.; Honavar, Vasant; Sen, Taner Z.

    2013-01-01

    Background: Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Of particular interest are hub proteins that can interact with large numbers of partners and often play essential roles in cellular control. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish-interface hubs (SIH) with one or two binding sites, or multiple-interface hubs (MIH) with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., interact with different partners at different times or locations) or party hubs (i.e., simultaneously interact with multiple partners). Methodology: Our approach works in 3 phases: Phase I classifies if a protein is likely to bind with another protein. Phase II determines if a protein-binding (PB) protein is a hub. Phase III classifies PB proteins as singlish-interface versus multiple-interface hubs and date versus party hubs. At each stage, we use sequence-based predictors trained using several standard machine learning techniques. Conclusions: Our method is able to predict whether a protein is a protein-binding protein with an accuracy of 94% and a correlation coefficient of 0.87; identify hubs from non-hubs with 100% accuracy for 30% of the data; distinguish date hubs/party hubs with 69% accuracy and area under ROC curve of 0.68; and SIH/MIH with 89% accuracy and area under ROC curve of 0.84. Because our method is based on sequence information alone, it can be used even in settings where reliable protein-protein interaction data or structures of protein-protein complexes are unavailable to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions. Availability: We provide a web server for our three-phase approach: http://hybsvm.gdcb.iastate.edu. PMID:23431393

  19. MEGADOCK: an all-to-all protein-protein interaction prediction system using tertiary structure data.

    PubMed

    Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package "MEGADOCK" that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
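
    To put the reported F-measure in context, the sketch below computes the F-measure (harmonic mean of precision and recall) from confusion counts. The function name and the first set of counts are hypothetical; only the 120 interacting pairs and the 14,400 combinations come from the abstract, and they are used here solely to show what a trivial "predict every pair" strategy would score.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, 0.0 when undefined."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
print(round(f_measure(true_pos=40, false_pos=60, false_neg=80), 4))

# Trivial baseline on the benchmark size from the abstract:
# call all 14,400 combinations interacting, of which 120 are relevant.
all_pairs, relevant = 14_400, 120
print(round(f_measure(relevant, all_pairs - relevant, 0), 4))
```

    The baseline scores roughly 0.017, which gives a sense of how far above chance an F-measure of 0.231 sits on a problem this imbalanced.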

  20. Modelling proteins' hidden conformations to predict antibiotic resistance

    NASA Astrophysics Data System (ADS)

    Hart, Kathryn M.; Ho, Chris M. W.; Dutta, Supratik; Gross, Michael L.; Bowman, Gregory R.

    2016-10-01

    TEM β-lactamase confers bacteria with resistance to many antibiotics and rapidly evolves activity against new drugs. However, functional changes are not easily explained by differences in crystal structures. We employ Markov state models to identify hidden conformations and explore their role in determining TEM's specificity. We integrate these models with existing drug-design tools to create a new technique, called Boltzmann docking, which better predicts TEM specificity by accounting for conformational heterogeneity. Using our MSMs, we identify hidden states whose populations correlate with activity against cefotaxime. To experimentally detect our predicted hidden states, we use rapid mass spectrometric footprinting and confirm our models' prediction that increased cefotaxime activity correlates with reduced Ω-loop flexibility. Finally, we design novel variants to stabilize the hidden cefotaximase states, and find their populations predict activity against cefotaxime in vitro and in vivo. Therefore, we expect this framework to have numerous applications in drug and protein design.
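
    One plausible reading of "accounting for conformational heterogeneity" is to dock against a representative structure of each Markov-state-model state and weight each docking score by that state's equilibrium population. The sketch below illustrates only that weighting step; the function name, the populations and the scores are all hypothetical, and the abstract does not confirm that this is the exact form the authors use.

```python
def population_weighted_score(populations, scores):
    """Average per-state docking scores, weighted by state populations.

    populations: equilibrium population of each state (need not sum to 1).
    scores: docking score obtained against each state's representative.
    """
    if len(populations) != len(scores) or not populations:
        raise ValueError("need one score per state and at least one state")
    if any(p < 0 for p in populations):
        raise ValueError("populations must be non-negative")
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    return sum(p * s for p, s in zip(populations, scores)) / total


# Made-up example: three states with populations 0.7, 0.2 and 0.1.
print(population_weighted_score([0.7, 0.2, 0.1], [-6.0, -9.0, -4.0]))
```

    Under this reading, a rarely populated state with a very favourable score contributes little, whereas docking against a single crystal structure would ignore it entirely.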

  1. Large-scale de novo prediction of physical protein-protein association.

    PubMed

    Elefsinioti, Antigoni; Saraç, Ömer Sinan; Hegele, Anna; Plake, Conrad; Hubner, Nina C; Poser, Ina; Sarov, Mihail; Hyman, Anthony; Mann, Matthias; Schroeder, Michael; Stelzl, Ulrich; Beyer, Andreas

    2011-11-01

    Information about the physical association of proteins is extensively used for studying cellular processes and disease mechanisms. However, complete experimental mapping of the human interactome will remain prohibitively difficult in the near future. Here we present a map of predicted human protein interactions that distinguishes functional association from physical binding. Our network classifies more than 5 million protein pairs predicting 94,009 new interactions with high confidence. We experimentally tested a subset of these predictions using yeast two-hybrid analysis and affinity purification followed by quantitative mass spectrometry. Thus we identified 462 new protein-protein interactions and confirmed the predictive power of the network. These independent experiments address potential issues of circular reasoning and are a distinctive feature of this work. Analysis of the physical interactome unravels subnetworks mediating between different functional and physical subunits of the cell. Finally, we demonstrate the utility of the network for the analysis of molecular mechanisms of complex diseases by applying it to genome-wide association studies of neurodegenerative diseases. This analysis provides new evidence implying TOMM40 as a factor involved in Alzheimer's disease. The network provides a high-quality resource for the analysis of genomic data sets and genetic association studies in particular. Our interactome is available via the hPRINT web server at: www.print-db.org.

  2. Collective Dynamics Differentiates Functional Divergence in Protein Evolution

    PubMed Central

    Glembo, Tyler J.; Farrell, Daniel W.; Gerek, Z. Nevin; Thorpe, M. F.; Ozkan, S. Banu

    2012-01-01

    Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables those mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function. PMID:22479170
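
    The all-atom RMSD quoted above measures how far a predicted structure lies from the crystal structure. A minimal sketch of the formula follows, assuming the two coordinate sets list the same atoms in the same order and have already been superimposed; the optimal-superposition step normally performed first is deliberately left out, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets.

    Each argument is a sequence of (x, y, z) tuples in the same units
    (for example angstroms), with atoms in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal size")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two atoms, each displaced by exactly 1 unit along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))  # 1.0
```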

  3. Biological cluster evaluation for gene function prediction.

    PubMed

    Klie, Sebastian; Nikoloski, Zoran; Selbig, Joachim

    2014-06-01

    Recent advances in high-throughput omics techniques render it possible to decode the function of genes by using the "guilt-by-association" principle on biologically meaningful clusters of gene expression data. However, the existing frameworks for biological evaluation of gene clusters are hindered by two bottleneck issues: (1) the choice of the number of clusters, and (2) the external measures which do not take into consideration the structure of the analyzed data and the ontology of the existing biological knowledge. Here, we address the identified bottlenecks by developing a novel framework that allows not only for biological evaluation of gene expression clusters based on existing structured knowledge, but also for prediction of putative gene functions. The proposed framework facilitates propagation of statistical significance at each of the following steps: (1) estimating the number of clusters, (2) evaluating the clusters in terms of novel external structural measures, (3) selecting an optimal clustering algorithm, and (4) predicting gene functions. The framework also includes a method for evaluation of gene clusters based on the structure of the employed ontology. Moreover, our method for obtaining a probabilistic range for the number of clusters is shown to be valid on synthetic data and available gene expression profiles from Saccharomyces cerevisiae. Finally, we propose a network-based approach for gene function prediction which relies on the clustering of optimal score and the employed ontology. Our approach effectively predicts gene function on the Saccharomyces cerevisiae data set and is also employed to obtain putative gene functions for an Arabidopsis thaliana data set.

  4. Integration of genomic datasets to predict protein complexes in yeast.

    PubMed

    Jansen, Ronald; Lan, Ning; Qian, Jiang; Gerstein, Mark

    2002-01-01

    The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information of the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, it is likely that whenever information from multiple independent sources agrees, it should be more valid and reliable. Secondly, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single gene or protein experiments, but also necessary for many of the results from genome-wide experiments since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/. PMID:12836664
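
    The claim that several weakly predictive sources can be combined into a stronger predictor can be illustrated with a naive Bayes style update, in which each source contributes a likelihood ratio and the sources are treated as independent. This is a generic sketch, not the integration procedure of the paper; the function name, the prior and the likelihood ratios are hypothetical.

```python
def combine_evidence(prior_probability, likelihood_ratios):
    """Posterior probability after multiplying prior odds by each ratio.

    Assumes the evidence sources are conditionally independent, which is
    a strong simplification for real genomic data sets.
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior_probability / (1.0 - prior_probability)
    for ratio in likelihood_ratios:
        if ratio <= 0:
            raise ValueError("likelihood ratios must be positive")
        odds *= ratio
    return odds / (1.0 + odds)


# Three weak sources (ratios 3, 2 and 4) applied to a 1% prior.
print(round(combine_evidence(0.01, [3.0, 2.0, 4.0]), 4))
```

    No single ratio moves the 1% prior very far, but together the three sources raise it to roughly 20%, which is the qualitative effect the abstract describes.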

  5. 3D protein structure prediction using Imperialist Competitive algorithm and half sphere exposure prediction.

    PubMed

    Khaji, Erfan; Karami, Masoumeh; Garkani-Nejad, Zahra

    2016-02-21

    Predicting the native structure of proteins based on half-sphere exposure and contact numbers has been studied deeply within recent years. Online predictors of these vectors and of the secondary structures of amino acid sequences have made it possible to design a function for the folding process. By choosing variant structures and directs for each secondary structure, a random conformation can be generated, and a potential function can then be assigned. Minimizing the potential function utilizing meta-heuristic algorithms is the final step of finding the native structure of a given amino acid sequence. In this work, the Imperialist Competitive algorithm was used to accelerate the minimization. Moreover, we applied an adaptive procedure to apply revolutionary changes. Finally, we considered a more accurate tool for prediction of secondary structure. The results of the computational experiments on a standard benchmark show the superiority of the new algorithm over the previous methods with similar potential function. PMID:26718864

  7. (PS)2: protein structure prediction server version 3.0.

    PubMed

    Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh

    2015-07-01

    Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 is freely available at http://ps2v3.life.nctu.edu.tw/. PMID:25943546

  9. Combining physicochemical and evolutionary information for protein contact prediction.

    PubMed

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information--evolutionary and physicochemical--we maintain prediction accuracy even when only a few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.

  10. Prediction of lipid-binding regions in cytoplasmic and extracellular loops of membrane proteins as exemplified by protein translocation membrane proteins.

    PubMed

    Keller, Rob C A

    2013-01-01

    The presence of possible lipid-binding regions in the cytoplasmic or extracellular loops of membrane proteins with an emphasis on protein translocation membrane proteins was investigated in this study using bioinformatics. Recent developments in approaches recognizing lipid-binding regions in proteins were found to be promising. In this study a total bioinformatics approach specialized in identifying lipid-binding helical regions in proteins was explored. Two features of the protein translocation membrane proteins, the position of the transmembrane regions and the identification of additional lipid-binding regions, were analyzed. A number of well-studied protein translocation membrane protein structures were checked in order to demonstrate the predictive value of the bioinformatics approach. Furthermore, the results demonstrated that lipid-binding regions in the cytoplasmic and extracellular loops in protein translocation membrane proteins can be predicted, and it is proposed that the interaction of these regions with phospholipids is important for proper functioning during protein translocation. PMID:22961045

  11. Predicted Protein Subcellular Localization in Dominant Surface Ocean Bacterioplankton

    PubMed Central

    2012-01-01

    Bacteria consume dissolved organic matter (DOM) through hydrolysis, transport and intracellular metabolism, and these activities occur in distinct subcellular localizations. Bacterial protein subcellular localizations for several major marine bacterial groups were predicted using genomic, metagenomic and metatranscriptomic data sets following modification of MetaP software for use with partial gene sequences. The most distinct pattern of subcellular localization was found for Bacteroidetes, whose genomes were substantially enriched with outer membrane and extracellular proteins but depleted of inner membrane proteins compared with five other taxa (SAR11, Roseobacter, Synechococcus, Prochlorococcus, oligotrophic marine Gammaproteobacteria). When subcellular localization patterns were compared between genes and transcripts, three taxa had expression biased toward proteins localized to cell locations outside of the cytosol (SAR11, Roseobacter, and Synechococcus), as expected based on the importance of carbon and nutrient acquisition in an oligotrophic ocean, but two taxa did not (oligotrophic marine Gammaproteobacteria and Bacteroidetes). Diel variations in the fraction and putative gene functions of transcripts encoding inner membrane and periplasmic proteins compared to cytoplasmic proteins suggest a close coupling of photosynthetic extracellular release and bacterial consumption, providing insights into interactions between phytoplankton, bacteria, and DOM. PMID:22773648

  12. Exploration of the dynamic properties of protein complexes predicted from spatially constrained protein-protein interaction networks.

    PubMed

    Yen, Eric A; Tsay, Aaron; Waldispuhl, Jerome; Vogel, Jackie

    2014-05-01

    Protein complexes are not static, but rather highly dynamic with subunits that undergo 1-dimensional diffusion with respect to each other. Interactions within protein complexes are modulated through regulatory inputs that alter interactions and introduce new components and deplete existing components through exchange. While it is clear that the structure and function of any given protein complex is coupled to its dynamical properties, it remains a challenge to predict the possible conformations that complexes can adopt. Protein-fragment Complementation Assays detect physical interactions between protein pairs constrained to ≤8 nm from each other in living cells. This method has been used to build networks composed of 1000s of pair-wise interactions. Significantly, these networks contain a wealth of dynamic information, as the assay is fully reversible and the proteins are expressed in their natural context. In this study, we describe a method that extracts this valuable information in the form of predicted conformations, allowing the user to explore the conformational landscape, to search for structures that correlate with an activity state, and estimate the abundance of conformations in the living cell. The generator is based on a Markov Chain Monte Carlo simulation that uses the interaction dataset as input and is constrained by the physical resolution of the assay. We applied this method to an 18-member protein complex composed of the seven core proteins of the budding yeast Arp2/3 complex and 11 associated regulators and effector proteins. We generated 20,480 output structures and identified conformational states using principal component analysis. We interrogated the conformation landscape and found evidence of symmetry breaking, a mixture of likely active and inactive conformational states and dynamic exchange of the core protein Arc15 between core and regulatory components. Our method provides a novel tool for prediction and visualization of the hidden

  13. Network Analysis of Circular Permutations in Multidomain Proteins Reveals Functional Linkages for Uncharacterized Proteins

    PubMed Central

    Adjeroh, Donald; Jiang, Yue; Jiang, Bing-Hua; Lin, Jie

    2014-01-01

    Various studies have implicated different multidomain proteins in cancer. However, there has been little or no detailed study on the role of circular multidomain proteins in the general problem of cancer or on specific cancer types. This work represents an initial attempt at investigating the potential for predicting linkages between known cancer-associated proteins with uncharacterized or hypothetical multidomain proteins, based primarily on circular permutation (CP) relationships. First, we propose an efficient algorithm for rapid identification of both exact and approximate CPs in multidomain proteins. Using the circular relations identified, we construct networks between multidomain proteins, based on which we perform functional annotation of multidomain proteins. We then extend the method to construct subnetworks for selected cancer subtypes, and perform prediction of potential linkages between uncharacterized multidomain proteins and the selected cancer types. We include practical results showing the performance of the proposed methods. PMID:25741177
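
    An exact circular permutation between two domain architectures can be checked with the classic "doubling" idea: one architecture is a rotation of the other if it appears as a contiguous window of the other concatenated with itself. The sketch below is a naive quadratic-time check, not the efficient algorithm proposed in the paper, and it does not handle approximate CPs; the function name and the domain names in the example are hypothetical.

```python
def is_circular_permutation(domains_a, domains_b):
    """True if domains_b is a rotation of domains_a (identity included).

    Both arguments are sequences of domain identifiers in N-to-C order.
    """
    a, b = list(domains_a), list(domains_b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a is a length-n window of a + a
    n = len(a)
    return any(doubled[start:start + n] == b for start in range(n))


# Illustrative architectures only.
print(is_circular_permutation(["SH3", "SH2", "Kinase"], ["SH2", "Kinase", "SH3"]))
print(is_circular_permutation(["SH3", "SH2", "Kinase"], ["SH3", "Kinase", "SH2"]))
```

    The second pair contains the same domains but in an order that no rotation can produce, so it is a rearrangement rather than a circular permutation.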

  14. Predictive energy landscapes for folding membrane protein assemblies

    NASA Astrophysics Data System (ADS)

    Truong, Ha H.; Kim, Bobby L.; Schafer, Nicholas P.; Wolynes, Peter G.

    2015-12-01

    We study the energy landscapes for membrane protein oligomerization using the Associative memory, Water mediated, Structure and Energy Model with an implicit membrane potential (AWSEM-membrane), a coarse-grained molecular dynamics model previously optimized under the assumption that the energy landscapes for folding α-helical membrane protein monomers are funneled once their native topology within the membrane is established. In this study we show that the AWSEM-membrane force field is able to sample near native binding interfaces of several oligomeric systems. By predicting candidate structures using simulated annealing, we further show that degeneracies in predicting structures of membrane protein monomers are generally resolved in the folding of the higher order assemblies as is the case in the assemblies of both nicotinic acetylcholine receptor and V-type Na+-ATPase dimers. The physics of the phenomenon resembles domain swapping, which is consistent with the landscape following the principle of minimal frustration. We revisit also the classic Khorana study of the reconstitution of bacteriorhodopsin from its fragments, which is the close analogue of the early Anfinsen experiment on globular proteins. Here, we show the retinal cofactor likely plays a major role in selecting the final functional assembly.

  15. High-throughput prediction of RNA, DNA and protein binding regions mediated by intrinsic disorder.

    PubMed

    Peng, Zhenling; Kurgan, Lukasz

    2015-10-15

    Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein-protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/.

  16. Protein-Based Urine Test Predicts Kidney Transplant Outcomes

    MedlinePlus

    ... News Releases News Release Thursday, August 22, 2013 Protein-based urine test predicts kidney transplant outcomes NIH- ... supporting development of noninvasive tests. Levels of a protein in the urine of kidney transplant recipients can ...

  17. Functionalizing Microporous Membranes for Protein Purification and Protein Digestion

    NASA Astrophysics Data System (ADS)

    Dong, Jinlan; Bruening, Merlin L.

    2015-07-01

    This review examines advances in the functionalization of microporous membranes for protein purification and the development of protease-containing membranes for controlled protein digestion prior to mass spectrometry analysis. Recent studies confirm that membranes are superior to bead-based columns for rapid protein capture, presumably because convective mass transport in membrane pores rapidly brings proteins to binding sites. Modification of porous membranes with functional polymeric films or TiO2 nanoparticles yields materials that selectively capture species ranging from phosphopeptides to His-tagged proteins, and protein-binding capacities often exceed those of commercial beads. Thin membranes also provide a convenient framework for creating enzyme-containing reactors that afford control over residence times. With millisecond residence times, reactors with immobilized proteases limit protein digestion to increase sequence coverage in mass spectrometry analysis and facilitate elucidation of protein structures. This review emphasizes the advantages of membrane-based techniques and concludes with some challenges for their practical application.

  18. Prediction of thermodynamic instabilities of protein solutions from simple protein-protein interactions

    NASA Astrophysics Data System (ADS)

    D'Agostino, Tommaso; Solana, José Ramón; Emanuele, Antonio

    2013-10-01

    Statistical thermodynamics of protein solutions is often studied in terms of simple, microscopic models of particles interacting via pairwise potentials. Such modelling can reproduce the short range structure of protein solutions at equilibrium and predict thermodynamic instabilities of these systems. We introduce a square well model of effective protein-protein interaction that embeds the solvent’s action. We modify an existing model [45] by considering a well depth having an explicit dependence on temperature, i.e. an explicit free energy character, thus encompassing the statistically relevant configurations of solvent molecules around proteins. We choose protein solutions exhibiting demixing upon temperature decrease (lysozyme, enthalpy driven) and upon temperature increase (haemoglobin, entropy driven). We obtain satisfactory fits of spinodal curves for both proteins without adding any mean field term, thus extending the validity of the original model. Our results underline the solvent role in modulating or stretching the interaction potential.
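
    To make the idea of a temperature-dependent well depth concrete, the sketch below evaluates a square-well pair potential and its second virial coefficient in reduced units (k_B = 1). The linear form eps(T) = dh - T*ds is an assumption chosen only to illustrate a "free energy character"; the abstract does not give the functional form used in the paper, and all parameter values and function names here are hypothetical.

```python
import math


def well_depth(T: float, dh: float, ds: float) -> float:
    """Assumed free-energy-like well depth eps(T) = dh - T*ds (reduced units)."""
    return dh - T * ds


def square_well(r: float, sigma: float, lam: float, eps: float) -> float:
    """Square-well potential: hard core below sigma, depth -eps out to lam*sigma."""
    if r < sigma:
        return math.inf
    if r < lam * sigma:
        return -eps
    return 0.0


def second_virial(T: float, sigma: float, lam: float, dh: float, ds: float) -> float:
    """Closed-form B2 of the square-well fluid with the assumed eps(T), k_B = 1."""
    eps = well_depth(T, dh, ds)
    hard_sphere = 2.0 * math.pi * sigma ** 3 / 3.0
    return hard_sphere * (1.0 - (lam ** 3 - 1.0) * (math.exp(eps / T) - 1.0))


if __name__ == "__main__":
    # Hypothetical parameters: attraction strengthens on cooling (first set)
    # or on heating (second set).
    for label, dh, ds in (("enthalpy-like", 1.0, 0.0), ("entropy-like", -1.0, -2.0)):
        values = [second_virial(T, sigma=1.0, lam=1.5, dh=dh, ds=ds) for T in (1.0, 2.0)]
        print(label, values)
```

    With a constant depth, B2 becomes more negative on cooling; with a depth that grows fast enough with temperature, the trend reverses. This mirrors the two demixing directions named in the abstract in spirit only; spinodal curves require a full equation of state that this sketch does not attempt.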

  19. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

    PubMed Central

    Das, Sayoni; Lee, David; Sillitoe, Ian; Dawson, Natalie L.; Lees, Jonathan G.; Orengo, Christine A.

    2015-01-01

    Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26139634

  20. Parental Education Predicts Corticostriatal Functionality in Adulthood

    PubMed Central

    Manuck, Stephen B.; Sheu, Lei K.; Kuan, Dora C. H.; Votruba-Drzal, Elizabeth; Craig, Anna E.; Hariri, Ahmad R.

    2011-01-01

    Socioeconomic disadvantage experienced in early development predicts ill health in adulthood. However, the neurobiological pathways linking early disadvantage to adult health remain unclear. Lower parental education—a presumptive indicator of early socioeconomic disadvantage—predicts health-impairing adult behaviors, including tobacco and alcohol dependencies. These behaviors depend, in part, on the functionality of corticostriatal brain systems that 1) show developmental plasticity and early vulnerability, 2) process reward-related information, and 3) regulate impulsive decisions and actions. Hence, corticostriatal functionality in adulthood may covary directly with indicators of early socioeconomic disadvantage, particularly lower parental education. Here, we tested the covariation between parental education and corticostriatal activation and connectivity in 76 adults without confounding clinical syndromes. Corticostriatal activation and connectivity were assessed during the processing of stimuli signaling monetary gains (positive feedback [PF]) and losses (negative feedback). After accounting for participants’ own education and other explanatory factors, lower parental education predicted reduced activation in anterior cingulate and dorsomedial prefrontal cortices during PF, along with reduced connectivity between these cortices and orbitofrontal and striatal areas implicated in reward processing and impulse regulation. In speculation, adult alterations in corticostriatal functionality may represent facets of a neurobiological endophenotype linked to socioeconomic conditions of early development. PMID:20810623

  1. MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data

    PubMed Central

    Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka

    2014-01-01

    The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package “MEGADOCK” that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
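
    For context on the reported F-measure of 0.231, the sketch below computes the F-measure from confusion counts and the value expected from a purely random predictor on the same 120 x 120 benchmark layout. The assumption that the random predictor labels exactly 120 pairs as interacting is introduced here for illustration; the abstract does not state how many pairs MEGADOCK predicted as positive.

```python
def f_measure(tp: float, fp: float, fn: float) -> float:
    """Harmonic mean of precision and recall from confusion counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    n_pairs = 120 * 120        # 14,400 candidate pairs, as in the abstract
    n_true = 120               # relevant interacting pairs, as in the abstract
    n_predicted = 120          # assumption: the random predictor picks 120 pairs

    # Expected number of true pairs among 120 uniformly random picks.
    expected_tp = n_predicted * n_true / n_pairs
    baseline = f_measure(expected_tp, n_predicted - expected_tp, n_true - expected_tp)
    print(f"random baseline F-measure: {baseline:.4f}")
```

    Because the numbers of predicted and true positives are both fixed at 120 in this setup, the F-measure is linear in the number of true positives, so the value computed from the expected count is the expected F-measure, 1/120.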

  2. Origins of Protein Functions in Cells

    NASA Technical Reports Server (NTRS)

    Seelig, Burchard; Pohorille, Andrzej

    2011-01-01

    In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident, such that other collections of proteins could have started a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP by introducing several mutations that make targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known

  3. Deducing protein function by forensic integrative cell biology.

    PubMed

    Earnshaw, William C

    2013-12-01

    Our ability to sequence genomes has provided us with near-complete lists of the proteins that compose cells, tissues, and organisms, but this is only the beginning of the process to discover the functions of cellular components. In the future, it's going to be crucial to develop computational analyses that can predict the biological functions of uncharacterised proteins. At the same time, we must not forget those fundamental experimental skills needed to confirm the predictions or send the analysts back to the drawing board to devise new ones.

  4. Functional annotation of hypothetical proteins – A review

    PubMed Central

    Sivashankari, Selvarajan; Shanmughavel, Piramanayagam

    2006-01-01

    The complete human genome sequences in the public database provide ways to understand the blueprint of life. As of June 29, 2006, complete genomes of 27 archaea, 326 bacteria and 21 eukaryotes are available, and sequencing of 316 bacterial, 24 archaeal and 126 eukaryotic genomes is in progress. The traditional biochemical/molecular experiments can assign accurate functions for genes in these genomes. However, the process is time-consuming and costly. Despite several efforts, only 50-60% of genes have been annotated in most completely sequenced genomes. Automated genome sequence analysis and annotation may provide ways to understand genomes. Thus, determination of protein function is one of the challenging problems of the post-genome era. This demands bioinformatics to predict functions of un-annotated protein sequences by developing efficient tools. Here, we discuss some of the recent and popular approaches developed in Bioinformatics to predict functions for hypothetical proteins. PMID:17597916

  5. HHomp--prediction and classification of outer membrane proteins.

    PubMed

    Remmert, Michael; Linke, Dirk; Lupas, Andrei N; Söding, Johannes

    2009-07-01

    Outer membrane proteins (OMPs) are the transmembrane proteins found in the outer membranes of Gram-negative bacteria, mitochondria and plastids. Most prediction methods have focused on analogous features, such as alternating hydrophobicity patterns. Here, we start from the observation that almost all beta-barrel OMPs are related by common ancestry. We identify proteins as OMPs by detecting their homologous relationships to known OMPs using sequence similarity. Given an input sequence, HHomp builds a profile hidden Markov model (HMM) and compares it with an OMP database by pairwise HMM comparison, integrating OMP predictions by PROFtmb. A crucial ingredient is the OMP database, which contains profile HMMs for over 20,000 putative OMP sequences. These were collected with the exhaustive, transitive homology detection method HHsenser, starting from 23 representative OMPs in the PDB database. In a benchmark on TransportDB, HHomp detects 63.5% of the true positives before including the first false positive. This is 70% more than PROFtmb, four times more than BOMP and 10 times more than TMB-Hunt. In Escherichia coli, HHomp identifies 57 out of 59 known OMPs and correctly assigns them to their functional subgroups. HHomp can be accessed at http://toolkit.tuebingen.mpg.de/hhomp.

  6. Phosphoinositide Control of Membrane Protein Function

    PubMed Central

    Logothetis, Diomedes E.; Petrou, Vasileios I.; Zhang, Miao; Mahajan, Rahul; Meng, Xuan-Yu; Adney, Scott K.; Cui, Meng; Baki, Lia

    2015-01-01

    Anionic phospholipids are critical constituents of the inner leaflet of the plasma membrane, ensuring appropriate membrane topology of transmembrane proteins. Additionally, in eukaryotes, the negatively charged phosphoinositides serve as key signals not only through their hydrolysis products but also through direct control of transmembrane protein function. Direct phosphoinositide control of the activity of ion channels and transporters has been the most convincing case of the critical importance of phospholipid-protein interactions in the functional control of membrane proteins. Furthermore, second messengers, such as [Ca2+]i, or posttranslational modifications, such as phosphorylation, can directly or allosterically fine-tune phospholipid-protein interactions and modulate activity. Recent advances in structure determination of membrane proteins have allowed investigators to obtain complexes of ion channels with phosphoinositides and to use computational and experimental approaches to probe the dynamic mechanisms by which lipid-protein interactions control active and inactive protein states. PMID:25293526

  7. The unexpected structure of the designed protein Octarellin V.1 forms a challenge for protein structure prediction tools.

    PubMed

    Figueroa, Maximiliano; Sleutel, Mike; Vandevenne, Marylene; Parvizi, Gregory; Attout, Sophie; Jacquin, Olivier; Vandenameele, Julie; Fischer, Axel W; Damblon, Christian; Goormaghtigh, Erik; Valerio-Lepiniec, Marie; Urvoas, Agathe; Durand, Dominique; Pardon, Els; Steyaert, Jan; Minard, Philippe; Maes, Dominique; Meiler, Jens; Matagne, André; Martial, Joseph A; Van de Weerdt, Cécile

    2016-07-01

    Despite impressive successes in protein design, designing a well-folded protein of more than 100 amino acids de novo remains a formidable challenge. Exploiting the promising biophysical features of the artificial protein Octarellin V, we improved this protein by directed evolution, thus creating a more stable and soluble protein: Octarellin V.1. Next, we obtained crystals of Octarellin V.1 in complex with crystallization chaperones and determined the tertiary structure. The experimental structure of Octarellin V.1 differs from its in silico design: the (αβα) sandwich architecture bears some resemblance to a Rossmann-like fold instead of the intended TIM-barrel fold. This surprising result gave us a unique and attractive opportunity to test the state of the art in protein structure prediction, using this artificial protein free of any natural selection. We tested 13 automated webservers for protein structure prediction and found none of them to predict the actual structure. More than 50% of them predicted a TIM-barrel fold, i.e. the structure we set out to design more than 10 years ago. In addition, local software runs that are human operated can sample a structure similar to the experimental one but fail in selecting it, suggesting that the scoring and ranking functions should be improved. We propose that artificial proteins could be used as tools to test the accuracy of protein structure prediction algorithms, because of their lack of evolutionary pressure and their unique sequence features.

  8. Qualitative and Quantitative Protein Complex Prediction Through Proteome-Wide Simulations.

    PubMed

    Rizzetto, Simone; Priami, Corrado; Csikász-Nagy, Attila

    2015-10-01

    Despite recent progress in proteomics most protein complexes are still unknown. Identification of these complexes will help us understand cellular regulatory mechanisms and support development of new drugs. Therefore it is really important to establish detailed information about the composition and the abundance of protein complexes but existing algorithms can only give qualitative predictions. Herein, we propose a new approach based on stochastic simulations of protein complex formation that integrates multi-source data--such as protein abundances, domain-domain interactions and functional annotations--to predict alternative forms of protein complexes together with their abundances. This method, called SiComPre (Simulation based Complex Prediction), achieves better qualitative prediction of yeast and human protein complexes than existing methods and is the first to predict protein complex abundances. Furthermore, we show that SiComPre can be used to predict complexome changes upon drug treatment with the example of bortezomib. SiComPre is the first method to produce quantitative predictions on the abundance of molecular complexes while performing the best qualitative predictions. With new data on tissue specific protein complexes becoming available SiComPre will be able to predict qualitative and quantitative differences in the complexome in various tissue types and under various conditions.

  9. J domain independent functions of J proteins.

    PubMed

    Ajit Tamadaddi, Chetana; Sahi, Chandan

    2016-07-01

    Heat shock proteins of 40 kDa (Hsp40s), also called J proteins, are obligate partners of Hsp70s. Via their highly conserved and functionally critical J domain, J proteins interact with and modulate the activity of their Hsp70 partners. Mutations in the critical residues in the J domain often result in the null phenotype for the J protein in question. However, as more J proteins have been characterized, it is becoming increasingly clear that a significant number of J proteins do not "completely" rely on their J domains to carry out their cellular functions, as previously thought. In some cases, regions outside the highly conserved J domain have become more important, making the J domain dispensable for some, if not all, functions of a J protein. This has profound effects on the evolution of such J proteins. Here we present selected examples of J proteins that perform J domain independent functions and discuss this in the context of evolution of J proteins with dispensable J domains and J-like proteins in eukaryotes.

  10. A yeast functional screen predicts new candidate ALS disease genes

    PubMed Central

    Couthouis, Julien; Hart, Michael P.; Shorter, James; DeJesus-Hernandez, Mariely; Erion, Renske; Oristano, Rachel; Liu, Annie X.; Ramos, Daniel; Jethava, Niti; Hosangadi, Divya; Epstein, James; Chiang, Ashley; Diaz, Zamia; Nakaya, Tadashi; Ibrahim, Fadia; Kim, Hyung-Jun; Solski, Jennifer A.; Williams, Kelly L.; Mojsilovic-Petrovic, Jelena; Ingre, Caroline; Boylan, Kevin; Graff-Radford, Neill R.; Dickson, Dennis W.; Clay-Falcone, Dana; Elman, Lauren; McCluskey, Leo; Greene, Robert; Kalb, Robert G.; Lee, Virginia M.-Y.; Trojanowski, John Q.; Ludolph, Albert; Robberecht, Wim; Andersen, Peter M.; Nicholson, Garth A.; Blair, Ian P.; King, Oliver D.; Bonini, Nancy M.; Van Deerlin, Vivianna; Rademakers, Rosa; Mourelatos, Zissimos; Gitler, Aaron D.

    2011-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating and universally fatal neurodegenerative disease. Mutations in two related RNA-binding proteins, TDP-43 and FUS, that harbor prion-like domains, cause some forms of ALS. There are at least 213 human proteins harboring RNA recognition motifs, including FUS and TDP-43, raising the possibility that additional RNA-binding proteins might contribute to ALS pathogenesis. We performed a systematic survey of these proteins to find additional candidates similar to TDP-43 and FUS, followed by bioinformatics to predict prion-like domains in a subset of them. We sequenced one of these genes, TAF15, in patients with ALS and identified missense variants, which were absent in a large number of healthy controls. These disease-associated variants of TAF15 caused formation of cytoplasmic foci when expressed in primary cultures of spinal cord neurons. Very similar to TDP-43 and FUS, TAF15 aggregated in vitro and conferred neurodegeneration in Drosophila, with the ALS-linked variants having a more severe effect than wild type. Immunohistochemistry of postmortem spinal cord tissue revealed mislocalization of TAF15 in motor neurons of patients with ALS. We propose that aggregation-prone RNA-binding proteins might contribute very broadly to ALS pathogenesis and the genes identified in our yeast functional screen, coupled with prion-like domain prediction analysis, now provide a powerful resource to facilitate ALS disease gene discovery. PMID:22065782

  11. Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.

    PubMed

    Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro

    2016-09-01

    The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc. PMID:27253756

  12. Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

    PubMed Central

    2010-01-01

    Background: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are desired when handling such challenging tasks. Results: This study presents a method for PPI prediction based only on sequence information, which contributes in three aspects. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is designed with an efficient classification algorithm, where the efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed with several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance is important to PPI predictors. Conclusions: Dealing with data imbalance is a key issue in PPI prediction since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study on this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information. PMID:20361868
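
    A short worked calculation shows why the positive-to-negative ratio matters for any PPI predictor. The sketch below holds a classifier's sensitivity and specificity fixed and computes the expected precision as the ratio moves from 1:1 to 1:15, the range named in the abstract. The sensitivity and specificity values are hypothetical and are not results from the paper.

```python
def expected_precision(sensitivity: float, specificity: float,
                       negatives_per_positive: float) -> float:
    """Expected precision for a dataset with the given negative:positive ratio."""
    true_pos = sensitivity                                   # per positive example
    false_pos = (1.0 - specificity) * negatives_per_positive
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical classifier: 90% sensitivity and 90% specificity.
    for ratio in (1, 5, 10, 15):
        p = expected_precision(0.9, 0.9, ratio)
        print(f"1:{ratio:<2d} precision = {p:.3f}")
```

    Under these assumed rates, precision falls from 0.9 at 1:1 to 0.375 at 1:15 even though the classifier itself is unchanged, which is why results on balanced datasets do not transfer directly to realistic ones.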

  13. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-01-01

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic-polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optima. The main features of the resulting evolutionary algorithm, the hill-climbing mechanism and the diversification strategy, are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact on the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
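
    The objective that any search in the two-dimensional HP lattice model tries to minimise can be sketched in a few lines. The code below scores a conformation with the standard HP energy (minus one for each pair of H residues that are lattice neighbours but not chain neighbours). It is only the energy function, not the hill-climbing operators or diversification stage of the paper; the move encoding (U, D, L, R) and the function names are assumptions made here for illustration.

```python
from typing import Dict, List, Tuple

MOVES: Dict[str, Tuple[int, int]] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold(moves: str) -> List[Tuple[int, int]]:
    """Turn a string of absolute moves into lattice coordinates, starting at (0, 0)."""
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def hp_energy(sequence: str, moves: str) -> int:
    """Standard HP energy: -1 per H-H lattice contact between non-consecutive residues."""
    coords = fold(moves)
    if len(coords) != len(sequence):
        raise ValueError("need exactly len(sequence) - 1 moves")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index_at = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


if __name__ == "__main__":
    print(hp_energy("HPPH", "RRR"))   # extended chain, no H-H contact
    print(hp_energy("HPPH", "RUL"))   # square fold brings the two H residues together
```

    A hill-climbing step would propose a modified move string, reject it if `hp_energy` raises for a self-intersecting walk, and keep it only if the energy decreases.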

  14. Food Protein Functionality--A New Model.

    PubMed

    Foegeding, E Allen

    2015-12-01

    Proteins in foods serve dual roles as nutrients and structural building blocks. The concept of protein functionality has historically been restricted to nonnutritive functions--such as creating emulsions, foams, and gels--but this places sole emphasis on food quality considerations and potentially overlooks modifications that may also alter nutritional quality or allergenicity. A new model is proposed that addresses the function of proteins in foods based on the length scale(s) responsible for the function. Properties such as flavor binding, color, allergenicity, and digestibility are explained based on the structure of individual molecules; placing this functionality at the nano/molecular scale. At the next higher scale, applications in foods involving gelation, emulsification, and foam formation are based on how proteins form secondary structures that are seen at the nano and microlength scales, collectively called the mesoscale. The macroscale structure represents the arrangements of molecules and mesoscale structures in a food. Macroscale properties determine overall product appearance, stability, and texture. The historical approach of comparing among proteins based on forming and stabilizing specific mesoscale structures remains valid but emphasis should be on a common means for structure formation to allow for comparisons across investigations. For applications in food products, protein functionality should start with identification of functional needs across scales. Those needs are then evaluated relative to how processing and other ingredients could alter desired molecular scale properties, or proper formation of mesoscale structures. This allows for a comprehensive approach to achieving the desired function of proteins in foods.

  15. Predictability of gene ontology slim-terms from primary structure information in Embryophyta plant proteins

    PubMed Central

    2013-01-01

    Background: Proteins are the key elements on the path from genetic information to the development of life. The roles played by the different proteins are difficult to uncover experimentally as this process involves complex procedures such as genetic modifications, injection of fluorescent proteins, gene knock-out methods and others. The knowledge learned from each protein is usually annotated in databases through different methods such as those proposed by The Gene Ontology (GO) consortium. Different methods have been proposed in order to predict GO terms from primary structure information, but very few are available for large-scale functional annotation of plants, and reported success rates are much lower than those reported by other non-plant predictors. This paper explores the predictability of GO annotations on proteins belonging to the Embryophyta group from a set of features extracted solely from their primary amino acid sequence. Results: High predictability of several GO terms was found for Molecular Function and Cellular Component. As expected, a lower degree of predictability was found on Biological Process ontology annotations, although a few biological processes were easily predicted. Proteins related to transport and transcription were particularly well predicted from primary structure information. The most discriminant features for prediction were those related to electric charges of the amino-acid sequence and hydropathicity-derived features. Conclusions: An analysis of GO-slim terms predictability in plants was carried out, in order to determine single categories or groups of functions that are most related with primary structure information. For each highly predictable GO term, the features responsible for such success were identified and discussed. In addition to most published studies, focused on few categories or single ontologies, results in this paper comprise a complete landscape of GO predictability from primary structure encompassing 75 GO

  16. The 82-plex plasma protein signature that predicts increasing inflammation

    PubMed Central

    Tepel, Martin; Beck, Hans C.; Tan, Qihua; Borst, Christoffer; Rasmussen, Lars M.

    2015-01-01

    The objective of the study was to define the specific plasma protein signature that predicts the increase of the inflammation marker C-reactive protein from index day to next-day using proteome analysis and novel bioinformatics tools. We performed a prospective study of 91 incident kidney transplant recipients and quantified 359 plasma proteins simultaneously using nano-Liquid-Chromatography-Tandem Mass-Spectrometry in individual samples and plasma C-reactive protein on the index day and the next day. Next-day C-reactive protein increased in 59 patients whereas it decreased in 32 patients. The prediction model selected and validated 82 plasma proteins which determined increased next-day C-reactive protein (area under receiver-operator-characteristics curve, 0.772; 95% confidence interval, 0.669 to 0.876; P < 0.0001). Multivariable logistic regression showed that the 82-plex protein signature (P < 0.001) was associated with observed increased next-day C-reactive protein. The 82-plex protein signature outperformed routine clinical procedures. The category-free net reclassification index improved with the 82-plex plasma protein signature (total net reclassification index, 88.3%). Using the 82-plex plasma protein signature increased the net reclassification index with a clinically meaningful 10% increase of risk, mainly by improving the reclassification of subjects in the event group. An 82-plex plasma protein signature predicts an increase of the inflammatory marker C-reactive protein. PMID:26445912

  17. Architecture and Function of Mechanosensitive Membrane Protein Lattices

    PubMed Central

    Kahraman, Osman; Koch, Peter D.; Klug, William S.; Haselwandter, Christoph A.

    2016-01-01

    Experiments have revealed that membrane proteins can form two-dimensional clusters with regular translational and orientational protein arrangements, which may allow cells to modulate protein function. However, the physical mechanisms yielding supramolecular organization and collective function of membrane proteins remain largely unknown. Here we show that bilayer-mediated elastic interactions between membrane proteins can yield regular and distinctive lattice architectures of protein clusters, and may provide a link between lattice architecture and lattice function. Using the mechanosensitive channel of large conductance (MscL) as a model system, we obtain relations between the shape of MscL and the supramolecular architecture of MscL lattices. We predict that the tetrameric and pentameric MscL symmetries observed in previous structural studies yield distinct lattice architectures of MscL clusters and that, in turn, these distinct MscL lattice architectures yield distinct lattice activation barriers. Our results suggest general physical mechanisms linking protein symmetry, the lattice architecture of membrane protein clusters, and the collective function of membrane protein lattices. PMID:26771082

  18. Accuracy of functional surfaces on comparatively modeled protein structures

    PubMed Central

    Zhao, Jieling; Dundas, Joe; Kachalo, Sema; Ouyang, Zheng; Liang, Jie

    2012-01-01

    Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the accumulation of protein sequence information far outpaces that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using Modeller, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured. PMID:21541664
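The two similarity measurements named in this abstract (pocket RMSD and fraction of functional atoms captured) can be illustrated with a minimal sketch. It assumes the modeled and reference pockets are already superimposed and that atoms are matched one to one; the function names `rmsd` and `fraction_captured` are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: the two structures are already superimposed; no optimal
    rotation or translation is computed in this sketch.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def fraction_captured(reference_atoms, model_atoms):
    """Fraction of reference binding-surface atoms that also appear in the model."""
    reference = set(reference_atoms)
    if not reference:
        raise ValueError("reference atom set must not be empty")
    return len(reference & set(model_atoms)) / len(reference)


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
    print(rmsd(reference, model))
    print(fraction_captured({"A", "B", "C", "D"}, {"A", "B", "X"}))
```

In practice a superposition step (for example a least-squares fit) would normally precede the RMSD calculation; it is omitted here to keep the sketch short.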

  19. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes.

    PubMed

    Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J

    2015-06-01

    In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝ(n) space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝ(n) space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. In addition, geometric and topological cut-offs are defined in order to take into account specific interactions among amino acids in global or local indices. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins on the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC(2)) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and the high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. PMID:25843214
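The Minkowski metrics mentioned in this abstract generalize the familiar Manhattan (order 1) and Euclidean (order 2) distances. The sketch below shows only the generic metric applied to two residue positions; how the authors combine it with the coulombic matrix is not described in the abstract, and the function name `minkowski_distance` is an assumption made for illustration.

```python
def minkowski_distance(p, q, order=2.0):
    """Minkowski distance of the given order between two points.

    order=1 gives the Manhattan distance and order=2 the Euclidean distance.
    """
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    if order < 1:
        raise ValueError("order must be >= 1 for a valid metric")
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


if __name__ == "__main__":
    # Two hypothetical residue positions in angstroms.
    residue_i = (0.0, 0.0, 0.0)
    residue_j = (3.0, 4.0, 0.0)
    for order in (1, 2, 3):
        print(order, minkowski_distance(residue_i, residue_j, order))
```

Increasing the order gives more weight to the largest coordinate difference, which is one reason a family of such metrics can yield different descriptors from the same structure.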

  1. Protein function from its emergence to diversity in contemporary proteins

    NASA Astrophysics Data System (ADS)

    Goncearenco, Alexander; Berezovsky, Igor N.

    2015-07-01

    The goal of this work is to learn from nature the rules that govern evolution and the design of protein function. The fundamental laws of physics lie in the foundation of the protein structure and all stages of the protein evolution, determining optimal sizes and shapes at different levels of structural hierarchy. We looked back into the very onset of the protein evolution with a goal to find elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions—elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of the metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us to accumulate the library of essential EFs and to establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of the EF will become useful in future design and engineering of desired enzymatic functions.

  2. Protein function from its emergence to diversity in contemporary proteins.

    PubMed

    Goncearenco, Alexander; Berezovsky, Igor N

    2015-07-01

    The goal of this work is to learn from nature the rules that govern evolution and the design of protein function. The fundamental laws of physics lie in the foundation of the protein structure and all stages of the protein evolution, determining optimal sizes and shapes at different levels of structural hierarchy. We looked back into the very onset of the protein evolution with a goal to find elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions-elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of the metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us to accumulate the library of essential EFs and to establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of the EF will become useful in future design and engineering of desired enzymatic functions.

  3. NOXclass: prediction of protein-protein interaction types

    PubMed Central

    Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas

    2006-01-01

    Background Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Results Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. Conclusion NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at . PMID:16423290

  4. The APSES family proteins in fungi: Characterizations, evolution and functions.

    PubMed

    Zhao, Yong; Su, Hao; Zhou, Jing; Feng, Huihua; Zhang, Ke-Qin; Yang, Jinkui

    2015-08-01

    The APSES protein family belongs to the basic helix-loop-helix (bHLH) class of transcription factors. The group is named after its originally described members (APSES: Asm1p, Phd1p, Sok2p, Efg1p and StuAp), which have been identified as key regulators of fungal development and other biological processes. APSES proteins share a highly conserved DNA-binding domain (APSES domain) of about 100 amino acids, whose central domain is predicted to form a typical bHLH structure. Besides the APSES domain, several APSES proteins also contain additional domains, such as KilA-N and ankyrin repeats. In recent years, an increasing number of APSES proteins have been identified from diverse fungi, and they are involved in numerous biological processes, such as sporulation, cellular differentiation, mycelial growth, secondary metabolism and virulence. Most fungi, including Aspergillus fumigatus, Aspergillus nidulans, Candida albicans, Fusarium graminearum, and Neurospora crassa, contain five APSES proteins. However, Cryptococcus neoformans only contains two APSES proteins, and Saccharomyces cerevisiae contains six APSES proteins. The phylogenetic analysis showed that the APSES domains from different fungi were grouped into four clades (A, B, C and D), which is consistent with the result of homologous alignment of APSES domains using DNAman. The roles of APSES proteins in clade C have been studied in detail, while little is known about the roles of other APSES proteins in clades A, B and D. In this review, the biochemical properties and functional domains of APSES proteins are predicted and compared, and the phylogenetic relationships among APSES proteins from various fungi are analyzed based on the APSES domains. Moreover, the functions of APSES proteins in different fungi are summarized and discussed.

  5. A Simple Method for Predicting Transmembrane Proteins Based on Wavelet Transform

    PubMed Central

    Yu, Bin; Zhang, Yan

    2013-01-01

    The growing number of protein sequences from genome projects requires theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported, but they have deficiencies in prediction accuracy and adaptability. In this paper, a method based on discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. The protein with PDB code 1KQG is chosen as an example to describe the prediction process. Eighty proteins with known 3D structure are chosen at random from the Mptopo database as data sets (including 325 TMHs), and the 80 sequences are divided into 13 groups according to their function and type. TMH prediction is carried out for each group of membrane protein sequences, with satisfactory results. To verify the feasibility of this method, the 80 membrane protein sequences are treated as test sets; 308 TMHs are predicted, and the prediction accuracy is 96.3%. Compared with the main prediction results of seven popular prediction methods, the results indicate that the proposed method has higher prediction accuracy. PMID:23289014
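As a rough illustration of how a discrete wavelet transform can expose hydrophobic stretches in a sequence, the sketch below maps residues to Kyte-Doolittle hydropathy values and applies a Haar DWT. Both choices are assumptions made for illustration: the abstract does not say which hydrophobicity scale or wavelet the authors used, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence):
    """Convert a one-letter amino acid sequence into a list of hydropathy values."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])  # pad odd-length input by repeating the last value
    root2 = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail


def haar_approximation(signal, levels):
    """Keep only the coarse (approximation) coefficients after `levels` Haar steps."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_step(approx)
    return approx


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophobic stretch followed by a charged one.
    profile = hydropathy_profile("LLLLKKKK")
    print(haar_approximation(profile, levels=2))
```

Large positive coarse coefficients mark windows dominated by hydrophobic residues, which is the kind of signal a TMH predictor would then threshold; the thresholding rule itself is not given in the abstract and is left out here.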

  6. Computational Prediction of RNA-Binding Proteins and Binding Sites.

    PubMed

    Si, Jingna; Cui, Jing; Cheng, Jin; Wu, Rongling

    2015-01-01

    Protein-RNA interactions have vital roles in many cellular processes such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%-8% of all proteins are RNA-binding proteins (RBPs). Distinguishing these RBPs or their binding residues is a major aim of structural biology. Previously, a number of experimental methods were developed for the determination of protein-RNA interactions. However, these experimental methods are expensive, time-consuming, and labor-intensive. Alternatively, researchers have developed many computational approaches to predict RBPs and protein-RNA binding sites, by combining various machine learning methods and abundant sequence and/or structural features. There are three kinds of computational approaches: prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies of predictions of RNA-binding sites and RBPs and complexes, including data sets used in different approaches, sequence and structural features used in several predictors, prediction method classifications, performance comparisons, evaluation methods, and future directions.

  7. Scalable prediction of compound-protein interactions using minwise hashing.

    PubMed

    Tabei, Yasuo; Yamanishi, Yoshihiro

    2013-01-01

    The identification of compound-protein interactions plays a key role in drug development, toward the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method to make a scalable prediction of compound-protein interactions from heterogeneous biological data using minwise hashing. The proposed method mainly consists of two steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-induced classifier to the compact fingerprints. We test the proposed method on its ability to make a large-scale prediction of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints, and show superior performance of the proposed method compared with the previous chemogenomic methods in terms of prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset consisting of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
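To make the idea of a compact fingerprint concrete, the following is a minimal sketch of plain minwise hashing (MinHash), which estimates the Jaccard similarity of two feature sets from short signatures. It is not the improved algorithm the abstract refers to, whose details are not given here; the feature strings, the seeded BLAKE2b hash, and all function names are assumptions chosen for illustration.

```python
import hashlib


def _seeded_hash(feature, seed):
    """64-bit hash of a feature string under a given seed (illustrative choice)."""
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=128):
    """Compact signature: the minimum hash value of the set under each seeded hash."""
    features = set(features)
    if not features:
        raise ValueError("feature set must not be empty")
    return [min(_seeded_hash(f, seed) for f in features) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and equal in length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    set_a, set_b = set(set_a), set(set_b)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical compound-protein pair features: substructure and domain labels.
    pair_1 = {"sub:benzene", "sub:amide", "sub:hydroxyl", "dom:PF00069"}
    pair_2 = {"sub:benzene", "sub:amide", "sub:carboxyl", "dom:PF00069"}
    sig_1 = minhash_signature(pair_1)
    sig_2 = minhash_signature(pair_2)
    print("exact:", exact_jaccard(pair_1, pair_2))
    print("estimate:", estimate_jaccard(sig_1, sig_2))
```

The signature length stays fixed no matter how many features a pair has, which is what makes such fingerprints attractive when the number of pairs is very large.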

  8. Ribosomal proteins: functions beyond the ribosome

    PubMed Central

    Zhou, Xiang; Liao, Wen-Juan; Liao, Jun-Ming; Liao, Peng; Lu, Hua

    2015-01-01

    Although ribosomal proteins are known for playing an essential role in ribosome assembly and protein translation, their ribosome-independent functions have also been greatly appreciated. Over the past decade, more than a dozen ribosomal proteins have been found to activate the tumor suppressor p53 pathway in response to ribosomal stress. In addition, these ribosomal proteins are involved in various physiological and pathological processes. This review provides an overview of the current understanding of how ribosomal stress provokes the accumulation of ribosome-free ribosomal proteins, as well as the ribosome-independent functions of ribosomal proteins in tumorigenesis, immune signaling, and development. We also propose the potential of applying this knowledge to the development of ribosomal stress-based cancer therapeutics. PMID:25735597

  9. Versatile hemidesmosomal linker proteins: structure and function.

    PubMed

    Chaudhari, Pratik R; Vaidya, Milind M

    2015-04-01

    Hemidesmosomes are anchoring junctions which connect basal epidermal cells to the extracellular matrix. In complex epithelia like skin, hemidesmosomes are composed of transmembrane proteins like α6β4 integrin, BP180, CD151 and cytoplasmic proteins like BPAG1e and plectin. BPAG1e and plectin are plakin family cytolinker proteins which anchor intermediate filament proteins i.e. keratins to the hemidesmosomal transmembrane proteins. Mutations in BPAG1e and plectin lead to severe skin blistering disorders. Recent reports indicate that these hemidesmosomal linker proteins play a role in various cellular processes like cell motility and cytoskeleton dynamics apart from their known anchoring function. In this review, we will discuss their role in structural and signaling functions.

  10. Comprehensive predictions of target proteins based on protein-chemical interaction using virtual screening and experimental verifications

    PubMed Central

    2012-01-01

    Background Identification of the target proteins of bioactive compounds is critical for elucidating the mode of action; however, target identification has been difficult in general, mostly due to the low sensitivity of detection using affinity chromatography followed by CBB staining and MS/MS analysis. Results We applied our protocol of predicting target proteins combining in silico screening and experimental verification for incednine, which inhibits the anti-apoptotic function of Bcl-xL by an unknown mechanism. One hundred eighty-two target protein candidates were computationally predicted to bind to incednine by the statistical prediction method, and the predictions were verified by in vitro binding of incednine to seven proteins, whose expression can be confirmed in our cell system. As a result, 40% accuracy of the computational predictions was achieved successfully, and we newly found 3 incednine-binding proteins. Conclusions This study revealed that our proposed protocol of predicting target protein combining in silico screening and experimental verification is useful, and provides new insight into a strategy for identifying target proteins of small molecules. PMID:22480302

  11. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
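The phylogenetic profile method described above can be sketched as follows: each protein family gets a presence/absence vector across genomes, and families with matching profiles are proposed as functionally linked. This is a minimal illustration under assumed inputs; the genome names, family labels, mismatch threshold, and function names are all hypothetical, and the patent's actual procedure is not detailed in this summary.

```python
from itertools import combinations


def build_profiles(genome_contents):
    """Map each protein family to a tuple of 0/1 flags, one per genome (sorted by name)."""
    genomes = sorted(genome_contents)
    families = sorted(set().union(*genome_contents.values()))
    return {
        family: tuple(int(family in genome_contents[genome]) for genome in genomes)
        for family in families
    }


def hamming(profile_a, profile_b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def predicted_links(profiles, max_mismatches=0):
    """Pairs of families whose profiles differ in at most `max_mismatches` genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_mismatches
    ]


if __name__ == "__main__":
    # Hypothetical genomes and the protein families detected in each.
    genome_contents = {
        "genome_1": {"famA", "famB", "famC"},
        "genome_2": {"famA", "famB"},
        "genome_3": {"famC"},
        "genome_4": {"famA", "famB", "famC"},
    }
    profiles = build_profiles(genome_contents)
    print(profiles)
    print(predicted_links(profiles))
```

Families present in every genome share a trivially identical profile, so a practical analysis would likely filter out such uninformative profiles before proposing links.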

  12. A review on protein functionalized carbon nanotubes.

    PubMed

    Nagaraju, Kathyayini; Reddy, Roopa; Reddy, Narendra

    2015-01-01

    Carbon nanotubes (CNTs) have been widely recognized and used for controlled drug delivery and in various other fields due to their unique properties and distinct advantages. Both single-walled carbon nanotubes (SWCNTs) and multiwalled carbon nanotubes (MWCNTs) are used and/or studied for potential applications in medical, energy, textile, composite, and other areas. Since CNTs are chemically inert and are insoluble in water or other organic solvents, they are functionalized or modified to carry payloads or interact with biological molecules. CNTs have been preferably functionalized with proteins because CNTs are predominantly used for medical applications such as delivery of drugs, DNA and genes, and also for biosensing. Extensive studies have been conducted to understand the interactions, cytotoxicity, and potential applications of protein functionalized CNTs, but contradictory results have been published on the cytotoxicity of the functionalized CNTs. This paper provides a brief review of CNTs functionalized with proteins, methods used to functionalize the CNTs, and their potential applications. PMID:26660626

  13. A review on protein functionalized carbon nanotubes.

    PubMed

    Nagaraju, Kathyayini; Reddy, Roopa; Reddy, Narendra

    2015-12-18

    Carbon nanotubes (CNTs) have been widely recognized and used for controlled drug delivery and in various other fields due to their unique properties and distinct advantages. Both single-walled carbon nanotubes (SWCNTs) and multiwalled carbon nanotubes (MWCNTs) are used and/or studied for potential applications in medical, energy, textile, composite, and other areas. Since CNTs are chemically inert and are insoluble in water or other organic solvents, they are functionalized or modified to carry payloads or interact with biological molecules. CNTs have been preferably functionalized with proteins because CNTs are predominantly used for medical applications such as delivery of drugs, DNA and genes, and also for biosensing. Extensive studies have been conducted to understand the interactions, cytotoxicity, and potential applications of protein functionalized CNTs, but contradictory results have been published on the cytotoxicity of the functionalized CNTs. This paper provides a brief review of CNTs functionalized with proteins, methods used to functionalize the CNTs, and their potential applications.

  14. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data

    PubMed Central

    Bywater, Robert P.

    2016-01-01

    Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before. PMID:26963911

  15. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data.

    PubMed

    Bywater, Robert P

    2016-01-01

    Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before.

  16. Prediction of Protein Structure Using Surface Accessibility Data

    PubMed Central

    Hartlmüller, Christoph; Göbl, Christoph

    2016-01-01

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach. PMID:27560616

  18. Prediction of Protein Structure Using Surface Accessibility Data.

    PubMed

    Hartlmüller, Christoph; Göbl, Christoph; Madl, Tobias

    2016-09-19

    An approach to the de novo structure prediction of proteins is described that relies on surface accessibility data from NMR paramagnetic relaxation enhancements by a soluble paramagnetic compound (sPRE). This method exploits the distance-to-surface information encoded in the sPRE data in the chemical shift-based CS-Rosetta de novo structure prediction framework to generate reliable structural models. For several proteins, it is demonstrated that surface accessibility data is an excellent measure of the correct protein fold in the early stages of the computational folding algorithm and significantly improves accuracy and convergence of the standard Rosetta structure prediction approach. PMID:27560616

  19. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information.

    PubMed

    Wang, Kai; Horst, Jeremy A; Cheng, Gong; Nickle, David C; Samudrala, Ram

    2008-09-26

    Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.

  20. Neutral genetic drift can alter promiscuous protein functions, potentially aiding functional evolution

    PubMed Central

    Bloom, Jesse D; Romero, Philip A; Lu, Zhongyi; Arnold, Frances H

    2007-01-01

    Background. Many of the mutations accumulated by naturally evolving proteins are neutral in the sense that they do not significantly alter a protein's ability to perform its primary biological function. However, new protein functions evolve when selection begins to favor other, "promiscuous" functions that are incidental to a protein's original biological role. If mutations that are neutral with respect to a protein's primary biological function cause substantial changes in promiscuous functions, these mutations could enable future functional evolution. Results. Here we investigate this possibility experimentally by examining how cytochrome P450 enzymes that have evolved neutrally with respect to activity on a single substrate have changed in their abilities to catalyze reactions on five other substrates. We find that the enzymes have sometimes changed as much as four-fold in the promiscuous activities. The changes in promiscuous activities tend to increase with the number of mutations, and can be largely rationalized in terms of the chemical structures of the substrates. The activities on chemically similar substrates tend to change in a coordinated fashion, potentially providing a route for systematically predicting the change in one activity based on the measurement of several others. Conclusion. Our work suggests that initially neutral genetic drift can lead to substantial changes in protein functions that are not currently under selection, in effect poising the proteins to more readily undergo functional evolution should selection favor new functions in the future. Reviewers. This article was reviewed by Martijn Huynen, Fyodor Kondrashov, and Dan Tawfik (nominated by Christoph Adami). PMID:17598905

  1. Flavin Redox Switching of Protein Functions

    PubMed Central

    Zhu, Weidong; Moxley, Michael A.

    2011-01-01

    Flavin cofactors impart remarkable catalytic diversity to enzymes, enabling them to participate in a broad array of biological processes. The properties of flavins also provide proteins with a versatile redox sensor that can be utilized for converting physiological signals such as cellular metabolism, light, and redox status into a unique functional output. The control of protein functions by the flavin redox state is important for transcriptional regulation, cell signaling pathways, and environmental adaptation. A significant number of proteins that have flavin redox switches are found in the Per-Arnt-Sim (PAS) domain family and include flavoproteins that act as photosensors and respond to changes in cellular redox conditions. Biochemical and structural studies of PAS domain flavoproteins have revealed key insights into how flavin redox changes are propagated to the surface of the protein and translated into a new functional output such as the binding of a target protein in a signaling pathway. Mechanistic details of proteins unrelated to the PAS domain are also emerging and provide novel examples of how the flavin redox state governs protein–membrane interactions in response to appropriate stimuli. Analysis of different flavin switch proteins reveals shared mechanistic themes for the regulation of protein structure and function by flavins. Antioxid. Redox Signal. 14, 1079–1091. PMID:21028987

  2. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition.

    PubMed

    Tung, Chi-Hua; Chen, Chi-Wei; Guo, Ren-Chao; Ng, Hui-Fuang; Chu, Yen-Wei

    2016-01-01

    Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly state prediction system, called QuaBingo, which combines blocks with functional domain composition, is constructed from three layers of classifiers that categorize the quaternary structural attributes monomer, homooligomer, and heterooligomer. The first-layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second-layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. Across the 11 kinds of functional protein families, QuaBingo's Matthews Correlation Coefficient (MCC) is 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins. PMID:27610389
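
    The MCC figure quoted above can be made concrete with a minimal sketch. It assumes a binary, one-vs-rest view of a single class (for example, homooligomer versus everything else); the abstract does not say how MCC was aggregated over the three classes, and the function name and counts below are hypothetical illustrations, not values from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the coefficient is
    undefined there; 0.0 is a common convention, assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for one class treated one-vs-rest.
    print(round(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10), 3))
```

    MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which is why it is often preferred over plain accuracy when class sizes are unbalanced.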

  4. QuaBingo: A Prediction System for Protein Quaternary Structure Attributes Using Block Composition

    PubMed Central

    Tung, Chi-Hua; Chen, Chi-Wei; Guo, Ren-Chao; Ng, Hui-Fuang

    2016-01-01

    Background. Quaternary structures of proteins are closely related to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly state prediction system, called QuaBingo, which combines blocks with functional domain composition, is constructed from three layers of classifiers that categorize the quaternary structural attributes monomer, homooligomer, and heterooligomer. The first-layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second-layer SVM processes the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. Across the 11 kinds of functional protein families, QuaBingo's Matthews Correlation Coefficient (MCC) is 23% higher than that of the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for the quaternary structural attributes of proteins. PMID:27610389

  6. Proteomics enhances evolutionary and functional analysis of reproductive proteins.

    PubMed

    Findlay, Geoffrey D; Swanson, Willie J

    2010-01-01

    Reproductive proteins maintain species-specific barriers to fertilization, affect the outcome of sperm competition, mediate reproductive conflicts between the sexes, and potentially contribute to the formation of new species. However, the specific proteins and molecular mechanisms that underlie these processes are understood in only a handful of cases. Advances in genomic and proteomic technologies enable the identification of large suites of reproductive proteins, making it possible to dissect reproductive phenotypes at the molecular level. We first review these technological advances and describe how reproductive proteins are identified in diverse animal taxa. We then discuss the dynamic evolution of reproductive proteins and the potential selective forces that act on them. Finally, we describe molecular and genomic tools for functional analysis and detail how evolutionary data may be used to make predictions about interactions among reproductive proteins.

  7. Identifying the singleplex and multiplex proteins based on transductive learning for protein subcellular localization prediction.

    PubMed

    Cao, Junzhe; Liu, Wenqi; He, Jianjun; Gu, Hong

    2013-07-01

    A new method is proposed to identify whether a query protein is singleplex or multiplex for improving the quality of protein subcellular localization prediction. Based on the transductive learning technique, this approach utilizes information from both the query proteins and known proteins to estimate the number of subcellular locations of every query protein so that the singleplex and multiplex proteins can be recognized and distinguished. Each query protein is then handled by a targeted single-label or multi-label predictor to achieve a high-accuracy prediction result. We assess the performance of the proposed approach by applying it to three groups of protein sequence datasets. Simulation experiments show that the proposed approach can effectively identify the singleplex and multiplex proteins. A comparison also verifies the reliability of this method for enhancing the power of predicting protein subcellular localization.

  8. Some functional properties of oilseed proteins.

    PubMed

    Khalil, M; Ragab, M; Hassanien, F R

    1985-01-01

    Oilseeds have potential food uses because of their high protein content. In addition, these proteins, when added to foods, supply desirable functional properties, such as whipping capacity, viscosity, emulsification, and water and oil holding capacities. Rapeseed and soybean protein isolates were found to possess the highest whipping capacity, followed by those of sunflower, peanut, sesame, cottonseed and safflower. The addition of sugar improved the whipping properties of oilseed proteins. The whipping capacity of oilseed proteins decreased on heating at 100 degrees C for 15 to 60 min. Soybean protein had the highest emulsifying capacity compared with the other oilseed proteins. The heated oilseed proteins had emulsification properties similar to or better than the control. Glandless cottonseed protein had high water and oil holding capacities. The water holding capacity of oilseed proteins decreased gradually as the duration of heating at 100 degrees C was increased. On the other hand, the heated oilseed proteins had oil holding capacities similar to or better than unheated proteins. PMID:4000248

  9. Measuring the functional sequence complexity of proteins

    PubMed Central

    Durston, Kirk K; Chiu, David KY; Abel, David L; Trevors, Jack T

    2007-01-01

    Background. Abel and Trevors have delineated three aspects of sequence complexity, Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC) observed in biosequences such as proteins. In this paper, we provide a method to measure functional sequence complexity. Methods and Results. We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest value sites correlate with the binding domain. Conclusion. For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as analyzing the internal structural and functional relationships within the 3-D structure of proteins. PMID:18062814
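
    As a rough illustration of a per-site measure built on Shannon uncertainty, the sketch below scores each column of an alignment as the uncertainty of an assumed null state minus the uncertainty observed among the functional sequences. The abstract does not give the formula, so the equiprobable 20-letter null state, the absence of gap handling, and the function names are all assumptions made for this example.

```python
import math
from collections import Counter


def site_entropy(column: str) -> float:
    """Shannon uncertainty (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def functional_bits(alignment: list[str], alphabet_size: int = 20) -> list[float]:
    """Per-site score: log2(alphabet_size) minus the observed column entropy.

    Assumes an equiprobable null state over `alphabet_size` residues and
    sequences of equal length with no gap characters.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    h_null = math.log2(alphabet_size)
    return [
        h_null - site_entropy("".join(seq[i] for seq in alignment))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment: site 0 is fully conserved, site 1 is split.
    print(functional_bits(["MK", "ML", "MK", "ML"]))
```

    Under these assumptions a fully conserved site scores log2(20), about 4.32 bits, and a site that tolerates every residue equally scores close to zero, so high-scoring sites are candidates for functional importance.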

  10. PredPlantPTS1: A Web Server for the Prediction of Plant Peroxisomal Proteins.

    PubMed

    Reumann, Sigrun; Buchwald, Daniela; Lingner, Thomas

    2012-01-01

    Prediction of subcellular protein localization is essential to correctly assign unknown proteins to cell organelle-specific protein networks and to ultimately determine protein function. For metazoa, several computational approaches have been developed in the past decade to predict peroxisomal proteins carrying the peroxisome targeting signal type 1 (PTS1). However, plant-specific PTS1 protein prediction methods have been lacking up to now, and pre-existing methods generally were incapable of correctly predicting low-abundance plant proteins possessing non-canonical PTS1 patterns. Recently, we presented a machine learning approach that is able to predict PTS1 proteins for higher plants (spermatophytes) with high accuracy and which can correctly identify unknown targeting patterns, i.e., novel PTS1 tripeptides and tripeptide residues. Here we describe the first plant-specific web server PredPlantPTS1 for the prediction of plant PTS1 proteins using the above-mentioned underlying models. The server allows the submission of protein sequences from diverse spermatophytes and also performs well for mosses and algae. The easy-to-use web interface provides detailed output in terms of (i) the peroxisomal targeting probability of the given sequence, (ii) information whether a particular non-canonical PTS1 tripeptide has already been experimentally verified, and (iii) the prediction scores for the single C-terminal 14 amino acid residues. The latter allows identification of predicted residues that inhibit peroxisome targeting and which can be optimized using site-directed mutagenesis to raise the peroxisome targeting efficiency. The prediction server will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants. PredPlantPTS1 is freely accessible at ppp.gobics.de.

  11. How Good Are Simplified Models for Protein Structure Prediction?

    PubMed Central

    Newton, M. A. Hakim; Rashid, Mahmood A.; Pham, Duc Nghia; Sattar, Abdul

    2014-01-01

    Protein structure prediction (PSP) has been one of the most challenging problems in computational biology for several decades. The challenge is largely due to the complexity of the all-atomic details and the unknown nature of the energy function. Researchers have therefore used simplified energy models that consider interaction potentials only between the amino acid monomers in contact on discrete lattices. The restricted nature of the lattices and the energy models poses a twofold concern regarding the assessment of the models. Can a native or a very close structure be obtained when structures are mapped to lattices? Can the contact based energy models on discrete lattices guide the search towards the native structures? In this paper, we use the protein chain lattice fitting (PCLF) problem to address the first concern; we developed a constraint-based local search algorithm for the PCLF problem for cubic and face-centered cubic lattices and found very close lattice fits for the native structures. For the second concern, we use a number of techniques to sample the conformation space and find correlations between energy functions and root mean square deviation (RMSD) distance of the lattice-based structures with the native structures. Our analysis reveals weakness of several contact based energy models used that are popular in PSP. PMID:24876837
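
    To make the idea of a contact-based energy on a discrete lattice concrete, here is a minimal sketch using the two-letter HP model on a cubic lattice. The HP model is only one example of such a simplified energy; the abstract does not say which models were evaluated, and the function name and toy conformation are hypothetical.

```python
from itertools import combinations

Coord = tuple[int, int, int]


def hp_contact_energy(sequence: str, coords: list[Coord]) -> int:
    """Energy of a lattice conformation under an HP-style contact potential.

    Every pair of hydrophobic ('H') residues that sit on adjacent cubic
    lattice points, but are not neighbours along the chain, contributes -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # consecutive residues are bonded, not in contact
        manhattan = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if manhattan == 1 and sequence[i] == "H" and sequence[j] == "H":
            energy -= 1
    return energy


if __name__ == "__main__":
    # Hypothetical four-residue chain folded around a unit square.
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(hp_contact_energy("HPPH", square))
```

    A search procedure would evaluate such a function on many sampled conformations; comparing the resulting energies with each conformation's RMSD to the native structure is the kind of correlation analysis the abstract describes.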

  12. The use of serum glial fibrillary acidic protein test as a promising tool for intracerebral hemorrhage diagnosis in Chinese patients and prediction of the short-term functional outcomes.

    PubMed

    Xiong, Lijun; Yang, Yan; Zhang, Mei; Xu, Wuping

    2015-11-01

    The objective of this study was to explore the efficacy of glial fibrillary acidic protein (GFAP) in differentiating intracerebral hemorrhage (ICH) from ischemic stroke (IS). Patients with suspected acute stroke were screened and finally diagnosed by computed tomography and magnetic resonance imaging. Blood samples were collected within 2-6 h after onset of symptoms, and serum GFAP level was determined by ELISA. The functional outcome for the patients was determined by modified Rankin Scale (mRS) 90 days after onset of symptoms. In total, 43 ICH patients and 65 IS patients were enrolled. GFAP concentration in the ICH group was significantly higher than in the IS group (p < 0.001). Significant correlation was found when comparing GFAP with National Institutes of Health Stroke Scale (NIHSS) (r = 0.418, p = 0.005) and hemorrhage volume (r = 0.840, p < 0.001) in the ICH group, while such correlation was not observed in the IS group. ROC analysis indicated that GFAP level at the cut-point of 0.7 ng/ml yielded an AUC of 0.901 (95 % CI 0.828-0.950) with high sensitivity (86.0 %) and specificity (76.9 %) to differentiate ICH from IS. Patients with higher serum GFAP concentration in the ICH group experienced greater functional disability (r = 0.755, p < 0.001), while this phenomenon was not observed in the IS group (r = -0.114, p = 0.368). ROC curve analysis found that GFAP level at the cut-point of 1.04 ng/ml yielded an AUC of 0.936 (95 % CI 0.817-0.988) in identifying patients with poor functional outcome, with sensitivity and specificity of 95.7 % and 80.0 %, respectively. The GFAP test is a promising technique for differentiating ICH from IS and for predicting short-term functional outcomes.
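
    The sensitivity and specificity reported at a given cut-point can be illustrated with a short sketch. It assumes a sample is called positive when its marker level is at or above the cut-point (the abstract does not state whether the boundary is inclusive), and the values in the example are invented for illustration rather than taken from the study.

```python
from typing import Sequence


def sensitivity_specificity(
    levels: Sequence[float], is_case: Sequence[bool], cutoff: float
) -> tuple[float, float]:
    """Sensitivity and specificity of the rule `level >= cutoff` means positive."""
    if len(levels) != len(is_case):
        raise ValueError("levels and labels must have the same length")
    tp = fn = tn = fp = 0
    for level, case in zip(levels, is_case):
        predicted_positive = level >= cutoff
        if case and predicted_positive:
            tp += 1
        elif case:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical marker levels (ng/ml) and diagnoses (True = case).
    levels = [0.2, 0.5, 0.9, 1.5, 0.8, 0.3]
    is_case = [False, False, True, True, False, True]
    print(sensitivity_specificity(levels, is_case, cutoff=0.7))
```

    Sweeping the cut-point over all observed levels and plotting sensitivity against one minus specificity gives the ROC curve, whose area is the AUC quoted in the abstract.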

  13. Prediction of three-dimensional transmembrane helical protein structures

    NASA Astrophysics Data System (ADS)

    Barth, Patrick

    Membrane proteins are critical to living cells and their dysfunction can lead to serious diseases. High-resolution structures of these proteins would provide very valuable information for designing efficient therapies but membrane protein crystallization is a major bottleneck. As an important alternative approach, methods for predicting membrane protein structures have been developed in recent years. This chapter focuses on the problem of modeling the structure of transmembrane helical proteins, and describes recent advancements, current limitations, and future challenges facing de novo modeling, modeling with experimental constraints, and high-resolution comparative modeling of these proteins. Abbreviations: MP, membrane protein; SP, water-soluble protein; RMSD, root-mean square deviation; Cα RMSD, root-mean square deviation over Cα atoms; TM, transmembrane; TMH, transmembrane helix; GPCR, G protein-coupled receptor; 3D, three dimensional; NMR, nuclear magnetic resonance spectroscopy; EPR, electron paramagnetic resonance spectroscopy; FTIR, Fourier transform infrared spectroscopy.

  14. The N and C Termini of ZO-1 Are Surrounded by Distinct Proteins and Functional Protein Networks

    PubMed Central

    Van Itallie, Christina M.; Aponte, Angel; Tietgens, Amber Jean; Gucek, Marjan; Fredriksson, Karin; Anderson, James Melvin

    2013-01-01

    The proteins and functional protein networks of the tight junction remain incompletely defined. Among the currently known proteins are barrier-forming proteins like occludin and the claudin family; scaffolding proteins like ZO-1; and some cytoskeletal, signaling, and cell polarity proteins. To define a more complete list of proteins and infer their functional implications, we identified the proteins that are within molecular dimensions of ZO-1 by fusing biotin ligase to either its N or C terminus, expressing these fusion proteins in Madin-Darby canine kidney epithelial cells, and purifying and identifying the resulting biotinylated proteins by mass spectrometry. Of a predicted proteome of ∼9000, we identified more than 400 proteins tagged by biotin ligase fused to ZO-1, with both identical and distinct proteins near the N- and C-terminal ends. Those proximal to the N terminus were enriched in transmembrane tight junction proteins, and those proximal to the C terminus were enriched in cytoskeletal proteins. We also identified many unexpected but easily rationalized proteins and verified partial colocalization of three of these proteins with ZO-1 as examples. In addition, functional networks of interacting proteins were tagged, such as the basolateral but not apical polarity network. These results provide a rich inventory of proteins and potential novel insights into functions and protein networks that should catalyze further understanding of tight junction biology. Unexpectedly, the technique demonstrates high spatial resolution, which could be generally applied to defining other subcellular protein compartmentalization. PMID:23553632

  15. Evolution of Ftz protein function in insects.

    PubMed

    Alonso, C R; Maxton-Kuechenmeister, J; Akam, M

    2001-09-18

    The Drosophila gene fushi tarazu (ftz) encodes a homeodomain-containing transcriptional regulator (Ftz) required at several stages during development. Drosophila melanogaster ftz (Dm-ftz) is first expressed in seven stripes defining alternate parasegments of the embryo--a "pair-rule" segmentation function [1, 2]. It is then expressed in specific neural precursor cells in the central nervous system and finally in the developing hindgut [3]. An Orthopteran ortholog of ftz (Sg-ftz, formally Dax) has been isolated from the grasshopper Schistocerca gregaria [4]. The pattern of Sg-ftz expression in Schistocerca embryos suggests that some developmental roles of the ftz gene are likely to be conserved between these two species (e.g., CNS functions) while others may have diverged (e.g., segmentation functions). To test whether the function of the Ftz protein itself differs between these two species, here we compare the functions of Sg-Ftz and Dm-Ftz proteins by expressing both in Drosophila embryos. Sg-ftz mimics only poorly several segmentation roles of Dm-ftz (engrailed activation, wingless repression, and embryonic cuticle transformation). However, the two proteins are similarly active in the rescue of a CNS-specific ftz mutant. These findings argue that this ftz CNS function is mediated by conserved parts of the protein, while efficient pair-rule function requires sequences present specifically in the Drosophila protein. PMID:11566109

  16. Prediction of structural features and application to outer membrane protein identification

    NASA Astrophysics Data System (ADS)

    Yan, Renxiang; Wang, Xiaofeng; Huang, Lanqing; Yan, Feidi; Xue, Xiaoyu; Cai, Weiwen

    2015-06-01

    Protein three-dimensional (3D) structures provide insightful information in many fields of biology. One-dimensional properties derived from 3D structures such as secondary structure, residue solvent accessibility, residue depth and backbone torsion angles are helpful to protein function prediction, fold recognition and ab initio folding. Here, we predict various structural features with the assistance of neural network learning. Based on an independent test dataset, protein secondary structure prediction generates an overall Q3 accuracy of ~80%. Meanwhile, the prediction of relative solvent accessibility obtains the highest mean absolute error of 0.164, and prediction of residue depth achieves the lowest mean absolute error of 0.062. We further improve the outer membrane protein identification by including the predicted structural features in a scoring function using a simple profile-to-profile alignment. The results demonstrate that the accuracy of outer membrane protein identification can be improved by ~3% at a 1% false positive level when structural features are incorporated. Finally, our methods are available as two convenient and easy-to-use programs. One is PSSM-2-Features for predicting secondary structure, relative solvent accessibility, residue depth and backbone torsion angles, the other is PPA-OMP for identifying outer membrane proteins from proteomes.
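
    The two evaluation measures quoted above, Q3 accuracy for three-state secondary structure and mean absolute error for real-valued properties, can be sketched as follows. The three-letter state alphabet (H, E, C) and the example strings and values are assumptions for illustration, not data from the study.

```python
from typing import Sequence


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    # Hypothetical five-residue example.
    print(q3_accuracy("HHEEC", "HHECC"))
    print(mean_absolute_error([0.10, 0.50], [0.20, 0.30]))
```

    Q3 is a higher-is-better score while mean absolute error is lower-is-better, so the two numbers in the abstract should not be compared on the same scale.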

  17. An Atomistic Statistically Effective Energy Function for Computational Protein Design.

    PubMed

    Topham, Christopher M; Barbe, Sophie; André, Isabelle

    2016-08-01

    Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol⁻¹ when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol⁻¹ using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical

  19. Genetically modified proteins: functional improvement and chimeragenesis

    PubMed Central

    Balabanova, Larissa; Golotin, Vasily; Podvolotskaya, Anna; Rasskazov, Valery

    2015-01-01

    This review focuses on the emerging role of site-specific mutagenesis and chimeragenesis for the functional improvement of proteins in areas where traditional protein engineering methods have been extensively used and practically exhausted. The new route to creating novel proteins builds on the further development of structure and sequence optimization algorithms that generate and design accurate structure models from x-ray crystallography studies of many proteins and their mutant forms. Artificial genetic modifications aim to expand nature's repertoire of biomolecules. Among the most exciting potential outcomes of mutagenesis or chimeragenesis is the design of effective diagnostics, bio-therapeutics and biocatalysts. A sampling of recent examples of the in vivo and in vitro genetic improvement of various binding protein and enzyme functions is listed below, with references for more in-depth study provided for the reader's benefit. PMID:26211369

  20. Profiling protein function with small molecule microarrays

    PubMed Central

    Winssinger, Nicolas; Ficarro, Scott; Schultz, Peter G.; Harris, Jennifer L.

    2002-01-01

    The regulation of protein function through posttranslational modification, local environment, and protein–protein interaction is critical to cellular function. The ability to analyze on a genome-wide scale protein functional activity rather than changes in protein abundance or structure would provide important new insights into complex biological processes. Herein, we report the application of a spatially addressable small molecule microarray to an activity-based profile of proteases in crude cell lysates. The potential of this small molecule-based profiling technology is demonstrated by the detection of caspase activation upon induction of apoptosis, characterization of the activated caspase, and inhibition of the caspase-executed apoptotic phenotype using the small molecule inhibitor identified in the microarray-based profile. PMID:12167675

  1. INTREPID: a web server for prediction of functionally important residues by evolutionary analysis.

    PubMed

    Sankararaman, Sriram; Kolaczkowski, Bryan; Sjölander, Kimmen

    2009-07-01

    We present the INTREPID web server for predicting functionally important residues in proteins. INTREPID has been shown to boost the recall and precision of catalytic residue prediction over other sequence-based methods and can be used to identify other types of functional residues. The web server takes an input protein sequence, gathers homologs, constructs a multiple sequence alignment and phylogenetic tree and finally runs the INTREPID method to assign a score to each position. Residues predicted to be functionally important are displayed on homologous 3D structures (where available), highlighting spatial patterns of conservation at various significance thresholds. The INTREPID web server is available at http://phylogenomics.berkeley.edu/intrepid.
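
    The last step of the pipeline above assigns a score to each alignment position. The sketch below illustrates per-position scoring with a much simpler stand-in, Shannon-entropy conservation over alignment columns. It is not the INTREPID method, which according to the abstract also uses a phylogenetic tree; the function name and the toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one score per column: 1 - (Shannon entropy / log2(20)), ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no conservation signal
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment (hypothetical sequences): column 0 is invariant, the others vary.
toy_alignment = ["HGSA", "HASA", "HGTC", "H-SD"]
scores = column_conservation(toy_alignment)
```

    Invariant columns score 1.0 and more variable columns score lower, so ranking positions by this score gives a crude first guess at functionally constrained residues.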

  2. Proteins and Their Interacting Partners: An Introduction to Protein–Ligand Binding Site Prediction Methods

    PubMed Central

    Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

    2015-01-01

    Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein–ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein–ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein–ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems. PMID:26694353

  3. Toxicological relationships between proteins obtained from protein target predictions of large toxicity databases

    SciTech Connect

    Nigsch, Florian; Mitchell, John B.O.

    2008-09-01

    The combination of models for protein target prediction with large databases containing toxicological information for individual molecules allows the derivation of 'toxicological' profiles, i.e., the extent to which molecules of known toxicity are predicted to interact with a set of protein targets. To predict protein targets of drug-like and toxic molecules, we built a computational multiclass model using the Winnow algorithm based on a dataset of protein targets derived from the MDL Drug Data Report. A 15-fold Monte Carlo cross-validation using 50% of each class for training, and the remaining 50% for testing, provided an assessment of the accuracy of that model. We retained the 3 top-ranking predictions and found that in 82% of all cases the correct target was predicted within these three predictions. The first prediction was the correct one in almost 70% of cases. A model built on the whole protein target dataset was then used to predict the protein targets for 150 000 molecules from the MDL Toxicity Database. We analysed the frequency of the predictions across the panel of protein targets for experimentally determined toxicity classes of all molecules. This allowed us to identify clusters of proteins related by their toxicological profiles, as well as toxicities that are related. Literature-based evidence is provided for some specific clusters to show the relevance of the relationships identified.
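
    The "correct target within the three top-ranking predictions" figure is a top-k accuracy. A minimal sketch of how such a number can be computed is shown here; the function name, target labels, and rankings are hypothetical and are not taken from the study.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label appears among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels) or not true_labels:
        raise ValueError("need equally sized, non-empty inputs")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels) if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked target predictions for four molecules (best guess first).
ranked = [
    ["kinase", "protease", "GPCR"],
    ["GPCR", "kinase", "protease"],
    ["protease", "GPCR", "kinase"],
    ["kinase", "GPCR", "ion channel"],
]
truth = ["kinase", "kinase", "kinase", "protease"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
```

    Top-k accuracy can only stay level or rise as k grows, which matches the pattern reported in the abstract (almost 70% for the first prediction, 82% within the top three).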

  4. A Prediction Model for Membrane Proteins Using Moments Based Features.

    PubMed

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exert their effects on cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies.

  5. A Prediction Model for Membrane Proteins Using Moments Based Features

    PubMed Central

    Butt, Ahmad Hassan; Khan, Sher Afzal; Jamil, Hamza; Rasool, Nouman; Khan, Yaser Daanial

    2016-01-01

    The most expedient unit of the human body is its cell. Encapsulated within the cell are many infinitesimal entities and molecules which are protected by a cell membrane. The proteins that are associated with this lipid-based bilayer cell membrane are known as membrane proteins and are considered to play a significant role. These membrane proteins exert their effects on cellular activities inside and outside of the cell. According to scientists in pharmaceutical organizations, these membrane proteins perform key tasks in drug interactions. In this study, a technique is presented that is based on various computationally intelligent methods used for the prediction of membrane proteins without the experimental use of mass spectrometry. Statistical moments were used to extract features, and a Multilayer Neural Network was trained using backpropagation for the prediction of membrane proteins. Results show that the proposed technique performs better than existing methodologies. PMID:26966690

  6. Proteome-wide prediction of self-interacting proteins based on multiple properties.

    PubMed

    Liu, Zhongyang; Guo, Feifei; Zhang, Jiyang; Wang, Jian; Lu, Liang; Li, Dong; He, Fuchu

    2013-06-01

    Self-interacting proteins, whose two or more copies can interact with each other, play important roles in cellular functions and the evolution of protein interaction networks (PINs). Knowing whether a protein can self-interact can contribute to, and is sometimes crucial for, the elucidation of its functions. Previous related research has mainly focused on the structures and functions of specific self-interacting proteins, whereas knowledge of their overall properties is limited. Meanwhile, the two most common current high-throughput protein interaction assays have limited ability to detect self-interactions because of biological artifacts and design limitations, and a bioinformatic method for predicting self-interacting proteins is lacking. This study aims to systematically study and predict self-interacting proteins from an overall perspective. We find that, compared with other proteins, the self-interacting proteins in the structural aspect contain more domains; in the evolutionary aspect they tend to be conserved and ancient; in the functional aspect they are significantly enriched with enzyme genes, housekeeping genes, and drug targets; and in the topological aspect they tend to occupy important positions in PINs. Furthermore, based on these features, after feature selection, we use logistic regression to integrate six representative features, including Gene Ontology term, domain, paralogous interactor, enzyme, model organism self-interacting protein, and betweenness centrality in the PIN, to develop a proteome-wide prediction model of self-interacting proteins. Using 5-fold cross-validation and an independent test, this model shows good performance. Finally, the prediction model is developed into a user-friendly web service SLIPPER (SeLf-Interacting Protein PrEdictoR). Users may submit a list of proteins, and then SLIPPER will return the probability scores measuring their likelihood of being self-interacting proteins and various related annotation information. 
This
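
    The integration of the six features by logistic regression can be pictured as a weighted sum passed through a sigmoid. The sketch below is only an illustration under stated assumptions: the feature names paraphrase the abstract, and the weights and bias are hypothetical placeholders, not the fitted SLIPPER coefficients, which the abstract does not report.

```python
import math

# Feature names paraphrase the six features listed in the abstract.
FEATURE_NAMES = (
    "go_term",
    "domain",
    "paralogous_interactor",
    "enzyme",
    "model_organism_sip",
    "betweenness_centrality",
)

# Hypothetical weights and bias for illustration only (not fitted values).
EXAMPLE_WEIGHTS = {name: 0.5 for name in FEATURE_NAMES}
EXAMPLE_BIAS = -1.5


def self_interaction_probability(features, weights=EXAMPLE_WEIGHTS, bias=EXAMPLE_BIAS):
    """Logistic model: sigmoid(bias + sum of weight * feature value); missing features count as 0."""
    z = bias + sum(weights[name] * features.get(name, 0.0) for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


no_evidence = {name: 0.0 for name in FEATURE_NAMES}
some_evidence = {"go_term": 1.0, "domain": 1.0, "enzyme": 1.0}
all_evidence = {name: 1.0 for name in FEATURE_NAMES}

p_none = self_interaction_probability(no_evidence)
p_some = self_interaction_probability(some_evidence)
p_all = self_interaction_probability(all_evidence)
```

    In a real model the weights would be estimated from labelled proteins, and the output would be read as a score in (0, 1) rather than a calibrated probability unless calibration had been checked.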

  8. Predicting three-dimensional structures of transmembrane domains of β-barrel membrane proteins

    PubMed Central

    Naveed, Hammad; Xu, Yun; Jackups, Ronald; Liang, Jie

    2012-01-01

    β-barrel membrane proteins are found in the outer membrane of gram-negative bacteria, mitochondria, and chloroplasts. They are important for pore formation, membrane anchoring, and enzyme activity, and are often responsible for bacterial virulence. Due to difficulties in experimental structure determination, they are sparsely represented in the protein structure databank. We have developed a computational method for predicting structures of the trans-membrane (TM) domains of β-barrel membrane proteins. Our method, based on key organization principles, can predict structures of the TM domain of β-barrel membrane proteins of novel topology, including those from eukaryotic mitochondria. It relies on a model of physical interactions, a discrete conformational state-space, an empirical potential function, and a model to account for interstrand loop entropy. We are able to construct three-dimensional atomic structures of the TM domains from sequences for a set of 23 non-homologous proteins (resolution 1.8 – 3.0 Å). The median RMSD of TM domains containing 75–222 residues between predicted and measured structures is 3.9 Å for main chain atoms. In addition, stability determinants and protein-protein interaction sites can be predicted. Such predictions on the eukaryotic mitochondrial outer membrane proteins Tom40 and VDAC are confirmed by independent mutagenesis and chemical cross-linking studies. These results suggest that our model captures key components of the organization principles of β-barrel membrane protein assembly. PMID:22148174

  9. Evolution-Based Functional Decomposition of Proteins.

    PubMed

    Rivoire, Olivier; Reynolds, Kimberly A; Ranganathan, Rama

    2016-06-01

    The essential biological properties of proteins (folding, biochemical activities, and the capacity to adapt) arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment, a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668

  10. CRYSTALP2: sequence-based protein crystallization propensity prediction

    PubMed Central

    Kurgan, Lukasz; Razib, Ali A; Aghakhani, Sara; Dick, Scott; Mizianty, Marcin; Jahandideh, Samad

    2009-01-01

    Background: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality. Results: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from . Conclusion: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining

  11. Network-based function prediction and interactomics: the case for metabolic enzymes.

    PubMed

    Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G

    2011-01-01

    As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns, important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions, and the power of homology detection has already started to reach its maximum. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques provide a first-hand hint of a protein's functional role and offer complementary insights to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to solve the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data that is being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in the genomic databases.

  12. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799

  13. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction.

    PubMed

    O'Meara, Matthew J; Ballouz, Sara; Shoichet, Brian K; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63-0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773
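
    The AUROC values quoted above measure how often a true annotation is ranked above a false one. A minimal sketch of the pairwise definition follows; the function name, scores, and labels are hypothetical, and practical analyses would normally use a library routine rather than this quadratic loop.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count as half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical guilt-by-association scores for four genes; label 1 = gene has the GO term.
example_scores = [0.9, 0.8, 0.4, 0.3]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)
```

    A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported range of roughly 0.63 to 0.8 in context.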

  14. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    PubMed Central

    Shoichet, Brian K.; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63–0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773

  15. Improving protein-protein interactions prediction accuracy using protein evolutionary information and relevance vector machine model.

    PubMed

    An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying; Hu, Ji-Pu

    2016-10-01

    Predicting protein-protein interactions (PPIs) is a challenging task and is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to predict PPIs, they have unavoidable shortcomings, including high cost, time intensity, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for PPI detection from protein sequences. The major improvements include: (1) protein sequences are represented using the Bi-gram probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, the Principal Component Analysis (PCA) method is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57 and 90.57%, respectively. Experimental results are significantly better than previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset. The promising experimental results show the efficiency and robustness of the proposed method, which can be an automatic decision support tool for future
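
    The abstract does not spell out the bi-gram feature. A commonly used definition sums, over consecutive sequence positions, the product of the PSSM entry for residue type p at position i and the entry for residue type q at position i+1, giving one feature per (p, q) pair. The sketch below assumes that definition and a PSSM already converted to per-position probabilities; the function name and the toy three-letter alphabet are illustrative, not the authors' implementation.

```python
def bigram_features(pssm):
    """Flattened bi-gram matrix: B[p][q] = sum over i of pssm[i][p] * pssm[i + 1][q]."""
    n_types = len(pssm[0])
    if any(len(row) != n_types for row in pssm):
        raise ValueError("every PSSM row must have the same number of columns")
    bigram = [[0.0] * n_types for _ in range(n_types)]
    for current_row, next_row in zip(pssm, pssm[1:]):
        for p in range(n_types):
            for q in range(n_types):
                bigram[p][q] += current_row[p] * next_row[q]
    # Row-major flattening: feature index = p * n_types + q.
    return [value for row in bigram for value in row]


# Toy PSSM over a 3-letter alphabet (a real PSSM has 20 columns, giving 400 features).
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
features = bigram_features(toy_pssm)
```

    The feature vector has a fixed length regardless of sequence length, which is what lets sequences of different sizes be fed to PCA and then to a classifier.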

  17. Functional divergence outlines the evolution of novel protein function in NifH/BchL protein family.

    PubMed

    Thakur, Subarna; Bothra, Asim K; Sen, Arnab

    2013-11-01

    Biological nitrogen fixation is accomplished by prokaryotes through the catalytic action of the complex metalloenzyme nitrogenase. Nitrogenase is a two-protein component system comprising MoFe protein (NifD and K) and Fe protein (NifH). NifH shares structural and mechanistic similarities as well as evolutionary relationships with light-independent protochlorophyllide reductase (BchL), a photosynthesis-related metalloenzyme belonging to the same protein family. We performed a comprehensive bioinformatics analysis of the NifH/BchL family in order to elucidate the intrinsic functional diversity and the underlying evolutionary mechanism among the members. To analyse functional divergence in the NifH/BchL family, we conducted pair-wise estimation of altered evolutionary rates between the member proteins. We identified a number of vital amino acid sites which contribute to predicted functional diversity. We also used maximum likelihood tests to detect positive selection at the amino acid level, followed by a structure-based phylogenetic approach, to draw conclusions on the ancient lineage and novel characterization of the NifH/BchL protein family. Our investigation provides ample support for the view that NifH and BchL share robust structural similarities and have probably diverged from a common ancestor, followed by divergence in functional properties possibly due to gene duplication. PMID:24287653

  18. Origin and Functional Prediction of Pollen Allergens in Plants.

    PubMed

    Chen, Miaolin; Xu, Jie; Devis, Deborah; Shi, Jianxin; Ren, Kang; Searle, Iain; Zhang, Dabing

    2016-09-01

    Pollen allergies have long been a major pandemic health problem for humans. However, the evolutionary events and biological function of pollen allergens in plants remain largely unknown. Here, we report the genome-wide prediction of pollen allergens and their biological function in the dicotyledonous model plant Arabidopsis (Arabidopsis thaliana) and the monocotyledonous model plant rice (Oryza sativa). In total, 145 and 107 pollen allergens were predicted from rice and Arabidopsis, respectively. These pollen allergens are putatively involved in stress responses and metabolic processes such as cell wall metabolism during pollen development. Interestingly, these putative pollen allergen genes were derived from large gene families and became diversified during evolution. Sequence analysis across 25 plant species from green alga to angiosperms suggests that about 40% of putative pollen allergenic proteins existed in both lower and higher plants, while other allergens emerged during evolution. Although a high proportion of gene duplication has been observed among allergen-coding genes, our data show that these genes might have undergone purifying selection during evolution. We also observed that epitopes of an allergen might have a biological function, as revealed by comprehensive analysis of two known allergens, expansin and profilin. This implies a crucial role of conserved amino acid residues in both in planta biological function and allergenicity. Finally, a model explaining how pollen allergens were generated and maintained in plants is proposed. Prediction and systematic analysis of pollen allergens in model plants suggest that pollen allergens evolved by gene duplication and then functional specification. This study provides insight into the phylogenetic and evolutionary scenario of pollen allergens that will be helpful to future characterization and epitope screening of pollen allergens. PMID:27436829

  19. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder

    PubMed Central

    Lorenzo, J. Ramiro; Alonso, Leonardo G.; Sánchez, Ignacio E.

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”. PMID:26674530

  20. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  1. Knowledge base and neural network approach for protein secondary structure prediction.

    PubMed

    Patel, Maulika S; Mazumdar, Himanshu S

    2014-11-21

    Protein structure prediction is of great relevance given the abundant genomic and proteomic data generated by the genome sequencing projects. Protein secondary structure prediction is addressed as a subtask in determining the protein tertiary structure and function. In this paper, a novel algorithm, KB-PROSSP-NN, which is a combination of a knowledge base and modeling of the exceptions in the knowledge base using neural networks for protein secondary structure prediction (PSSP), is proposed. The knowledge base is derived from a proteomic sequence-structure database and consists of the statistics of association between the 5-residue words and corresponding secondary structure. The predicted results obtained using the knowledge base are refined with a backpropagation neural network algorithm. The neural net models the exceptions of the knowledge base. Q3 accuracies of 90% and 82% are achieved on the RS126 and CB396 test sets, respectively, which suggests improvement over existing state-of-the-art methods.
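
    A knowledge base of 5-residue words can be sketched as a lookup table of counts. This is a minimal illustration, not the KB-PROSSP-NN implementation: it assumes that each word is associated with the secondary structure state of its central residue, which the abstract does not specify, and it leaves out the neural network that models the exceptions. The function names and the default coil label "C" are hypothetical.

```python
from collections import Counter, defaultdict


def build_knowledge_base(pairs, k=5):
    """Count, for every k-residue word, the state of its central residue.

    `pairs` is an iterable of (sequence, structure) strings of equal length.
    """
    kb = defaultdict(Counter)
    half = k // 2
    for seq, ss in pairs:
        for i in range(len(seq) - k + 1):
            kb[seq[i:i + k]][ss[i + half]] += 1
    return kb


def predict(seq, kb, k=5, default="C"):
    """Label each residue with the most frequent state seen for its word.

    Residues near the ends, and words absent from the table, keep `default`.
    """
    half = k // 2
    out = [default] * len(seq)
    for i in range(len(seq) - k + 1):
        counts = kb.get(seq[i:i + k])
        if counts:
            out[i + half] = counts.most_common(1)[0][0]
    return "".join(out)


kb = build_knowledge_base([("AAAAAGG", "HHHHHCC")])
labels = predict("AAAAAGG", kb)
```

    Words whose counts are split between states are the natural candidates for the "exceptions" that a second-stage model would handle.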

  2. Calreticulin: one protein, one gene, many functions.

    PubMed Central

    Michalak, M; Corbett, E F; Mesaeli, N; Nakamura, K; Opas, M

    1999-01-01

    The endoplasmic reticulum (ER) plays a critical role in the synthesis and chaperoning of membrane-associated and secreted proteins. The membrane is also an important site of Ca(2+) storage and release. Calreticulin is a unique ER luminal resident protein. The protein affects many cellular functions, both in the ER lumen and outside of the ER environment. In the ER lumen, calreticulin performs two major functions: chaperoning and regulation of Ca(2+) homoeostasis. Calreticulin is a highly versatile lectin-like chaperone, and it participates during the synthesis of a variety of molecules, including ion channels, surface receptors, integrins and transporters. The protein also affects intracellular Ca(2+) homoeostasis by modulation of ER Ca(2+) storage and transport. Studies on the cell biology of calreticulin revealed that the ER membrane is a very dynamic intracellular compartment affecting many aspects of cell physiology. PMID:10567207

  3. WeFold: A Coopetition for Protein Structure Prediction

    PubMed Central

    Khoury, George A.; Liwo, Adam; Khatib, Firas; Zhou, Hongyi; Chopra, Gaurav; Bacardit, Jaume; Bortot, Leandro O.; Faccioli, Rodrigo A.; Deng, Xin; He, Yi; Krupa, Pawel; Li, Jilong; Mozolewska, Magdalena A.; Sieradzan, Adam K.; Smadbeck, James; Wirecki, Tomasz; Cooper, Seth; Flatten, Jeff; Xu, Kefan; Baker, David; Cheng, Jianlin; Delbem, Alexandre C. B.; Floudas, Christodoulos A.; Keasar, Chen; Levitt, Michael; Popović, Zoran; Scheraga, Harold A.; Skolnick, Jeffrey; Crivelli, Silvia N.; Players, Foldit

    2014-01-01

    The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by thirteen labs. During the collaboration, the labs were simultaneously competing with each other. Here, we present the first attempt at “coopetition” in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org. PMID:24677212

  4. Rapid Catalytic Template Searching as an Enzyme Function Prediction Procedure

    PubMed Central

    Nilmeier, Jerome P.; Kirshner, Daniel A.; Wong, Sergio E.; Lightstone, Felice C.

    2013-01-01

    We present an enzyme protein function identification algorithm, Catalytic Site Identification (CatSId), based on identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also very efficient in regards to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated, catalytic residues – The Catalytic Site Atlas (CSA). Two main processes are involved. The first process is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second process incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score matches found in Process 1. This approach shows very good performance overall, with a Receiver-Operator-Characteristic Area Under Curve (AUC) of 0.971 for the training set evaluated. The procedure is able to process cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites show excellent performance with AUCs greater than 0.90 for both the training and test data on templates of size greater than two critical (catalytic) residues. The procedure has considerable promise for larger scale searches. PMID:23675414

  5. Rapid catalytic template searching as an enzyme function prediction procedure.

    PubMed

    Nilmeier, Jerome P; Kirshner, Daniel A; Wong, Sergio E; Lightstone, Felice C

    2013-01-01

    We present an enzyme protein function identification algorithm, Catalytic Site Identification (CatSId), based on identification of catalytic residues. The method is optimized for highly accurate template identification across a diverse template library and is also very efficient in regards to time and scalability of comparisons. The algorithm matches three-dimensional residue arrangements in a query protein to a library of manually annotated, catalytic residues--The Catalytic Site Atlas (CSA). Two main processes are involved. The first process is a rapid protein-to-template matching algorithm that scales quadratically with target protein size and linearly with template size. The second process incorporates a number of physical descriptors, including binding site predictions, in a logistic scoring procedure to re-score matches found in Process 1. This approach shows very good performance overall, with a Receiver-Operator-Characteristic Area Under Curve (AUC) of 0.971 for the training set evaluated. The procedure is able to process cofactors, ions, nonstandard residues, and point substitutions for residues and ions in a robust and integrated fashion. Sites with only two critical (catalytic) residues are challenging cases, resulting in AUCs of 0.9411 and 0.5413 for the training and test sets, respectively. The remaining sites show excellent performance with AUCs greater than 0.90 for both the training and test data on templates of size greater than two critical (catalytic) residues. The procedure has considerable promise for larger scale searches.

  6. 'Unite and conquer': enhanced prediction of protein subcellular localization by integrating multiple specialized tools

    PubMed Central

    Shen, Yao Qing; Burger, Gertraud

    2007-01-01

    Background: Knowing the subcellular location of proteins provides clues to their function as well as the interconnectivity of biological processes. Dozens of tools are available for predicting protein location in the eukaryotic cell. Each tool performs well on certain data sets, but their predictions often disagree for a given protein. Since the individual tools each have particular strengths, we set out to integrate them in a way that optimally exploits their potential. The method we present here is applicable to various subcellular locations, but tailored for predicting whether or not a protein is localized in mitochondria. Knowledge of the mitochondrial proteome is relevant to understanding the role of this organelle in global cellular processes. Results: In order to develop a method for enhanced prediction of subcellular localization, we integrated the outputs of available localization prediction tools by several strategies, and tested the performance of each strategy with known mitochondrial proteins. The accuracy obtained (up to 92%) surpasses by far the individual tools. The method of integration proved crucial to the performance. For the prediction of mitochondrion-located proteins, integration via a two-layer decision tree clearly outperforms simpler methods, as it allows emphasis of biologically relevant features such as the mitochondrial targeting peptide and transmembrane domains. Conclusion: We developed an approach that enhances the prediction accuracy of mitochondrial proteins by uniting the strength of specialized tools. The combination of machine-learning based integration with biological expert knowledge leads to improved performance. This approach also alleviates the conundrum of how to choose between conflicting predictions. Our approach is easy to implement, and applicable to predicting subcellular locations other than mitochondria, as well as other biological features. For a trial of our approach, we provide a webservice for mitochondrial protein
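
    The contrast between a simple integration strategy and a two-layer decision can be illustrated with a toy example. Everything below is hypothetical: the tool names, the rule order, and the thresholds are invented for illustration, whereas the authors' tree was obtained by machine learning and is not described in the abstract. The sketch only shows how a two-layer structure lets a biologically relevant feature (here a predicted targeting peptide) change how the tool votes are weighed.

```python
def majority_vote(predictions):
    """Baseline: mitochondrial if more than half of the tools say so."""
    votes = sum(1 for call in predictions.values() if call)
    return votes * 2 > len(predictions)


def two_layer_decision(predictions, has_targeting_peptide, has_tm_domain):
    """Hand-written two-layer tree (illustrative rules, not the published ones).

    Layer 1 splits on the targeting peptide; layer 2 splits on the
    transmembrane domain and decides how much tool agreement is required.
    """
    if has_targeting_peptide:
        # Strong biological evidence: a single supporting tool suffices.
        return any(predictions.values())
    if has_tm_domain:
        return majority_vote(predictions)
    # No supporting feature: demand unanimous tool agreement.
    return all(predictions.values())


# Hypothetical tool outputs for one protein.
calls = {"toolA": True, "toolB": False, "toolC": False}
baseline = majority_vote(calls)
integrated = two_layer_decision(calls, has_targeting_peptide=True, has_tm_domain=False)
```

    Here the baseline rejects the protein while the two-layer rule accepts it, which is the kind of conflict between tools that the integration is meant to resolve.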

  7. Insights into prion protein function from atomistic simulations.

    PubMed

    Hodak, Miroslav; Bernholc, Jerzy

    2010-01-01

    Computer simulations are a powerful tool for studies of biological systems. They have often been used to study prion protein (PrP), a protein responsible for neurodegenerative diseases, which include "mad cow disease" in cattle and Creutzfeldt-Jakob disease in humans. An important aspect of the prion protein is its interaction with copper ion, which is thought to be relevant for PrP's yet undetermined function and also potentially play a role in prion diseases. For studies of copper attachment to the prion protein, computer simulations have often been used to complement experimental data and to obtain binding structures of Cu-PrP complexes. This paper summarizes the results of recent ab initio calculations of copper-prion protein interactions focusing on the recently discovered concentration-dependent binding modes in the octarepeat region of this protein. In addition to determining the binding structures, computer simulations were also used to make predictions about PrP's function and the role of copper in prion diseases. The results demonstrate the predictive power and applicability of ab initio simulations for studies of metal-biomolecular complexes. PMID:20118658

  8. Insights into prion protein function from atomistic simulations

    PubMed Central

    Hodak, Miroslav

    2010-01-01

    Computer simulations are a powerful tool for studies of biological systems. They have often been used to study prion protein (PrP), a protein responsible for neurodegenerative diseases, which include “mad cow disease” in cattle and Creutzfeldt-Jakob disease in humans. An important aspect of the prion protein is its interaction with copper ion, which is thought to be relevant for PrP’s yet undetermined function and also potentially play a role in prion diseases. For studies of copper attachment to the prion protein, computer simulations have often been used to complement experimental data and to obtain binding structures of Cu-PrP complexes. This paper summarizes the results of recent ab initio calculations of copper-prion protein interactions focusing on the recently discovered concentration-dependent binding modes in the octarepeat region of this protein. In addition to determining the binding structures, computer simulations were also used to make predictions about PrP’s function and the role of copper in prion diseases. The results demonstrate the predictive power and applicability of ab initio simulations for studies of metal-biomolecular complexes. PMID:20118658

  9. Computational approaches for inferring the functions of intrinsically disordered proteins

    PubMed Central

    Varadi, Mihaly; Vranken, Wim; Guharoy, Mainak; Tompa, Peter

    2015-01-01

    Intrinsically disordered proteins (IDPs) are ubiquitously involved in cellular processes and often implicated in human pathological conditions. The critical biological roles of these proteins, despite not adopting a well-defined fold, encouraged structural biologists to revisit their views on the protein structure-function paradigm. Unfortunately, investigating the characteristics and describing the structural behavior of IDPs is far from trivial, and inferring the function(s) of a disordered protein region remains a major challenge. Computational methods have proven particularly relevant for studying IDPs: on the sequence level their dependence on distinct characteristics determined by the local amino acid context makes sequence-based prediction algorithms viable and reliable tools for large scale analyses, while on the structure level the in silico integration of fundamentally different experimental data types is essential to describe the behavior of a flexible protein chain. Here, we offer an overview of the latest developments and computational techniques that aim to uncover how protein function is connected to intrinsic disorder. PMID:26301226

  10. Gene3D: modelling protein structure, function and evolution.

    PubMed

    Yeats, Corin; Maibaum, Michael; Marsden, Russell; Dibley, Mark; Lee, David; Addou, Sarah; Orengo, Christine A

    2006-01-01

    The Gene3D release 4 database and web portal (http://cathwww.biochem.ucl.ac.uk:8080/Gene3D) provide a combined structural, functional and evolutionary view of the protein world. It is focussed on providing structural annotation for protein sequences without structural representatives--including the complete proteome sets of over 240 different species. The protein sequences have also been clustered into whole-chain families so as to aid functional prediction. The structural annotation is generated using HMM models based on the CATH domain families; CATH is a repository for manually deduced protein domains. Amongst the changes from the last publication are: the addition of over 100 genomes and the UniProt sequence database, domain data from Pfam, metabolic pathway and functional data from COGs, KEGG and GO, and protein-protein interaction data from MINT and BIND. The website has been rebuilt to allow more sophisticated querying and the data returned is presented in a clearer format with greater functionality. Furthermore, all data can be downloaded in a simple XML format, allowing users to carry out complex investigations at their own computers.

  11. Investigating neuronal function with optically controllable proteins

    PubMed Central

    Zhou, Xin X.; Pan, Michael; Lin, Michael Z.

    2015-01-01

    In the nervous system, protein activities are highly regulated in space and time. This regulation allows for fine modulation of neuronal structure and function during development and adaptive responses. For example, neurite extension and synaptogenesis both involve localized and transient activation of cytoskeletal and signaling proteins, allowing changes in microarchitecture to occur rapidly and in a localized manner. To investigate the role of specific protein regulation events in these processes, methods to optically control the activity of specific proteins have been developed. In this review, we focus on how photosensory domains enable optical control over protein activity and have been used in neuroscience applications. These tools have demonstrated versatility in controlling various proteins and thereby cellular functions, and possess enormous potential for future applications in nervous systems. Just as optogenetic control of neuronal firing using opsins has changed how we investigate the function of cellular circuits in vivo, optical control may yet yield another revolution in how we study the circuitry of intracellular signaling in the brain. PMID:26257603

  12. Protein-protein interaction network-based detection of functionally similar proteins within species.

    PubMed

    Song, Baoxing; Wang, Fen; Guo, Yang; Sang, Qing; Liu, Min; Li, Dengyun; Fang, Wei; Zhang, Deli

    2012-07-01

    Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, evolution of protein families, progression of co-evolution, convergent evolution and other aspects which cannot be obtained by detection of functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After denoting protein-protein interaction networks using graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a modified shortest path method to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot be detected by sequence alignment were identified. By analyzing the results, we found that it is sometimes difficult to separate homologous from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent.
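
    The splitting step can be sketched as follows, assuming that the "1-hop method" means taking each protein together with its direct interaction partners and the interactions among them. That reading is an assumption, since the abstract does not define the term, and the subgraph comparison by modified shortest path is not shown. The function name `one_hop_subgraphs` is hypothetical.

```python
def one_hop_subgraphs(edges):
    """Map each protein to (members, edges) of its 1-hop neighbourhood.

    `edges` is a list of undirected (protein_a, protein_b) interactions.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    subgraphs = {}
    for node, nbrs in neighbours.items():
        members = nbrs | {node}
        # Keep every interaction whose two endpoints both lie in the neighbourhood.
        sub_edges = {
            frozenset((a, b))
            for a, b in edges
            if a in members and b in members
        }
        subgraphs[node] = (members, sub_edges)
    return subgraphs


ppi = [("A", "B"), ("B", "C"), ("C", "D")]
subgraphs = one_hop_subgraphs(ppi)
```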

  13. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  14. Evolution-Based Functional Decomposition of Proteins

    PubMed Central

    Rivoire, Olivier; Reynolds, Kimberly A.; Ranganathan, Rama

    2016-01-01

    The essential biological properties of proteins—folding, biochemical activities, and the capacity to adapt—arise from the global pattern of interactions between amino acid residues. The statistical coupling analysis (SCA) is an approach to defining this pattern that involves the study of amino acid coevolution in an ensemble of sequences comprising a protein family. This approach indicates a functional architecture within proteins in which the basic units are coupled networks of amino acids termed sectors. This evolution-based decomposition has potential for new understandings of the structural basis for protein function. To facilitate its usage, we present here the principles and practice of the SCA and introduce new methods for sector analysis in a python-based software package (pySCA). We show that the pattern of amino acid interactions within sectors is linked to the divergence of functional lineages in a multiple sequence alignment—a model for how sector properties might be differentially tuned in members of a protein family. This work provides new tools for studying proteins and for generally testing the concept of sectors as the principal units of function and adaptive variation. PMID:27254668

  15. Different combinations of atomic interactions predict protein-small molecule and protein-DNA/RNA affinities with similar accuracy.

    PubMed

    Dias, Raquel; Kolazckowski, Bryan

    2015-11-01

    Interactions between proteins and other molecules play essential roles in all biological processes. Although it is widely held that a protein's ligand specificity is determined primarily by its three-dimensional structure, the general principles by which structure determines ligand binding remain poorly understood. Here we use statistical analyses of a large number of protein-ligand complexes with associated binding-affinity measurements to quantitatively characterize how combinations of atomic interactions contribute to ligand affinity. We find that there are significant differences in how atomic interactions determine ligand affinity for proteins that bind small chemical ligands, those that bind DNA/RNA and those that interact with other proteins. Although protein-small molecule and protein-DNA/RNA binding affinities can be accurately predicted from structural data, models predicting one type of interaction perform poorly on the others. Additionally, the particular combinations of atomic interactions required to predict binding affinity differed between small-molecule and DNA/RNA data sets, consistent with the conclusion that the structural bases determining ligand affinity differ among interaction types. In contrast to what we observed for small-molecule and DNA/RNA interactions, no statistical models were capable of predicting protein-protein affinity with >60% correlation. We demonstrate the potential usefulness of protein-DNA/RNA binding prediction as a possible tool for high-throughput virtual screening to guide laboratory investigations, suggesting that quantitative characterization of diverse molecular interactions may have practical applications as well as fundamentally advancing our understanding of how molecular structure translates into function.

  16. Protein Secondary Structure Prediction Using Local Adaptive Techniques in Training Neural Networks

    NASA Astrophysics Data System (ADS)

    Aik, Lim Eng; Zainuddin, Zarita; Joseph, Annie

    2008-01-01

    One of the most significant problems in computational molecular biology today is how to predict a protein's three-dimensional structure from its one-dimensional amino acid sequence, generally called the protein folding problem, which also makes it difficult to determine the corresponding protein functions. This paper therefore addresses protein secondary structure prediction using a neural network in order to tackle the protein folding problem. The neural network used for protein secondary structure prediction is a multilayer perceptron (MLP) of the feed-forward variety. The training set consists of 120 proteins taken from the Protein Data Bank, while the test set consists of 60 proteins chosen randomly from the Protein Data Bank. Multiple sequence alignment (MSA) is used to obtain similar protein sequences, and a position-specific scoring matrix (PSSM) is used as the network input. The training process of the neural network involves local adaptive techniques. The local adaptive techniques used in this paper comprise learning rate adaptation by sign changes, SuperSAB, Quickprop and Rprop. In the simulations, Rprop and Quickprop were superior to all other algorithms with respect to convergence time. However, the best result was obtained using the Rprop algorithm.
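
    The idea behind a local adaptive technique such as Rprop is that each weight keeps its own step size, which grows while the sign of the gradient stays the same and shrinks when it flips. The sketch below implements the variant without weight backtracking on a one-dimensional toy objective; it is not the network training code from the paper, and the constants (1.2, 0.5, initial step 0.1) are conventional defaults assumed here rather than values reported in the abstract.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimise(grad, w, n_iter=100, step=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimise a function with Rprop (no weight backtracking).

    `grad` maps a weight list to a gradient list of the same length.
    Only the sign of each partial derivative is used for the update.
    """
    w = list(w)
    prev_g = [0.0] * len(w)
    steps = [step] * len(w)
    for _ in range(n_iter):
        g = grad(w)
        for i in range(len(w)):
            if g[i] * prev_g[i] > 0:      # same sign: accelerate
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif g[i] * prev_g[i] < 0:    # sign flip: we overshot, slow down
                steps[i] = max(steps[i] * eta_minus, step_min)
            w[i] -= sign(g[i]) * steps[i]
            prev_g[i] = g[i]
    return w


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
solution = rprop_minimise(lambda w: [2 * (w[0] - 3)], [0.0])
```

    Because the update ignores the gradient magnitude, the method is insensitive to the scale of the error surface, which is one reason such techniques can converge faster than plain backpropagation with a fixed learning rate.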

  17. Efficient Prediction of Co-Complexed Proteins Based on Coevolution

    PubMed Central

    de Vienne, Damien M.; Azé, Jérôme

    2012-01-01

    The prediction of the network of protein-protein interactions (PPI) of an organism is crucial for the understanding of biological processes and for the development of new drugs. Machine learning methods have been successfully applied to the prediction of PPI in yeast by the integration of multiple direct and indirect biological data sources. However, experimental data are not available for most organisms. We propose here an ensemble machine learning approach for the prediction of PPI that depends solely on features independent from experimental data. We developed new estimators of the coevolution between proteins and combined them in an ensemble learning procedure. We applied this method to a dataset of known co-complexed proteins in Escherichia coli and compared it to previously published methods. We show that our method allows prediction of PPI with an unprecedented precision of 95.5% for the first 200 sorted pairs of proteins compared to 28.5% on the same dataset with the previous best method. A close inspection of the best predicted pairs allowed us to detect new or recently discovered interactions between chemotactic components, the flagellar apparatus and RNA polymerase complexes in E. coli. PMID:23152796
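
    The headline figure, precision for the first 200 sorted pairs, is a precision-at-k measure: sort the candidate pairs by predicted score and count how many of the top k are known interactions. A minimal sketch, with hypothetical names and toy data rather than the E. coli dataset:

```python
def precision_at_k(scored_pairs, true_pairs, k):
    """Fraction of the k highest-scoring pairs that are known interactions.

    `scored_pairs` is a list of ((protein_a, protein_b), score);
    `true_pairs` is a set of frozensets so that pair order does not matter.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if frozenset(pair) in true_pairs)
    return hits / len(ranked)


scored = [(("a", "b"), 0.9), (("c", "d"), 0.8), (("a", "c"), 0.1)]
known = {frozenset(("b", "a"))}
p_at_2 = precision_at_k(scored, known, k=2)
```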

  18. Functional module identification in protein interaction networks by interaction patterns

    PubMed Central

    Wang, Yijie; Qian, Xiaoning

    2014-01-01

    Motivation: Identifying functional modules in protein–protein interaction (PPI) networks may shed light on cellular functional organization and thereafter underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of ‘higher than expected connectivity’, those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem but the lack of global optimum guarantee and the prohibitive computational complexity have been the bottleneck of their applications in real-world large-scale PPI networks. Results: In this article, we propose a novel optimization formulation LCP2 (low two-hop conductance sets) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for LCP2 by random walk. A spectral approximate algorithm SLCP2 is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP2 to a new algorithm (greedy algorithm for LCP2) GLCP2 to identify overlapping functional modules. We compare SLCP2 and GLCP2 with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level Gene Ontology term prediction and especially sparse module detection, has demonstrated that our algorithms based on searching for LCP2 outperform all other compared algorithms. Availability and implementation: All data and code are available at http://www.cse.usf.edu/~xqian/fmi/slcp2hop
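
    Why a two-hop random walk can expose sparse modules is easiest to see on a toy graph. The sketch below is one plausible reading of "two-hop conductance", namely the ordinary conductance of a node set evaluated on the two-step transition matrix; the exact LCP2 objective is not given in the abstract, so this definition and all function names are assumptions. It assumes an undirected graph with no isolated nodes.

```python
def transition_matrix(nodes, edges):
    """Row-stochastic random-walk matrix and node degrees of an undirected graph."""
    idx = {n: i for i, n in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for a, b in edges:
        adj[idx[a]][idx[b]] = 1.0
        adj[idx[b]][idx[a]] = 1.0
    degrees = [sum(row) for row in adj]
    P = [[v / d for v in row] for row, d in zip(adj, degrees)]
    return P, degrees


def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conductance(P, degrees, members):
    """Probability that a walker started in `members` (degree-weighted) leaves it."""
    inside = sum(degrees[i] for i in members)
    leaving = sum(degrees[i] * P[i][j]
                  for i in members
                  for j in range(len(P)) if j not in members)
    return leaving / inside


# a and b never interact directly but share the partners x and y.
nodes = ["a", "b", "x", "y"]
edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
P, deg = transition_matrix(nodes, edges)
module = {0, 1}                                        # the set {a, b}
one_hop = conductance(P, deg, module)                  # every step leaves the set
two_hop = conductance(matmul(P, P), deg, module)       # every two-step walk returns
```

    The set {a, b} has no internal edges, so a density-based criterion would reject it, yet under the two-step walk it is a perfectly closed set. This matches the abstract's description of proteins that share interaction patterns without being densely connected to each other.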

  19. DPROT: prediction of disordered proteins using evolutionary information.

    PubMed

    Sethi, Deepti; Garg, Aarti; Raghava, G P S

    2008-10-01

    The association of structurally disordered proteins with a number of diseases has engendered enormous interest and therefore demands a prediction method that would facilitate their expeditious study at molecular level. The present study describes the development of a computational method for predicting disordered proteins using sequence and profile compositions as input features for the training of SVM models. First, we developed the amino acid and dipeptide compositions based SVM modules which yielded sensitivities of 75.6 and 73.2% along with Matthews correlation coefficient (MCC) values of 0.75 and 0.60, respectively. In addition, the use of predicted secondary structure content (coil, sheet and helices) in the form of composition values attained a sensitivity of 76.8% and MCC value of 0.77. Finally, the training of SVM models using evolutionary information hidden in the multiple sequence alignment profile improved the prediction performance by achieving a sensitivity value of 78% and MCC of 0.78. Furthermore, when evaluated on an independent dataset of partially disordered proteins, the same SVM module provided a correct prediction rate of 86.6%. Based on the above study, a web server ("DPROT") was developed for the prediction of disordered proteins, which is available at http://www.imtech.res.in/raghava/dprot/.
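
    The amino acid and dipeptide composition features mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the standard 20-letter amino acid alphabet and simple frequency normalization; the function names are hypothetical, and the abstract does not say how DPROT itself computes or scales these features.

```python
from collections import Counter
from itertools import product

# Standard 20 amino acids (assumption: non-standard residues are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element list of residue frequencies for a sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element list of adjacent-residue-pair frequencies."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only pairs made of two standard residues.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]
```

    Vectors of this kind (20 and 400 values per protein) are the sort of fixed-length input a classifier such as an SVM can be trained on, whatever the sequence length.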

  20. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics

  1. Prediction of disease-related mutations affecting protein localization

    PubMed Central

    Laurila, Kirsti; Vihinen, Mauno

    2009-01-01

    Background Eukaryotic cells contain numerous compartments, which have different protein constituents. Proteins are typically directed to compartments by short peptide sequences that act as targeting signals. Translocation to the proper compartment allows a protein to form the necessary interactions with its partners and take part in biological networks such as signalling and metabolic pathways. If a protein is not transported to the correct intracellular compartment either the reaction performed or information carried by the protein does not reach the proper site, causing either inactivation of central reactions or misregulation of signalling cascades, or the mislocalized active protein has harmful effects by acting in the wrong place. Results Numerous methods have been developed to predict protein subcellular localization with quite high accuracy. We applied bioinformatics methods to investigate the effects of known disease-related mutations on protein targeting and localization by analyzing over 22,000 missense mutations in more than 1,500 proteins with two complementary prediction approaches. Several hundred putative localization affecting mutations were identified and investigated statistically. Conclusion Although alterations to localization signals are rare, these effects should be taken into account when analyzing the consequences of disease-related mutations. PMID:19309509

  2. A Prediction Model of the Capillary Pressure J-Function

    PubMed Central

    Xu, W. S.; Luo, P. Y.; Sun, L.; Lin, N.

    2016-01-01

    The capillary pressure J-function is a dimensionless measure of the capillary pressure of a fluid in a porous medium. The function was derived based on a capillary bundle model. However, the dependence of the J-function on the saturation Sw is not well understood. A prediction model for it is presented based on capillary pressure model, and the J-function prediction model is a power function instead of an exponential or polynomial function. Relative permeability is calculated with the J-function prediction model, resulting in an easier calculation and results that are more representative. PMID:27603701
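
    As a rough illustration of the quantities named in this abstract, the sketch below computes the standard Leverett J-function, J = Pc / (sigma * cos(theta)) * sqrt(k / phi), and fits a generic power law J = a * Sw^(-b) by linear regression in log-log space. The power-law form, the parameter names a and b, and the function names are assumptions made for illustration; the abstract does not give the authors' actual model or coefficients.

```python
import math


def leverett_j(pc, sigma, theta_deg, k, phi):
    """Dimensionless Leverett J-function.

    pc: capillary pressure (Pa), sigma: interfacial tension (N/m),
    theta_deg: contact angle (degrees), k: permeability (m^2),
    phi: porosity (fraction).
    """
    return pc / (sigma * math.cos(math.radians(theta_deg))) * math.sqrt(k / phi)


def power_law_j(sw, a, b):
    """Hypothetical power-law model J(Sw) = a * Sw**(-b)."""
    return a * sw ** (-b)


def fit_power_law(sw_values, j_values):
    """Least-squares fit of log J = log a - b * log Sw; returns (a, b)."""
    xs = [math.log(s) for s in sw_values]
    ys = [math.log(j) for j in j_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

    Fitting in log-log space turns the power law into a straight line, which is why a plain linear regression is enough here.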

  3. A Prediction Model of the Capillary Pressure J-Function.

    PubMed

    Xu, W S; Luo, P Y; Sun, L; Lin, N

    2016-01-01

    The capillary pressure J-function is a dimensionless measure of the capillary pressure of a fluid in a porous medium. The function was derived based on a capillary bundle model. However, the dependence of the J-function on the saturation Sw is not well understood. A prediction model for it is presented based on capillary pressure model, and the J-function prediction model is a power function instead of an exponential or polynomial function. Relative permeability is calculated with the J-function prediction model, resulting in an easier calculation and results that are more representative. PMID:27603701

  4. Functions of TET Proteins in Hematopoietic Transformation.

    PubMed

    Han, Jae-A; An, Jungeun; Ko, Myunggon

    2015-11-01

    DNA methylation is a well-characterized epigenetic modification that plays central roles in mammalian development, genomic imprinting, X-chromosome inactivation and silencing of retrotransposon elements. Aberrant DNA methylation pattern is a characteristic feature of cancers and associated with abnormal expression of oncogenes, tumor suppressor genes or repair genes. Ten-eleven-translocation (TET) proteins are recently characterized dioxygenases that catalyze progressive oxidation of 5-methylcytosine to produce 5-hydroxymethylcytosine and further oxidized derivatives. These oxidized methylcytosines not only potentiate DNA demethylation but also behave as independent epigenetic modifications per se. The expression or activity of TET proteins and DNA hydroxymethylation are highly dysregulated in a wide range of cancers including hematologic and non-hematologic malignancies, and accumulating evidence points TET proteins as a novel tumor suppressor in cancers. Here we review DNA demethylation-dependent and -independent functions of TET proteins. We also describe diverse TET loss-of-function mutations that are recurrently found in myeloid and lymphoid malignancies and their potential roles in hematopoietic transformation. We discuss consequences of the deficiency of individual Tet genes and potential compensation between different Tet members in mice. Possible mechanisms underlying facilitated oncogenic transformation of TET-deficient hematopoietic cells are also described. Lastly, we address non-mutational mechanisms that lead to suppression or inactivation of TET proteins in cancers. Strategies to restore normal 5mC oxidation status in cancers by targeting TET proteins may provide new avenues to expedite the development of promising anti-cancer agents.

  5. The lipocalin protein family: structure and function.

    PubMed Central

    Flower, D R

    1996-01-01

    The lipocalin protein family is a large group of small extracellular proteins. The family demonstrates great diversity at the sequence level; however, most lipocalins share three characteristic conserved sequence motifs, the kernel lipocalins, while a group of more divergent family members, the outlier lipocalins, share only one. Belying this sequence dissimilarity, lipocalin crystal structures are highly conserved and comprise a single eight-stranded continuously hydrogen-bonded antiparallel beta-barrel, which encloses an internal ligand-binding site. Together with two other families of ligand-binding proteins, the fatty-acid-binding proteins (FABPs) and the avidins, the lipocalins form part of an overall structural superfamily: the calycins. Members of the lipocalin family are characterized by several common molecular-recognition properties: the ability to bind a range of small hydrophobic molecules, binding to specific cell-surface receptors and the formation of complexes with soluble macromolecules. The varied biological functions of the lipocalins are mediated by one or more of these properties. In the past, the lipocalins have been classified as transport proteins; however, it is now clear that the lipocalins exhibit great functional diversity, with roles in retinol transport, invertebrate cryptic coloration, olfaction and pheromone transport, and prostaglandin synthesis. The lipocalins have also been implicated in the regulation of cell homoeostasis and the modulation of the immune response, and, as carrier proteins, to act in the general clearance of endogenous and exogenous compounds. PMID:8761444

  6. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics

    PubMed Central

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, existing PPI pairs by experimental approaches only cover a small fraction of the whole PPI networks, and further, those approaches hold inherent disadvantages, such as being time-consuming, expensive, and having high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel mixture of physicochemical and evolutionary-based feature extraction method for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method mainly consist in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and then a powerful and robust DVM classifier is employed. To the best of our knowledge, it is the first time that DVM model is applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  7. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, existing PPI pairs by experimental approaches only cover a small fraction of the whole PPI networks, and further, those approaches hold inherent disadvantages, such as being time-consuming, expensive, and having high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel mixture of physicochemical and evolutionary-based feature extraction method for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method mainly consist in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and then a powerful and robust DVM classifier is employed. To the best of our knowledge, it is the first time that DVM model is applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research. PMID:27571061

  8. Highly Accurate Prediction of Protein-Protein Interactions via Incorporating Evolutionary Information and Physicochemical Characteristics.

    PubMed

    Li, Zheng-Wei; You, Zhu-Hong; Chen, Xing; Gui, Jie; Nie, Ru

    2016-01-01

    Protein-protein interactions (PPIs) occur at almost all levels of cell functions and play crucial roles in various cellular processes. Thus, identification of PPIs is critical for deciphering the molecular mechanisms and further providing insight into biological processes. Although a variety of high-throughput experimental techniques have been developed to identify PPIs, existing PPI pairs by experimental approaches only cover a small fraction of the whole PPI networks, and further, those approaches hold inherent disadvantages, such as being time-consuming, expensive, and having high false positive rate. Therefore, it is urgent and imperative to develop automatic in silico approaches to predict PPIs efficiently and accurately. In this article, we propose a novel mixture of physicochemical and evolutionary-based feature extraction method for predicting PPIs using our newly developed discriminative vector machine (DVM) classifier. The improvements of the proposed method mainly consist in introducing an effective feature extraction method that can capture discriminative features from the evolutionary-based information and physicochemical characteristics, and then a powerful and robust DVM classifier is employed. To the best of our knowledge, it is the first time that DVM model is applied to the field of bioinformatics. When applying the proposed method to the Yeast and Helicobacter pylori (H. pylori) datasets, we obtain excellent prediction accuracies of 94.35% and 90.61%, respectively. The computational results indicate that our method is effective and robust for predicting PPIs, and can be taken as a useful supplementary tool to the traditional experimental methods for future proteomics research.

  9. Functional Constraint Profiling of a Viral Protein Reveals Discordance of Evolutionary Conservation and Functionality.

    PubMed

    Wu, Nicholas C; Olson, C Anders; Du, Yushen; Le, Shuai; Tran, Kevin; Remenyi, Roland; Gong, Danyang; Al-Mawsawi, Laith Q; Qi, Hangfei; Wu, Ting-Ting; Sun, Ren

    2015-07-01

    Viruses often encode proteins with multiple functions due to their compact genomes. Existing approaches to identify functional residues largely rely on sequence conservation analysis. Inferring functional residues from sequence conservation can produce false positives, in which the conserved residues are functionally silent, or false negatives, where functional residues are not identified since they are species-specific and therefore non-conserved. Furthermore, the tedious process of constructing and analyzing individual mutations limits the number of residues that can be examined in a single study. Here, we developed a systematic approach to identify the functional residues of a viral protein by coupling experimental fitness profiling with protein stability prediction using the influenza virus polymerase PA subunit as the target protein. We identified a significant number of functional residues that were influenza type-specific and were evolutionarily non-conserved among different influenza types. Our results indicate that type-specific functional residues are prevalent and may not otherwise be identified by sequence conservation analysis alone. More importantly, this technique can be adapted to any viral (and potentially non-viral) protein where structural information is available.

  10. Knowledge of Native Protein-Protein Interfaces Is Sufficient To Construct Predictive Models for the Selection of Binding Candidates.

    PubMed

    Popov, Petr; Grudinin, Sergei

    2015-10-26

    Selection of putative binding poses is a challenging part of virtual screening for protein-protein interactions. Predictive models to filter out binding candidates with the highest binding affinities comprise scoring functions that assign a score to each binding pose. Existing scoring functions are typically deduced by collecting statistical information about interfaces of native conformations of protein complexes along with interfaces of a large generated set of non-native conformations. However, the obtained scoring functions become biased toward the method used to generate the non-native conformations, i.e., they may not recognize near-native interfaces generated with a different method. The present study demonstrates that knowledge of only native protein-protein interfaces is sufficient to construct well-discriminative predictive models for the selection of binding candidates. Here we introduce a new scoring method that comprises a knowledge-based potential called KSENIA deduced from structural information about the native interfaces of 844 crystallographic protein-protein complexes. We derive KSENIA using convex optimization with a training set composed of native protein complexes and their near-native conformations obtained using deformations along the low-frequency normal modes. As a result, our knowledge-based potential has only marginal bias toward a method used to generate putative binding poses. Furthermore, KSENIA is smooth by construction, which allows it to be used along with rigid-body optimization to refine the binding poses. Using several test benchmarks, we demonstrate that our method discriminates well native and near-native conformations of protein complexes from non-native ones. Our methodology can be easily adapted to the recognition of other types of molecular interactions, such as protein-ligand, protein-RNA, etc. KSENIA will be made publicly available as a part of the SAMSON software platform at https://team.inria.fr/nano-d/software . 

  11. Modification of sorghum proteins for enhanced functionality

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sorghum is the third most widely produced crop in the United States (U.S.) and fifth in the world during fiscal year 2006/07 (USDA-FAS, 2007). The use of sorghum in foods faces functional and nutritional constraints due, mainly, to the rigidity of the protein bodies. The disruption and modificatio...

  12. PPCM: Combing Multiple Classifiers to Improve Protein-Protein Interaction Prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-01-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using an assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. This pipeline will be useful for predicting PPI in nonmodel species.

  13. A predicted protein interactome identifies conserved global networks and disease resistance subnetworks in maize

    PubMed Central

    Musungu, Bryan; Bhatnagar, Deepak; Brown, Robert L.; Fakhoury, Ahmad M.; Geisler, Matt

    2015-01-01

    Interactomes are genome-wide roadmaps of protein-protein interactions. They have been produced for humans, yeast, the fruit fly, and Arabidopsis thaliana and have become invaluable tools for generating and testing hypotheses. A predicted interactome for Zea mays (PiZeaM) is presented here as an aid to the research community for this valuable crop species. PiZeaM was built using a proven method of interologs (interacting orthologs) that were identified using both one-to-one and many-to-many orthology between genomes of maize and reference species. Where both maize orthologs occurred for an experimentally determined interaction in the reference species, we predicted a likely interaction in maize. A total of 49,026 unique interactions for 6004 maize proteins were predicted. These interactions are enriched for processes that are evolutionarily conserved, but include many otherwise poorly annotated proteins in maize. The predicted maize interactions were further analyzed by comparing annotation of interacting proteins, including different layers of ontology. A map of pairwise gene co-expression was also generated and compared to predicted interactions. Two global subnetworks were constructed for highly conserved interactions. These subnetworks showed clear clustering of proteins by function. Another subnetwork was created for disease response using a bait and prey strategy to capture interacting partners for proteins that respond to other organisms. Closer examination of this subnetwork revealed the connectivity between biotic and abiotic hormone stress pathways. We believe PiZeaM will provide a useful tool for the prediction of protein function and analysis of pathways for Z. mays researchers and is presented in this paper as a reference tool for the exploration of protein interactions in maize. PMID:26089837
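
    The interolog rule described in this abstract (predict an interaction in maize when both partners of a known reference interaction have maize orthologs) can be sketched in a few lines of Python. The function name, the data layout and the protein identifiers are hypothetical; the actual PiZeaM pipeline, its orthology sources and any filtering steps are not specified in the abstract.

```python
from itertools import product


def predict_interologs(ref_interactions, orthologs):
    """Transfer reference interactions to a target species.

    ref_interactions: iterable of (protein_a, protein_b) pairs observed
        experimentally in a reference species.
    orthologs: dict mapping a reference protein to a set of target-species
        orthologs (supports one-to-one and many-to-many orthology).
    Returns a set of frozensets, each an unordered predicted pair.
    """
    predicted = set()
    for a, b in ref_interactions:
        # A prediction is made only when both partners have orthologs.
        for target_a, target_b in product(orthologs.get(a, ()),
                                          orthologs.get(b, ())):
            predicted.add(frozenset((target_a, target_b)))
    return predicted
```

    Storing each pair as a frozenset keeps the predicted interactions unique regardless of partner order, which matches the abstract's count of unique interactions.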

  14. Functional conservation of an ancestral Pellino protein in helminth species.

    PubMed

    Cluxton, Christopher D; Caffrey, Brian E; Kinsella, Gemma K; Moynagh, Paul N; Fares, Mario A; Fallon, Padraic G

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved having high similarity, both at the sequence and predicted structural protein level, with that of human Pellino proteins. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  15. Functional conservation of an ancestral Pellino protein in helminth species.

    PubMed

    Cluxton, Christopher D; Caffrey, Brian E; Kinsella, Gemma K; Moynagh, Paul N; Fares, Mario A; Fallon, Padraic G

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved having high similarity, both at the sequence and predicted structural protein level, with that of human Pellino proteins. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans.

  16. Functional conservation of an ancestral Pellino protein in helminth species

    PubMed Central

    Cluxton, Christopher D.; Caffrey, Brian E.; Kinsella, Gemma K.; Moynagh, Paul N.; Fares, Mario A.; Fallon, Padraic G.

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved having high similarity, both at the sequence and predicted structural protein level, with that of human Pellino proteins. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  17. SAM-T08, HMM-based protein structure prediction

    PubMed Central

    Karplus, Kevin

    2009-01-01

    The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html PMID:19483096

  18. [Functionally-relevant conformational dynamics of water-soluble proteins].

    PubMed

    Novikov, G V; Sivozhelezov, V S; Shaĭtan, K V

    2013-01-01

    This study examines the functionally relevant dynamics of three typical water-soluble proteins: calmodulin, Src tyrosine kinase and the Trp operon repressor. State-of-the-art methods of structural bioinformatics made it possible to identify dynamics, visible in the X-ray structures of these proteins, that are associated with their specific biological functions. In addition, normal mode analysis was used to predict the most probable directions of the functionally relevant motions for all three proteins. Importantly, the overall type of motion observed in the lowest-frequency modes was very similar to the motions inferred from the X-ray data for the same macromolecules. This shows that both the large-scale and the local conformational motions of the proteins might be predetermined at the level of their tertiary structures. In particular, the determining factor might be the specific fold of the alpha-helices. Thus the functionally relevant in vivo dynamics of the investigated proteins might have been shaped by natural selection at the level of spatial topology. PMID:23705506

  19. Plasma proteins predict conversion to dementia from prodromal disease

    PubMed Central

    Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L.; Ashton, Nicholas J.; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon

    2014-01-01

    Background The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Methods Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Results Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). Conclusions We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. PMID:25012867
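
    For readers comparing the figures quoted in these records, the following Python sketch shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) are conventionally derived from a binary confusion matrix. The function name is hypothetical and any counts passed to it are illustrative only; they are not data from this or any other study listed here.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts.

    Assumes at least one actual positive and one actual negative,
    so that sensitivity and specificity are defined.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

    Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of very different sizes.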

  20. DSP: a protein shape string and its profile prediction server.

    PubMed

    Sun, Jiangming; Tang, Shengnan; Xiong, Wenwei; Cong, Peisheng; Li, Tonghua

    2012-07-01

    Many studies have demonstrated that shape string is an extremely important structure representation, since it is more complete than the classical secondary structure. The shape string provides detailed information also in the regions denoted random coil. But few services are provided for systematic analysis of protein shape string. To fill this gap, we have developed an accurate shape string predictor based on two innovative technologies: a knowledge-driven sequence alignment and a sequence shape string profile method. The performance on blind test data demonstrates that the proposed method can be used for accurate prediction of protein shape string. The DSP server provides both predicted shape string and sequence shape string profile for each query sequence. Using this information, the users can compare protein structure or display protein evolution in shape string space. The DSP server is available at both http://cheminfo.tongji.edu.cn/dsp/ and its main mirror http://chemcenter.tongji.edu.cn/dsp/.

  1. Prediction of Glycosylphosphatidylinositol-Anchored Proteins in Arabidopsis. A Genomic Analysis

    PubMed Central

    Borner, Georg H.H.; Sherrier, D. Janine; Stevens, Timothy J.; Arkin, Isaiah T.; Dupree, Paul

    2002-01-01

    Glycosylphosphatidylinositol (GPI) anchoring of proteins provides a potential mechanism for targeting to the plant plasma membrane and cell wall. However, relatively few such proteins have been identified. Here, we develop a procedure for database analysis to identify GPI-anchored proteins (GAP) based on their possession of common features. In a comprehensive search of the annotated Arabidopsis genome, we identified 167 novel putative GAP in addition to the 43 previously described candidates. Many of these 210 proteins show similarity to characterized cell surface proteins. The predicted GAP include homologs of β-1,3-glucanases (16), metallo- and aspartyl proteases (13), glycerophosphodiesterases (6), phytocyanins (25), multi-copper oxidases (2), extensins (6), plasma membrane receptors (19), and lipid-transfer-proteins (18). Classical arabinogalactan (AG) proteins (13), AG peptides (9), fasciclin-like proteins (20), COBRA and 10 homologs, and novel potential signaling peptides that we name GAPEPs (8) were also identified. A further 34 proteins of unknown function were predicted to be GPI anchored. A surprising finding was that over 40% of the proteins identified here have probable AG glycosylation modules, suggesting that AG glycosylation of cell surface proteins is widespread. This analysis shows that GPI anchoring is likely to be a major modification in plants that is used to target a specific subset of proteins to the cell surface for extracellular matrix remodeling and signaling. PMID:12068095

  2. JPred4: a protein secondary structure prediction server.

    PubMed

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J

    2015-07-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141
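
The three-state accuracy quoted above is conventionally a per-residue agreement score (often called Q3). The following is a minimal sketch of that calculation, assuming predictions and reference states are given as equal-length strings over the letters H (helix), E (strand) and C (coil); the function name and the example strings are hypothetical and do not reflect the JPred4 output format.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted three-state label matches the reference."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example (assumption): one helix residue is mispredicted.
predicted = "HHHHEEEECC"
observed = "HHHEEEEECC"
q3 = q3_accuracy(predicted, observed)
print(f"Q3 = {q3:.2f}")
```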

  3. JPred4: a protein secondary structure prediction server

    PubMed Central

    Drozdetskiy, Alexey; Cole, Christian; Procter, James; Barton, Geoffrey J.

    2015-01-01

    JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials. PMID:25883141

  4. Addressing the Role of Conformational Diversity in Protein Structure Prediction.

    PubMed

    Palopoli, Nicolas; Monzon, Alexander Miguel; Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  5. Addressing the Role of Conformational Diversity in Protein Structure Prediction

    PubMed Central

    Parisi, Gustavo; Fornasari, Maria Silvina

    2016-01-01

    Computational modeling of tertiary structures has become of standard use to study proteins that lack experimental characterization. Unfortunately, 3D structure prediction methods and model quality assessment programs often overlook that an ensemble of conformers in equilibrium populates the native state of proteins. In this work we collected sets of publicly available protein models and the corresponding target structures experimentally solved and studied how they describe the conformational diversity of the protein. For each protein, we assessed the quality of the models against known conformers by several standard measures and identified those models ranked best. We found that model rankings are defined by both the selected target conformer and the similarity measure used. 70% of the proteins in our datasets show that different models are structurally closest to different conformers of the same protein target. We observed that model building protocols such as template-based or ab initio approaches describe in similar ways the conformational diversity of the protein, although for template-based methods this description may depend on the sequence similarity between target and template sequences. Taken together, our results support the idea that protein structure modeling could help to identify members of the native ensemble, highlight the importance of considering conformational diversity in protein 3D quality evaluations and endorse the study of the variability of the native structure for a meaningful biological analysis. PMID:27159429

  6. Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model

    PubMed Central

    2015-01-01

    Recent studies show that long noncoding RNAs (lncRNAs) participate in diverse biological processes and complex diseases. However, the functions of most lncRNAs are still unknown. In this study, we propose a network-based computational method, called lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict potential lncRNA-protein interactions. First, we construct a heterogeneous network by integrating the lncRNA-lncRNA similarity network, the lncRNA-protein interaction network, and the protein-protein interaction (PPI) network. Then, a random walk with restart is implemented on the heterogeneous network to infer novel lncRNA-protein interactions. The leave-one-out cross-validation test shows that our approach can achieve an AUC value of 96.0%. Some lncRNA-protein interactions predicted by our method have been confirmed in recent research or databases, indicating the effectiveness of LPIHN in predicting novel lncRNA-protein interactions. PMID:26839884
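
The random walk with restart mentioned above can be sketched generically as follows. This is not the LPIHN implementation: the toy four-node network, the restart probability of 0.5, and the function name are all assumptions made for illustration, and the abstract does not specify how the three networks are weighted when combined.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W @ p + r * p0 until the L1 change falls below tol."""
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes keep an all-zero column
    W = adjacency / col_sums               # column-normalized transition matrix
    p0 = np.zeros(adjacency.shape[0])
    p0[seed] = 1.0                         # the walker starts (and restarts) at the seed
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy heterogeneous network (assumption): nodes 0-1 are lncRNAs, nodes 2-3 are proteins.
# Edges: 0-1 (lncRNA similarity), 1-2 (known lncRNA-protein interaction), 2-3 (PPI).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(A, seed=0)
print(scores)  # steady-state visiting probabilities for a walk seeded at lncRNA 0
```

Proteins with higher steady-state probability are ranked as more likely partners of the seed lncRNA; in this toy network protein 2 ranks above protein 3 because it is closer to the seed.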

  7. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.

    PubMed

    Roche, Daniel B; Buenavista, Maria T; Tetchner, Stuart J; McGuffin, Liam J

    2011-07-01

    The IntFOLD server is a novel independent server that integrates several cutting edge methods for the prediction of structure and function from sequence. Our guiding principles behind the server development were as follows: (i) to provide a simple unified resource that makes our prediction software accessible to all and (ii) to produce integrated output for predictions that can be easily interpreted. The output for predictions is presented as a simple table that summarizes all results graphically via plots and annotated 3D models. The raw machine readable data files for each set of predictions are also provided for developers, which comply with the Critical Assessment of Methods for Protein Structure Prediction (CASP) data standards. The server comprises an integrated suite of five novel methods: nFOLD4, for tertiary structure prediction; ModFOLD 3.0, for model quality assessment; DISOclust 2.0, for disorder prediction; DomFOLD 2.0 for domain prediction; and FunFOLD 1.0, for ligand binding site prediction. Predictions from the IntFOLD server were found to be competitive in several categories in the recent CASP9 experiment. The IntFOLD server is available at the following web site: http://www.reading.ac.uk/bioinf/IntFOLD/.

  8. Insect Seminal Fluid Proteins: Identification and Function

    PubMed Central

    Avila, Frank W.; Sirot, Laura K.; LaFlamme, Brooke A.; Rubinstein, C. Dustin; Wolfner, Mariana F.

    2014-01-01

    Seminal fluid proteins (SFPs) produced in reproductive tract tissues of male insects and transferred to females during mating induce numerous physiological and behavioral post-mating changes in females. These changes include decreasing receptivity to re-mating, affecting sperm storage parameters, increasing egg production, modulating sperm competition, feeding behaviors, and mating plug formation. In addition, SFPs also have anti-microbial functions and induce expression of anti-microbial peptides in at least some insects. Here, we review recent identification of insect SFPs and discuss the multiple roles these proteins play in the post-mating processes of female insects. PMID:20868282

  9. Ice-Binding Proteins and Their Function.

    PubMed

    Bar Dolev, Maya; Braslavsky, Ido; Davies, Peter L

    2016-06-01

    Ice-binding proteins (IBPs) are a diverse class of proteins that assist organism survival in the presence of ice in cold climates. They have different origins in many organisms, including bacteria, fungi, algae, diatoms, plants, insects, and fish. This review covers the gamut of IBP structures and functions and the common features they use to bind ice. We discuss mechanisms by which IBPs adsorb to ice and interfere with its growth, evidence for their irreversible association with ice, and methods for enhancing the activity of IBPs. The applications of IBPs in the food industry, in cryopreservation, and in other technologies are vast, and we chart out some possibilities. PMID:27145844

  10. Prediction of Membrane Transport Proteins and Their Substrate Specificities Using Primary Sequence Information

    PubMed Central

    Mishra, Nitish K.; Chang, Junil; Zhao, Patrick X.

    2014-01-01

    Background: Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters represent a diverse group of proteins that differ in topology, energy coupling mechanism, and substrate specificity as well as sequence similarity. Among the functional annotations of transporters, information about their transporting substrates is especially important. The experimental identification and characterization of transporters is currently costly and time-consuming. The development of robust bioinformatics-based methods for the prediction of membrane transport proteins and their substrate specificities is therefore an important and urgent task. Results: Support vector machine (SVM)-based computational models, which comprehensively utilize integrative protein sequence features such as amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSM), were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. An additional model to differentiate transporters from non-transporters was also developed. Among the developed models, the biochemical composition and PSSM hybrid model outperformed other models and achieved an overall average prediction accuracy of 76.69% with a Matthews correlation coefficient (MCC) of 0.49 and a receiver operating characteristic area under the curve (AUC) of 0.833 on our main dataset. This model also achieved an overall average prediction accuracy of 78.88% and MCC of 0.41 on an independent dataset. Conclusions: Our analyses suggest that evolutionary information (i.e., the PSSM) and the AAIndex are key features for the substrate specificity prediction of transport proteins. In comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not provide
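
Two of the sequence features named above (amino acid composition and dipeptide composition) and the Matthews correlation coefficient can be computed in a few lines. The following is a minimal sketch under stated assumptions: the function names and the example sequence are hypothetical, only the 20 standard residues are counted, and the authors' actual feature pipeline (including the PSSM and biochemical features) is not reproduced.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector of overlapping residue-pair frequencies."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:            # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[key] / total for key in counts]


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion-matrix counts; defined as 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical sequence fragment (assumption), used only to show the vector shapes.
aac = amino_acid_composition("MKTAYIAKQR")
dpc = dipeptide_composition("MKTAYIAKQR")
print(len(aac), len(dpc))
```

Concatenating such vectors gives a fixed-length input for a classifier such as an SVM regardless of protein length, which is why composition features are a common baseline.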

  11. CoinFold: a web server for protein contact prediction and contact-assisted protein folding

    PubMed Central

    Wang, Sheng; Li, Wei; Zhang, Renyu; Liu, Shiwang; Xu, Jinbo

    2016-01-01

    CoinFold (http://raptorx2.uchicago.edu/ContactMap/) is a web server for protein contact prediction and contact-assisted de novo structure prediction. CoinFold predicts contacts by integrating joint multi-family evolutionary coupling (EC) analysis and supervised machine learning. This joint EC analysis is unique in that it not only uses residue coevolution information in the target protein family, but also that in the related families which may have divergent sequences but similar folds. The supervised learning further improves contact prediction accuracy by making use of sequence profile, contact (distance) potential and other information. Finally, this server predicts tertiary structure of a sequence by feeding its predicted contacts and secondary structure to the CNS suite. Tested on the CASP and CAMEO targets, this server shows significant advantages over existing ones of similar category in both contact and tertiary structure prediction. PMID:27112569

  12. Which Working Memory Functions Predict Intelligence?

    ERIC Educational Resources Information Center

    Oberauer, Klaus; Süß, Heinz-Martin; Wilhelm, Oliver; Wittmann, Werner W.

    2008-01-01

    Investigates the relationship between three factors of working memory (storage and processing, relational integration, and supervision) and four factors of intelligence (reasoning, speed, memory, and creativity) using structural equation models. Relational integration predicted reasoning ability at least as well as the storage-and-processing…

  13. Binding affinity prediction for protein-ligand complexes based on β contacts and B factor.

    PubMed

    Liu, Qian; Kwoh, Chee Keong; Li, Jinyan

    2013-11-25

    Accurate determination of protein-ligand binding affinity is a fundamental problem in biochemistry useful for many applications including drug design and protein-ligand docking. A number of scoring functions have been proposed for the prediction of protein-ligand binding affinity. However, accurate prediction is still a challenging problem because poor performance is often seen in the evaluation under the leave-one-cluster-out cross-validation (LCOCV). We introduce a new scoring function named B2BScore to improve the prediction performance. B2BScore integrates two physicochemical properties for protein-ligand binding affinity prediction. One is the property of β contacts. A β contact between two atoms requires no other atoms to interrupt the atomic contact and assumes that the two atoms should have enough direct contact area. The other is the property of B factor to capture the atomic mobility in the dynamic protein-ligand binding process. Tested on the PDBBind2009 data set, B2BScore shows superior prediction performance to existing methods on independent test data as well as under the LCOCV evaluation framework. In particular, B2BScore achieves a significant LCOCV improvement across 26 protein clusters: a large increase of the averaged Pearson's correlation coefficient from 0.418 to 0.518 and a significant decrease of the standard deviation of the coefficients from 0.352 to 0.196. We also identified several important and intuitive contact descriptors of protein-ligand binding through the random forest learning in B2BScore. Some of these descriptors are closely related to contacts between carbon atoms without covalent-bond oxygen/nitrogen, preferred contacts of metal ions, interfacial backbone atoms from proteins, or π rings. Some others are negative descriptors relating to those contacts with nitrogen atoms without covalent-bond hydrogens or nonpreferred contacts of metal ions. These descriptors can be directly used to guide protein-ligand docking.

  14. Intrinsic Disorder in Transmembrane Proteins: Roles in Signaling and Topology Prediction

    PubMed Central

    Bürgi, Jérôme; Xue, Bin; Uversky, Vladimir N.

    2016-01-01

    Intrinsically disordered regions (IDRs) are peculiar stretches of amino acids that lack stable conformations in solution. Intrinsic Disorder containing Proteins (IDP) are defined by the presence of at least one large IDR and have been linked to multiple cellular processes including cell signaling, DNA binding and cancer. Here we used computational analyses and publicly available databases to deepen insight into the prevalence and function of IDRs specifically in transmembrane proteins, which are somewhat neglected in most studies. We found that 50% of transmembrane proteins have at least one IDR of 30 amino acids or more. Interestingly, these domains preferentially localize to the cytoplasmic side especially of multi-pass transmembrane proteins, suggesting that disorder prediction could increase the confidence of topology prediction algorithms. This was supported by the successful prediction of the topology of the uncharacterized multi-pass transmembrane protein TMEM117, as confirmed experimentally. Pathway analysis indicated that IDPs are enriched in cell projection and axons and appear to play an important role in cell adhesion, signaling and ion binding. In addition, we found that IDP are enriched in phosphorylation sites, a crucial post translational modification in signal transduction, when compared to fully ordered proteins and to be implicated in more protein-protein interaction events. Accordingly, IDPs were highly enriched in short protein binding regions called Molecular Recognition Features (MoRFs). Altogether our analyses strongly support the notion that the transmembrane IDPs act as hubs in cellular signal events. PMID:27391701
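
The criterion "at least one IDR of 30 amino acids or more" amounts to finding long runs of residues predicted as disordered. The following is a minimal sketch, assuming a per-residue disorder score in [0, 1] and a cutoff of 0.5; the cutoff, the function name and the example scores are assumptions, and the abstract does not say which disorder predictor or threshold was used.

```python
def long_disordered_regions(scores, threshold=0.5, min_length=30):
    """Return (start, end) half-open index pairs for disordered runs >= min_length."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i                      # a new disordered run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))     # the run just ended and is long enough
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))   # a run that extends to the C-terminus
    return regions


# Hypothetical score profile (assumption): a 35-residue and a 12-residue disordered run.
scores = [0.1] * 10 + [0.8] * 35 + [0.2] * 5 + [0.9] * 12
regions = long_disordered_regions(scores)
print(regions)
```

Only the 35-residue run passes the length filter here; the 12-residue run at the end is discarded.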

  15. CARDIO-PRED: an in silico tool for predicting cardiovascular-disorder associated proteins.

    PubMed

    Jain, Prerna; Thukral, Nitin; Gahlot, Lokesh Kumar; Hasija, Yasha

    2015-06-01

    Interactions between proteins largely govern cellular processes, and this has led to numerous efforts culminating in enormous amounts of information about proteins, their interactions, and the functions determined by those interactions. The main aim of the present study is to present an interface analysis of cardiovascular-disorder (CVD) related proteins to shed light on the details of their interactions and to emphasize the importance of using structures in network studies. This study combines the network-centred approach with three-dimensional studies to comprehend the fundamentals of biology. Interface properties were used as descriptors to classify CVD-associated and non-CVD-associated proteins. A machine learning algorithm was used to generate a classifier from the training set, which was then used to predict potential CVD-related proteins from a set of polymorphic proteins not known to be involved in any disease. Among several classification algorithms applied to generate models, the best performance was achieved using Random Forest, with an accuracy of 69.5%. The tool, named CARDIO-PRED and based on the prediction model, is available at http://www.genomeinformatics.dce.edu/CARDIO-PRED/. The predicted CVD-related proteins may not be the cause of a particular disease but may be involved in pathways and reactions as yet unknown, thus permitting a more rational analysis of disease mechanisms. Study of their interactions with other proteins can significantly improve our understanding of the molecular mechanisms of disease.

  16. Combining many interaction networks to predict gene function and analyze gene lists.

    PubMed

    Mostafavi, Sara; Morris, Quaid

    2012-05-01

    In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems.

  17. Protein structure prediction enhanced with evolutionary diversity: SPEED.

    SciTech Connect

    DeBartolo, J.; Hocky, G.; Wilde, M.; Xu, J.; Freed, K. F.; Sosnick, T. R.; Univ. of Chicago; Toyota Technological Inst. at Chicago

    2010-03-01

    For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template-based modeling of protein structure and have been incorporated into fragment-based assembly methods. Our previous homology-free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank-based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position-resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.

  18. Prediction and redesign of protein–protein interactions

    PubMed Central

    Lua, Rhonald C.; Marciano, David C.; Katsonis, Panagiotis; Adikesavan, Anbu K.; Wilkins, Angela D.; Lichtarge, Olivier

    2014-01-01

    Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function – those mediated by protein–protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI. PMID:24878423

  19. Genome-wide protein localization prediction strategies for gram negative bacteria

    SciTech Connect

    Romine, Margaret F.

    2011-06-15

    Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recognized by the more commonly accepted tools can diminish the accuracy of their output. As part of an effort to manually curate the annotations of 19 strains of Shewanella, numerous insights were gained regarding the use of computational tools and proteomics data to predict protein localization. Identification of the suite of secretion systems present in each strain at the start of the process made it possible to tailor-fit the subsequent localization prediction strategies to each strain for improved accuracy. Comparisons of the computational predictions among orthologous proteins revealed inconsistencies in the computational outputs, which could often be resolved by adjusting the gene models or ortholog group memberships. While proteomic data was useful for verifying start site predictions and post-translational proteolytic cleavage, care was needed to distinguish cellular versus sample processing-mediated cleavage events. Searches for lipoprotein signal peptides revealed that neither TatP nor LipoP are designed for identification of lipoprotein substrates of the twin arginine translocation system and that the +2 rule for lipoprotein sorting does not apply to this Genus. Analysis of the relationships between domain occurrence and protein localization prediction enabled identification of numerous location-informative domains which could then be used to refine or increase confidence in location predictions. This collective knowledge was used to develop a general strategy for predicting protein localization that could be adapted to other organisms.

  20. A computational method to predict carbonylation sites in yeast proteins.

    PubMed

    Lv, H Q; Liu, J; Han, J Q; Zheng, J G; Liu, R L

    2016-01-01

    Several post-translational modifications (PTM) have been discussed in the literature. Among a variety of oxidative stress-induced PTM, protein carbonylation is considered a biomarker of oxidative stress. Only certain proteins can be carbonylated because only four amino acid residues, namely lysine (K), arginine (R), threonine (T) and proline (P), are susceptible to carbonylation. The yeast proteome is an excellent model to explore oxidative stress, especially protein carbonylation. Current experimental approaches to identifying carbonylation sites are expensive, time-consuming and limited in their ability to process proteins. Furthermore, there is no bioinformatics method to predict carbonylation sites in yeast proteins. Therefore, we propose a computational method to predict yeast carbonylation sites. This method has total accuracies of 86.32, 85.89, 84.80, and 86.80% in predicting the carbonylation sites of K, R, T, and P, respectively. These results were confirmed by 10-fold cross-validation. The ability of different kinds of features to identify carbonylation sites was analyzed and the position-specific composition of the modification site-flanking residues was discussed. Additionally, a software tool has been developed to help with the calculations in this method. Datasets and the software are available at https://sourceforge.net/projects/hqlstudio/files/CarSpred.Y/. PMID:27420944
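
Because only K, R, T and P are susceptible, a site predictor of this kind typically starts by enumerating those residues together with their flanking sequence. The following is a minimal sketch of that enumeration step only; the window size, the padding character, the function name and the example sequence are assumptions, and this is not the CarSpred.Y code.

```python
CARBONYLATABLE = set("KRTP")  # the four susceptible residues named in the abstract


def candidate_site_windows(seq, flank=3, pad="X"):
    """List (1-based position, residue, window) for every K, R, T or P in seq."""
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank   # pad so terminal sites get full windows
    windows = []
    for i, residue in enumerate(seq):
        if residue in CARBONYLATABLE:
            window = padded[i:i + 2 * flank + 1]   # candidate residue sits in the centre
            windows.append((i + 1, residue, window))
    return windows


# Hypothetical six-residue peptide (assumption) with candidates at positions 2, 3 and 6.
sites = candidate_site_windows("MKTAYP", flank=2)
for position, residue, window in sites:
    print(position, residue, window)
```

Each window can then be encoded (for example by position-specific residue composition) and passed to a classifier trained separately for each residue type.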

  1. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm, the PGATS algorithm, which is based on the toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies. A stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are replaced by a kind of random linear method; and the tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The method is verified on the commonly used benchmark sequences: Fibonacci sequences and real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, and that it is an effective way to predict the structure of proteins. PMID:25069136

  2. A multilayer evaluation approach for protein structure prediction and model quality assessment.

    PubMed

    Zhang, Jingfen; Wang, Qingguo; Vantasin, Kittinun; Zhang, Jiong; He, Zhiquan; Kosztin, Ioan; Shang, Yi; Xu, Dong

    2011-01-01

    Protein tertiary structures are essential for studying the functions of proteins at the molecular level. An indispensable approach for protein structure solution is computational prediction. Most protein structure prediction methods generate candidate models first and select the best candidates by model quality assessment (QA). In many cases, good models can be produced, but the QA tools fail to select the best ones from the candidate model pool. Because of incomplete understanding of protein folding, each QA method only reflects partial facets of a structure model and thus has limited discerning power, with no single method consistently outperforming the others. In this article, we developed a set of new QA methods, including two QA methods for evaluating target/template alignments, a molecular dynamics (MD)-based QA method, and three consensus QA methods with selected references, to reveal new facets of protein structures complementary to the existing methods. Moreover, the underlying relationships among different QA methods were analyzed and then integrated into a multilayer evaluation approach to guide model generation and model selection in prediction. All methods are integrated and implemented in an innovative and improved prediction system hereafter referred to as MUFOLD. In CASP8 and CASP9, MUFOLD demonstrated proof of principle in terms of both QA discerning power and structure prediction accuracy. PMID:21997706

  4. iStable: off-the-shelf predictor integration for predicting protein stability changes

    PubMed Central

    2013-01-01

    Background: Mutation of a single amino acid residue can cause changes in a protein, which could then lead to a loss of protein function. Predicting protein stability changes can provide several possible candidates for novel protein design. Although many prediction tools are available, the conflicting prediction results from different tools could cause confusion to users. Results: We proposed an integrated predictor, iStable, with a grid computing architecture, constructed by using sequence information and prediction results from different element predictors. In the learning model, several machine learning methods were evaluated, and the support vector machine was adopted as an integrator rather than simply choosing the majority answer given by the element predictors. Furthermore, the role played by the sequence information in our model was analyzed, and a window size of 11 was determined. iStable accepts two different input types: structural and sequential. After training and cross-validation, iStable has better performance than all of the element predictors on several datasets. Under different classifications and conditions for validation, this study has also shown better overall performance in different types of secondary structures, relative solvent accessibility circumstances, protein memberships in different superfamilies, and experimental conditions. Conclusions: The trained and validated version of iStable provides an accurate approach for prediction of protein stability changes. iStable is freely available online at: http://predictor.nchu.edu.tw/iStable. PMID:23369171

  5. CSF protein biomarkers predicting longitudinal reduction of CSF β-amyloid42 in cognitively healthy elders

    PubMed Central

    Mattsson, N; Insel, P; Nosheny, R; Zetterberg, H; Trojanowski, J Q; Shaw, L M; Tosun, D; Weiner, M

    2013-01-01

    β-amyloid (Aβ) plaque accumulation is a hallmark of Alzheimer's disease (AD). It is believed to start many years prior to symptoms and is reflected by reduced cerebrospinal fluid (CSF) levels of the peptide Aβ1–42 (Aβ42). Here we tested the hypothesis that baseline levels of CSF proteins involved in microglia activity, synaptic function and Aβ metabolism predict the development of Aβ plaques, assessed by longitudinal CSF Aβ42 decrease in cognitively healthy people. Forty-six healthy people with three to four serial CSF samples were included (mean follow-up 3 years, range 2–4 years). There was an overall reduction in mean Aβ42 concentration from 211 to 195 pg ml−1 after 4 years. Linear mixed-effects models using longitudinal Aβ42 as the response variable, and baseline proteins as explanatory variables (n=69 proteins potentially relevant for Aβ metabolism, microglia or synaptic/neuronal function), identified 10 proteins with significant effects on longitudinal Aβ42. The most significant proteins were angiotensin-converting enzyme (ACE, P=0.009), Chromogranin A (CgA, P=0.009) and Axl receptor tyrosine kinase (AXL, P=0.009). Receiver-operating characteristic analysis identified 11 proteins with significant effects on longitudinal Aβ42 (largely overlapping with the proteins identified by linear mixed-effects models). Several proteins (including ACE, CgA and AXL) were associated with Aβ42 reduction only in subjects with normal baseline Aβ42, and not in subjects with reduced baseline Aβ42. We conclude that baseline CSF proteins related to Aβ metabolism, microglia activity or synapses predict longitudinal Aβ42 reduction in cognitively healthy elders. The finding that some proteins only predict Aβ42 reduction in subjects with normal baseline Aβ42 suggests that they predict future development of the brain Aβ pathology at the earliest stages of AD, prior to widespread development of Aβ plaques. PMID:23962923

  6. Folding funnels, binding funnels, and protein function.

    PubMed Central

    Tsai, C. J.; Kumar, S.; Ma, B.; Nussinov, R.

    1999-01-01

    Folding funnels have been the focus of considerable attention during the last few years. These have mostly been discussed in the general context of the theory of protein folding. Here we extend the utility of the concept of folding funnels, relating them to biological mechanisms and function. In particular, here we describe the shape of the funnels in light of protein synthesis and folding; flexibility, conformational diversity, and binding mechanisms; and the associated binding funnels, illustrating the multiple routes and the range of complexed conformers. Specifically, the walls of the folding funnels, their crevices, and bumps are related to the complexity of protein folding, and hence to sequential vs. nonsequential folding. Whereas the former is more frequently observed in eukaryotic proteins, where the rate of protein synthesis is slower, the latter is more frequent in prokaryotes, with faster translation rates. The bottoms of the funnels reflect the extent of the flexibility of the proteins. Rugged floors imply a range of conformational isomers, which may be close on the energy landscape. Rather than undergoing an induced fit binding mechanism, the conformational ensembles around the rugged bottoms argue that the conformers, which are most complementary to the ligand, will bind to it with the equilibrium shifting in their favor. Furthermore, depending on the extent of the ruggedness, or of the smoothness with only a few minima, we may infer nonspecific, broad range vs. specific binding. In particular, folding and binding are similar processes, with similar underlying principles. Hence, the shape of the folding funnel of the monomer enables making reasonable guesses regarding the shape of the corresponding binding funnel. Proteins having a broad range of binding, such as proteolytic enzymes or relatively nonspecific endonucleases, may be expected to have not only rugged floors in their folding funnels, but their binding funnels will also behave similarly

  7. Genome-scale gene function prediction using multiple sources of high-throughput data in yeast Saccharomyces cerevisiae.

    PubMed

    Joshi, Trupti; Chen, Yu; Becker, Jeffrey M; Alexandrov, Nickolai; Xu, Dong

    2004-01-01

    Characterizing gene function is one of the major challenging tasks in the post-genomic era. To address this challenge, we have developed GeneFAS (Gene Function Annotation System), a new integrated probabilistic method for cellular function prediction by combining information from protein-protein interactions, protein complexes, microarray gene expression profiles, and annotations of known proteins through an integrative statistical model. Our approach is based on a novel assessment for the relationship between (1) the interaction/correlation of two proteins' high-throughput data and (2) their functional relationship in terms of their Gene Ontology (GO) hierarchy. We have developed a Web server for the predictions. We have applied our method to yeast Saccharomyces cerevisiae and predicted functions for 1548 out of 2472 unannotated proteins.

  8. Functional Differences in Yeast Protein Disulfide Isomerases

    PubMed Central

    Nørgaard, Per; Westphal, Vibeke; Tachibana, Christine; Alsøe, Lene; Holst, Bjørn; Winther, Jakob R.

    2001-01-01

    PDI1 is the essential gene encoding protein disulfide isomerase in yeast. The Saccharomyces cerevisiae genome, however, contains four other nonessential genes with homology to PDI1: MPD1, MPD2, EUG1, and EPS1. We have investigated the effects of simultaneous deletions of these genes. In several cases, we found that the ability of the PDI1 homologues to restore viability to a pdi1-deleted strain when overexpressed was dependent on the presence of low endogenous levels of one or more of the other homologues. This shows that the homologues are not functionally interchangeable. In fact, Mpd1p was the only homologue capable of carrying out all the essential functions of Pdi1p. Furthermore, the presence of endogenous homologues with a CXXC motif in the thioredoxin-like domain is required for suppression of a pdi1 deletion by EUG1 (which contains two CXXS active site motifs). This underlines the essentiality of protein disulfide isomerase-catalyzed oxidation. Most mutant combinations show defects in carboxypeptidase Y folding as well as in glycan modification. There are, however, no significant effects on ER-associated protein degradation in the various protein disulfide isomerase-deleted strains. PMID:11157982

  9. YB-1 protein: functions and regulation.

    PubMed

    Lyabin, Dmitry N; Eliseeva, Irina A; Ovchinnikov, Lev P

    2014-01-01

    The Y-box binding protein 1 (YB-1, YBX1) is a member of the family of DNA- and RNA-binding proteins with an evolutionarily ancient and conserved cold shock domain. It falls into a group of intrinsically disordered proteins that do not follow the classical rule 'one protein-one function' but introduce a novel principle stating that a disordered structure suggests many functions. YB-1 participates in a wide variety of DNA/RNA-dependent events, including DNA repair, pre-mRNA transcription and splicing, mRNA packaging, and regulation of mRNA stability and translation. At the cell level, the multiple activities of YB-1 are manifested as its involvement in cell proliferation and differentiation, stress response, and malignant cell transformation. WIREs RNA 2014, 5:95-110. doi: 10.1002/wrna.1200

  10. Choosing negative examples for the prediction of protein-protein interactions

    PubMed Central

    Ben-Hur, Asa; Noble, William Stafford

    2006-01-01

    The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence-based features used for predicting protein-protein interactions. PMID:16723005
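
The two ways of choosing negative examples contrasted in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the protein names, the compartment labels, and the function names are hypothetical, and the abstract does not say how the authors sampled pairs in practice.

```python
import itertools
import random


def candidate_negatives(proteins, positives):
    """Return all unordered protein pairs that are not known interactions."""
    known = {frozenset(pair) for pair in positives}
    return [pair for pair in itertools.combinations(sorted(proteins), 2)
            if frozenset(pair) not in known]


def random_negatives(proteins, positives, n, seed=0):
    """Sample negatives uniformly from all pairs not known to interact."""
    pool = candidate_negatives(proteins, positives)
    return random.Random(seed).sample(pool, min(n, len(pool)))


def localization_negatives(proteins, positives, localization, n, seed=0):
    """Sample negatives only from pairs annotated to different compartments."""
    pool = [(a, b) for a, b in candidate_negatives(proteins, positives)
            if localization[a] != localization[b]]
    return random.Random(seed).sample(pool, min(n, len(pool)))


# Hypothetical toy data, for illustration only.
localization = {"P1": "nucleus", "P2": "nucleus", "P3": "cytoplasm",
                "P4": "membrane", "P5": "cytoplasm"}
positives = [("P1", "P2")]
proteins = list(localization)

uniform = random_negatives(proteins, positives, 3)
constrained = localization_negatives(proteins, positives, localization, 3)
```

The localization-constrained pool is a strict subset of the uniform pool, which is the restriction on the distribution of negatives that the abstract identifies as a source of bias.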

  11. Nanoparticles-cell association predicted by protein corona fingerprints

    NASA Astrophysics Data System (ADS)

    Palchetti, S.; Digiacomo, L.; Pozzi, D.; Peruzzi, G.; Micarelli, E.; Mahmoudi, M.; Caracciolo, G.

    2016-06-01

    In a physiological environment (e.g., blood and interstitial fluids) nanoparticles (NPs) bind proteins, forming a "protein corona" layer. The long-lived protein layer tightly bound to the NP surface is referred to as the hard corona (HC) and encodes information that controls NP bioactivity (e.g. cellular association, cellular signaling pathways, biodistribution, and toxicity). Decrypting this complex code has become a priority for predicting NP biological outcomes. Here, we use a library of 16 lipid NPs of varying size (Ø ~ 100-250 nm) and surface chemistry (unmodified and PEGylated) to investigate the relationships between NP physicochemical properties (nanoparticle size, aggregation state and surface charge), protein corona fingerprints (PCFs), and NP-cell association. We found that none of the NPs' physicochemical properties alone was able to account for association with the human cervical cancer cell line (HeLa). For the entire library of NPs, a total of 436 distinct serum proteins were detected. We developed a predictive-validation modeling approach that provides a means of assessing the relative significance of the identified corona proteins. Interestingly, a minor fraction of the HC, consisting of only 8 PCFs, was identified as the main promoter of NP association with HeLa cells. Remarkably, the identified PCFs have several receptors with a high level of expression on the plasma membrane of HeLa cells.

  12. Predicting Gene-Regulation Functions: Lessons from Temperate Bacteriophages

    PubMed Central

    Teif, Vladimir B.

    2010-01-01

    Gene-regulation functions (GRFs) provide a unique characteristic of a cis-regulatory module (CRM), relating the concentrations of transcription factors (input) to the promoter activities (output). The challenge is to predict GRFs from the sequence. Here we systematically consider the lysogeny-lysis CRMs of different temperate bacteriophages such as the Lactobacillus casei phage A2, Escherichia coli phages λ and 186, and lactococcal phage TP901-1. This study explains a recent experimental puzzle on the role of Cro protein in the lambda switch. Several general conclusions have been drawn: 1) long-range interactions, multilayer assembly and DNA looping may lead to complex GRFs that cannot be described by linear functions of binding site occupancies; 2) in general, GRFs cannot be described by Boolean logic, whereas a three-state non-Boolean logic suffices for the studied examples; 3) the studied CRMs of the intact phages seemed to have a similar GRF topology (the number of plateaus and peaks corresponding to different expression regimes); we hypothesize that functionally equivalent CRMs might have topologically equivalent GRFs for a larger class of genetic systems; and 4) within a given GRF class, a set of mechanistic-to-mathematical transformations has been identified, which allows shaping the GRF before carrying out a system-level analysis. PMID:20371324

  13. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

    PubMed

    Chatterjee, Piyali; Basu, Subhadip; Zubek, Julian; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz

    2016-04-01

    The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
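
A consensus over several classifiers, scored by accuracy and F-measure, might look like the sketch below. The abstract does not define the n-star rule, so the rule used here (a residue is labeled positive when at least n classifiers vote for it) is an assumption, and the votes and labels are made-up toy values.

```python
def n_star_consensus(votes, n):
    """Label a residue 1 when at least n classifiers vote 1 (assumed rule)."""
    return [1 if sum(residue_votes) >= n else 0 for residue_votes in votes]


def accuracy_and_f_measure(predicted, actual):
    """Return (accuracy, F-measure) for binary residue labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == 1 and a == 1 for p, a in pairs)
    fp = sum(p == 1 and a == 0 for p, a in pairs)
    fn = sum(p == 0 and a == 1 for p, a in pairs)
    tn = sum(p == 0 and a == 0 for p, a in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy votes: one row per residue, one column per classifier (six in total).
votes = [
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
actual = [1, 0, 1, 0, 0]

strict = n_star_consensus(votes, n=4)
loose = n_star_consensus(votes, n=3)
```

Raising n trades recall for precision, so in practice the threshold would be chosen on held-out data rather than fixed in advance.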

  14. Heterogeneity in Retroviral Nucleocapsid Protein Function

    NASA Astrophysics Data System (ADS)

    Landes, Christy

    2009-03-01

    Time-resolved single-molecule fluorescence spectroscopy was used to study the human T-cell lymphotropic virus type 1 (HTLV-1) nucleocapsid protein (NC) chaperone activity as compared to that of the HIV-1 NC protein. HTLV-1 NC contains two zinc fingers with each having a CCHC binding motif similar to HIV-1 NC. HIV-1 NC is required for recognition and packaging of the viral RNA and is also a nucleic acid chaperone protein that facilitates nucleic acid restructuring during reverse transcription. Because of similarities in structures between the two retroviruses, we have used single-molecule fluorescence energy transfer to investigate the chaperoning activity of HTLV-1 NC protein. The results indicate that HTLV-1 NC protein induces structural changes by opening the transactivation response (TAR)-DNA hairpin to an even greater extent than HIV-1 NC. However, unlike HIV-1 NC, HTLV-1 NC does not chaperone the strand-transfer reaction involving TAR-DNA. These results suggest that despite its effective destabilization capability, HTLV-1 NC is not as effective at overall chaperone function as is its HIV-1 counterpart.

  15. TSEMA: interactive prediction of protein pairings between interacting families.

    PubMed

    Izarzugaza, José M G; Juan, David; Pons, Carles; Ranea, Juan A G; Valencia, Alfonso; Pazos, Florencio

    2006-07-01

    An entire family of methodologies for predicting protein interactions is based on the observed fact that families of interacting proteins tend to have similar phylogenetic trees due to co-evolution. One application of this concept is the prediction of the mapping between the members of two interacting protein families (which protein within one family interacts with which protein within the other). The idea is that the real mapping would be the one maximizing the similarity between the trees. Since the exhaustive exploration of all possible mappings is not feasible for large families, current approaches use heuristic techniques, which do not guarantee that the best solution is found. This is why it is important to check the results proposed by heuristic techniques and to manually explore other solutions. Here we present TSEMA, the server for efficient mapping assessment. This system calculates an initial mapping between two families of proteins based on a Monte Carlo approach and allows the user to interactively modify it based on performance figures and/or specific biological knowledge. All the explored mappings are graphically shown over a representation of the phylogenetic trees. The system is freely available at http://pdg.cnb.uam.es/TSEMA. Standalone versions of the software behind the interface are available upon request from the authors.
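
The idea of searching for the mapping that maximizes tree similarity can be illustrated with a generic Monte Carlo sketch. It is not TSEMA's implementation: the scoring function (Pearson correlation between the two families' distance matrices), the swap move, the temperature, and all names and toy distances are assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def mapping_score(dist_a, dist_b, perm):
    """Correlate distances in family A with those of the mapped partners in B."""
    n = len(perm)
    xs = [dist_a[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [dist_b[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]
    return pearson(xs, ys)


def monte_carlo_mapping(dist_a, dist_b, steps=5000, temperature=0.01, seed=0):
    """Metropolis search over permutations; returns the best mapping seen."""
    rng = random.Random(seed)
    n = len(dist_a)
    perm = list(range(n))
    rng.shuffle(perm)
    score = mapping_score(dist_a, dist_b, perm)
    best_perm, best_score = perm[:], score
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]          # propose a swap
        new_score = mapping_score(dist_a, dist_b, perm)
        accept = (new_score >= score or
                  rng.random() < math.exp((new_score - score) / temperature))
        if accept:
            score = new_score
            if score > best_score:
                best_perm, best_score = perm[:], score
        else:
            perm[i], perm[j] = perm[j], perm[i]      # reject: undo the swap
    return best_perm, best_score


# Toy distances for family A; family B is A relabeled by a hidden mapping.
dist_a = [[0, 2, 5, 9],
          [2, 0, 4, 8],
          [5, 4, 0, 3],
          [9, 8, 3, 0]]
hidden = [2, 0, 3, 1]
dist_b = [[0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        dist_b[hidden[i]][hidden[j]] = dist_a[i][j]

best_perm, best_score = monte_carlo_mapping(dist_a, dist_b)
```

Because the search is heuristic, the returned mapping is only the best one visited, which is the limitation that motivates the interactive inspection described in the abstract.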

  16. A system for predicting energy and protein requirements of wild ruminants.

    PubMed

    Hackmann, Timothy J

    2011-01-01

    Wild ruminants require energy and protein for normal function. I developed a system for predicting these energy and protein requirements across ruminant species and life stages. This system defines requirements on the basis of net energy (NE), net protein (NP), and ruminally degraded protein (RDP). Total NE and NP requirements are calculated as the sum of NE and NP required for several functions (maintenance, activity, thermoregulation, gain, lactation, and gestation). To estimate the requirements for each function, I collected data predominantly for wild species and then formulated allometric and other equations that predict requirements across species. I estimated RDP requirements using an equation for cattle. I then related NE, NP, and RDP to quantities more practical for diet formulation (e.g. dry matter intake). I tabulated requirements over a range of body mass and life stages (neonate, juvenile, nonproductive adult, lactating adult, and gestating adult). Tabulated requirements suggest that adults at peak lactation require the greatest quantities of energy and neonates generally require the greatest quantities of protein, agreeing with suggestions that lactation is energetically expensive and protein is most limiting during growth. Equations used in this system were precise (allometric equations had R² generally ≥0.89 and coefficient of variation <31.1%) and are expected to reliably predict requirements across species. Results showed that a system for beef cattle would overestimate NE and either over- or underestimate NP for gain when applied to wild ruminants, showing that systems for wild ruminants should not extrapolate from requirements for domestic ruminants. One prominent system for wild ruminants predicted at times vastly different protein requirements from those predicted by the proposed system. The proposed system should be further evaluated and expanded to include other nutrients.
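
The structure described here, a total requirement obtained by summing per-function requirements, some of them from allometric equations, can be sketched as below. The power-law form is the usual meaning of "allometric", but every coefficient, exponent, and component value in this example is a placeholder chosen for illustration and is not taken from the paper.

```python
def allometric(body_mass_kg, coefficient, exponent):
    """Generic allometric equation: coefficient * body_mass ** exponent."""
    return coefficient * body_mass_kg ** exponent


def total_requirement(components):
    """Total requirement as the sum of the per-function requirements."""
    return sum(components.values())


# Placeholder values only; the paper's fitted equations are not reproduced.
body_mass_kg = 16.0
net_energy_components = {
    "maintenance": allometric(body_mass_kg, coefficient=2.0, exponent=0.5),
    "activity": 1.5,
    "thermoregulation": 0.0,
    "gain": 0.5,
    "lactation": 0.0,
    "gestation": 0.0,
}
total_net_energy = total_requirement(net_energy_components)
```

Keeping each function as a separate entry makes it easy to tabulate requirements by life stage, for example by setting the lactation or gestation term to zero for a nonproductive adult.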

  17. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles

    PubMed Central

    Brender, Jeffrey R.; Zhang, Yang

    2015-01-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or in most cases better than, that of the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533

  18. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles.

    PubMed

    Brender, Jeffrey R; Zhang, Yang

    2015-10-01

    The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or in most cases better than, that of the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies.

  19. Protein design by fusion: implications for protein structure prediction and evolution

    SciTech Connect

    Skorupka, Katarzyna; Han, Seong Kyu; Nam, Hyun-Jun; Kim, Sanguk; Faham, Salem

    2013-11-19

    Domain fusion is a useful tool in protein design. Here, the structure of a fusion of the heterodimeric flagella-assembly proteins FliS and FliC is reported. Although the ability of the fusion protein to maintain the structure of the heterodimer may be apparent, threading-based structural predictions do not properly fuse the heterodimer. Additional examples of naturally occurring heterodimers that are homologous to full-length proteins were identified. These examples highlight that the designed protein was engineered by the same tools as used in the natural evolution of proteins and that heterodimeric structures contain a wealth of information, currently unused, that can improve structural predictions.

  20. Sequence-Based Prediction of Type III Secreted Proteins

    PubMed Central

    Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas

    2009-01-01

    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will
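
One of the N-terminal features mentioned in this abstract, amino acid frequencies, could be computed along the following lines. This is a sketch only: the window length of 25 residues, the function name, and the example sequence are assumptions, and the abstract does not describe how EffectiveT3 encodes its features.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def n_terminal_frequencies(sequence, length=25):
    """Relative frequency of each standard residue in the first `length` positions."""
    window = sequence[:length].upper()
    counts = Counter(window)
    total = len(window)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


# Hypothetical sequence fragment, for illustration only.
features = n_terminal_frequencies("MSSSKTPLNISQ", length=4)
```

The resulting dictionary has one entry per residue type and can be turned into a fixed-length feature vector by reading the values in the order of `AMINO_ACIDS`.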

  1. PREDITOR: a web server for predicting protein torsion angle restraints

    PubMed Central

    Berjanskii, Mark V.; Neal, Stephen; Wishart, David S.

    2006-01-01

    Every year between 500 and 1000 peptide and protein structures are determined by NMR and deposited into the Protein Data Bank. However, the process of NMR structure determination continues to be a manually intensive and time-consuming task. One of the most tedious and error-prone aspects of this process involves the determination of torsion angle restraints including phi, psi, omega and chi angles. Most methods require many days of additional experiments, painstaking measurements or complex calculations. Here we wish to describe a web server, called PREDITOR, which greatly accelerates and simplifies this task. PREDITOR accepts sequence and/or chemical shift data as input and generates torsion angle predictions (with predicted errors) for phi, psi, omega and chi-1 angles. PREDITOR combines sequence alignment methods with advanced chemical shift analysis techniques to generate its torsion angle predictions. The method is fast (<40 s per protein) and accurate, with 88% of phi/psi predictions being within 30° of the correct values, 84% of chi-1 predictions being correct and 99.97% of omega angles being correct. PREDITOR is 35 times faster and up to 20% more accurate than any existing method. PREDITOR also provides accurate assessments of the torsion angle errors so that the torsion angle constraints can be readily fed into standard structure refinement programs, such as CNS, XPLOR, AMBER and CYANA. Other unique features to PREDITOR include dihedral angle prediction via PDB structure mapping, automated chemical shift re-referencing (to improve accuracy), prediction of proline cis/trans states and a simple user interface. The PREDITOR website is located at: . PMID:16845087
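
    An accuracy figure such as "within 30° of the correct values" requires comparing angles on a circle, so that -175° and 175° count as 10° apart. The following is a minimal sketch of that bookkeeping; the function names are hypothetical, and each angle is scored individually, which may differ from how the PREDITOR authors scored phi/psi pairs.

```python
def angular_difference(predicted_deg: float, reference_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - reference_deg) % 360.0
    return min(diff, 360.0 - diff)


def fraction_within(predicted, reference, tolerance_deg: float = 30.0) -> float:
    """Fraction of predicted angles lying within tolerance_deg of the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one angle is required")
    hits = sum(
        1
        for p, r in zip(predicted, reference)
        if angular_difference(p, r) <= tolerance_deg
    )
    return hits / len(predicted)
```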

  2. Topological Predictions for Integral Membrane Channel and Carrier Proteins

    PubMed Central

    Abhinay, Reddy; Jaehoon, Cho; Sam, Ling; Vamsee, Reddy; Maksim, Shlykov; Milton, Saier

    2014-01-01

    We evaluated topological predictions for nine different programs, HMMTOP, TMHMM, SVMTOP, DAS, SOSUI, TOPCONS, PHOBIUS, MEMSAT-SVM (hereinafter referred to as MEMSAT), and SPOCTOPUS. These programs were first evaluated using four large topologically well-defined families of secondary transporters, and the three best programs were further evaluated using topologically more diverse families of channels and carriers. In the initial studies, the order of accuracy was: SPOCTOPUS>MEMSAT>HMMTOP>TOPCONS>PHOBIUS>TMHMM>SVMTOP>DAS>SOSUI. Some families, such as the Sugar Porter family (2.A.1.1) of the Major Facilitator Superfamily (MFS; TC# 2.A.1) and the Amino acid/Polyamine/Organocation (APC) Family (TC# 2.A.3), were correctly predicted with high accuracy while others, such as the Mitochondrial Carrier (MC) (TC# 2.A.29) and the K+ transporter (Trk) families (TC# 2.A.38), were predicted with much lower accuracy. For small, topologically homogeneous families, SPOCTOPUS and MEMSAT were generally most reliable, while with large, more diverse superfamilies, HMMTOP often proved to have the greatest prediction accuracy. We next developed a novel program, TM-STATS, that tabulates HMMTOP, SPOCTOPUS or MEMSAT-based topological predictions for any subdivision (class, subclass, superfamily, family, subfamily, or any combination of these) of the Transporter Classification Database (TCDB; www.tcdb.org) and examined the following subclasses: α-type channel proteins (TC subclasses 1.A and 1.E), secreted pore-forming toxins (TC subclass 1.C) and secondary carriers (subclass 2.A). Histograms were generated for each of these subclasses, and the results were analyzed according to subclass, family and protein. The results provide an update of topological predictions for integral membrane transport proteins as well as guides for the development of more reliable topological prediction programs, taking family-specific characteristics into account. PMID:24992992

  3. Predicting and analyzing protein phosphorylation sites in plants using musite.

    PubMed

    Yao, Qiuming; Gao, Jianjiong; Bollinger, Curtis; Thelen, Jay J; Xu, Dong

    2012-01-01

    Although protein phosphorylation sites can be reliably identified with high-resolution mass spectrometry, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites and provide hypotheses in experimental design. Musite is a tool that we previously developed to predict phosphorylation sites based solely on protein sequence. However, it was not comprehensively applied to plants. In this study, the phosphorylation data from Arabidopsis thaliana, B. napus, G. max, M. truncatula, O. sativa, and Z. mays were collected for cross-species testing and the overall plant-specific prediction as well. The results show that the model for A. thaliana can be extended to other organisms, and the overall plant model from Musite outperforms the current plant-specific prediction tools, Plantphos, and PhosphAt, in prediction accuracy. Furthermore, a comparative study of predicted phosphorylation sites across orthologs among different plants was conducted to reveal potential evolutionary features. A bipolar distribution of isolated, non-conserved phosphorylation sites, and highly conserved ones in terms of the amino acid type was observed. It also shows that predicted phosphorylation sites conserved within orthologs do not necessarily share more sequence similarity in the flanking regions than the background, but they often inherit protein disorder, a property that does not necessitate high sequence conservation. Our analysis also suggests that the phosphorylation frequencies among serine, threonine, and tyrosine correlate with their relative proportion in disordered regions. Musite can be used as a web server (http://musite.net) or downloaded as an open-source standalone tool (http://musite.sourceforge.net/).

  4. DOCK/PIERR: web server for structure prediction of protein-protein complexes.

    PubMed

    Viswanath, Shruthi; Ravikant, D V S; Elber, Ron

    2014-01-01

    In protein docking we aim to find the structure of the complex formed when two proteins interact. Protein-protein interactions are crucial for cell function. Here we discuss the usage of DOCK/PIERR. In DOCK/PIERR, a uniform discrete sampling of orientations of one protein with respect to the other is scored, followed by clustering, refinement, and reranking of structures. The novelty of this method lies in the scoring functions used. These are obtained by examining hundreds of millions of correctly and incorrectly docked structures, using an algorithm based on mathematical programming, with provable convergence properties.

  5. Unfolded protein ensembles, folding trajectories, and refolding rate prediction.

    PubMed

    Das, A; Sin, B K; Mohazab, A R; Plotkin, S S

    2013-09-28

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10^-7). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to

  6. Unfolded protein ensembles, folding trajectories, and refolding rate prediction

    NASA Astrophysics Data System (ADS)

    Das, A.; Sin, B. K.; Mohazab, A. R.; Plotkin, S. S.

    2013-09-01

    Computer simulations can provide critical information on the unfolded ensemble of proteins under physiological conditions, by explicitly characterizing the geometrical properties of the diverse conformations that are sampled in the unfolded state. A general computational analysis across many proteins has not been implemented however. Here, we develop a method for generating a diverse conformational ensemble, to characterize properties of the unfolded states of intrinsically disordered or intrinsically folded proteins. The method allows unfolded proteins to retain disulfide bonds. We examined physical properties of the unfolded ensembles of several proteins, including chemical shifts, clustering properties, and scaling exponents for the radius of gyration with polymer length. A problem relating simulated and experimental residual dipolar couplings is discussed. We apply our generated ensembles to the problem of folding kinetics, by examining whether the ensembles of some proteins are closer geometrically to their folded structures than others. We find that for a randomly selected dataset of 15 non-homologous 2- and 3-state proteins, quantities such as the average root mean squared deviation between the folded structure and unfolded ensemble correlate with folding rates as strongly as absolute contact order. We introduce a new order parameter that measures the distance travelled per residue, which naturally partitions into a smooth "laminar" and subsequent "turbulent" part of the trajectory. This latter conceptually simple measure with no fitting parameters predicts folding rates in 0 M denaturant with remarkable accuracy (r = -0.95, p = 1 × 10^-7). The high correlation between folding times and sterically modulated, reconfigurational motion supports the rapid collapse of proteins prior to the transition state as a generic feature in the folding of both two-state and multi-state proteins. This method for generating unfolded ensembles provides a powerful approach to
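
    The "scaling exponents for the radius of gyration with polymer length" mentioned in this abstract refer to a relation of the form Rg ~ N^ν. A minimal sketch of both quantities follows; it assumes equal-mass points (for example, one coordinate per residue) and an ordinary least-squares fit in log-log space, neither of which is stated in the abstract.

```python
import math


def radius_of_gyration(coords) -> float:
    """Radius of gyration of equal-mass points given as (x, y, z) tuples."""
    n = len(coords)
    if n == 0:
        raise ValueError("at least one coordinate is required")
    # Geometric center of the conformation.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def scaling_exponent(lengths, rg_values) -> float:
    """Least-squares slope of log(Rg) versus log(N), i.e. nu in Rg ~ N**nu."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in rg_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```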

  7. Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.

    PubMed

    Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G

    2016-01-01

    While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged, however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes.
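
    A Debye-Hückel potential is a Coulomb interaction damped by an exponential screening factor, V(r) = K q_i q_j exp(-r/λ_D) / (ε_r r). The sketch below shows that functional form only. The dielectric constant of 80 and screening length of 10 Å are illustrative assumptions, not parameters taken from the abstract, and the function is not part of AWSEM.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a standard conversion factor.
COULOMB_CONSTANT = 332.06


def debye_huckel_energy(
    q_i: float,
    q_j: float,
    r: float,
    dielectric: float = 80.0,        # assumed water-like dielectric constant
    screening_length: float = 10.0,  # assumed Debye length in Angstrom
) -> float:
    """Screened Coulomb energy (kcal/mol) for charges q_i, q_j (in e) at distance r (Angstrom)."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    coulomb = COULOMB_CONSTANT * q_i * q_j / (dielectric * r)
    return coulomb * math.exp(-r / screening_length)
```

    With these assumed parameters, two opposite unit charges 10 Å apart interact with an energy of roughly -0.15 kcal/mol, which illustrates why such interactions are long in range but individually weak.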

  8. MASS FUNCTION PREDICTIONS BEYOND ΛCDM

    SciTech Connect

    Bhattacharya, Suman; Lukic, Zarija; Habib, Salman; Heitmann, Katrin; White, Martin; Wagner, Christian

    2011-05-10

    The statistics of dark matter halos is an essential component of precision cosmology. The mass distribution of halos, as specified by the halo mass function, is a key input for several cosmological probes. The sizes of N-body simulations are now such that, for the most part, results need no longer be statistics-limited, but are still subject to various systematic uncertainties. Discrepancies in the results of simulation campaigns for the halo mass function remain in excess of statistical uncertainties and of roughly the same size as the error limits set by near-future observations; we investigate and discuss some of the reasons for these differences. Quantifying error sources and compensating for them as appropriate, we carry out a high-statistics study of dark matter halos from 67 N-body simulations to investigate the mass function and its evolution for a reference ΛCDM cosmology and for a set of wCDM cosmologies. For the reference ΛCDM cosmology (close to WMAP5), we quantify the breaking of universality in the form of the mass function as a function of redshift, finding an evolution of as much as 10% away from the universal form between redshifts z = 0 and z = 2. For cosmologies very close to this reference we provide a fitting formula to our results for the (evolving) ΛCDM mass function over a mass range of 6 × 10^11 to 3 × 10^15 M_sun to an estimated accuracy of about 2%. The set of wCDM cosmologies is taken from the Coyote Universe simulation suite. The mass functions from this suite (which includes a ΛCDM cosmology and others with w ≈ -1) are described by the fitting formula for the reference ΛCDM case at an accuracy level of 10%, but with clear systematic deviations. We argue that, as a consequence, fitting formulae based on a universal form for the mass function may have limited utility in high-precision cosmological applications.

  9. Mass Function Predictions Beyond ΛCDM

    NASA Astrophysics Data System (ADS)

    Bhattacharya, Suman; Heitmann, Katrin; White, Martin; Lukić, Zarija; Wagner, Christian; Habib, Salman

    2011-05-01

    The statistics of dark matter halos is an essential component of precision cosmology. The mass distribution of halos, as specified by the halo mass function, is a key input for several cosmological probes. The sizes of N-body simulations are now such that, for the most part, results need no longer be statistics-limited, but are still subject to various systematic uncertainties. Discrepancies in the results of simulation campaigns for the halo mass function remain in excess of statistical uncertainties and of roughly the same size as the error limits set by near-future observations; we investigate and discuss some of the reasons for these differences. Quantifying error sources and compensating for them as appropriate, we carry out a high-statistics study of dark matter halos from 67 N-body simulations to investigate the mass function and its evolution for a reference ΛCDM cosmology and for a set of wCDM cosmologies. For the reference ΛCDM cosmology (close to WMAP5), we quantify the breaking of universality in the form of the mass function as a function of redshift, finding an evolution of as much as 10% away from the universal form between redshifts z = 0 and z = 2. For cosmologies very close to this reference we provide a fitting formula to our results for the (evolving) ΛCDM mass function over a mass range of 6 × 10^11 to 3 × 10^15 M_sun to an estimated accuracy of about 2%. The set of wCDM cosmologies is taken from the Coyote Universe simulation suite. The mass functions from this suite (which includes a ΛCDM cosmology and others with w ≈ -1) are described by the fitting formula for the reference ΛCDM case at an accuracy level of 10%, but with clear systematic deviations. We argue that, as a consequence, fitting formulae based on a universal form for the mass function may have limited utility in high-precision cosmological applications.

  10. The Amyloid Precursor Protein Controls PIKfyve Function

    PubMed Central

    Balklava, Zita; Niehage, Christian; Currinn, Heather; Mellor, Laura; Guscott, Benjamin; Poulin, Gino; Hoflack, Bernard; Wassmer, Thomas

    2015-01-01

    While the Amyloid Precursor Protein (APP) plays a central role in Alzheimer’s disease, its cellular function still remains largely unclear. It was our goal to establish APP function which will provide insights into APP's implication in Alzheimer's disease. Using our recently developed proteo-liposome assay we established the interactome of APP's intracellular domain (known as AICD), thereby identifying novel APP interactors that provide mechanistic insights into APP function. By combining biochemical, cell biological and genetic approaches we validated the functional significance of one of these novel interactors. Here we show that APP binds the PIKfyve complex, an essential kinase for the synthesis of the endosomal phosphoinositide phosphatidylinositol-3,5-bisphosphate. This signalling lipid plays a crucial role in endosomal homeostasis and receptor sorting. Loss of PIKfyve function by mutation causes profound neurodegeneration in mammals. Using C. elegans genetics we demonstrate that APP functionally cooperates with PIKfyve in vivo. This regulation is required for maintaining endosomal and neuronal function. Our findings establish an unexpected role for APP in the regulation of endosomal phosphoinositide metabolism with dramatic consequences for endosomal biology and important implications for our understanding of Alzheimer's disease. PMID:26125944

  11. The Amyloid Precursor Protein Controls PIKfyve Function.

    PubMed

    Balklava, Zita; Niehage, Christian; Currinn, Heather; Mellor, Laura; Guscott, Benjamin; Poulin, Gino; Hoflack, Bernard; Wassmer, Thomas

    2015-01-01

    While the Amyloid Precursor Protein (APP) plays a central role in Alzheimer's disease, its cellular function still remains largely unclear. It was our goal to establish APP function which will provide insights into APP's implication in Alzheimer's disease. Using our recently developed proteo-liposome assay we established the interactome of APP's intracellular domain (known as AICD), thereby identifying novel APP interactors that provide mechanistic insights into APP function. By combining biochemical, cell biological and genetic approaches we validated the functional significance of one of these novel interactors. Here we show that APP binds the PIKfyve complex, an essential kinase for the synthesis of the endosomal phosphoinositide phosphatidylinositol-3,5-bisphosphate. This signalling lipid plays a crucial role in endosomal homeostasis and receptor sorting. Loss of PIKfyve function by mutation causes profound neurodegeneration in mammals. Using C. elegans genetics we demonstrate that APP functionally cooperates with PIKfyve in vivo. This regulation is required for maintaining endosomal and neuronal function. Our findings establish an unexpected role for APP in the regulation of endosomal phosphoinositide metabolism with dramatic consequences for endosomal biology and important implications for our understanding of Alzheimer's disease. PMID:26125944

  12. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role

    PubMed Central

    Pellegrini, Marco

    2015-01-01

    Tandem repetitions in protein sequence and structure are a fascinating subject of research which has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherent high (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR. PMID:26442257

  13. Neural network definitions of highly predictable protein secondary structure classes

    SciTech Connect

    Lapedes, A.; Steeg, E.; Farber, R.

    1994-02-01

    We use two co-evolving neural networks to determine new classes of protein secondary structure which are significantly more predictable from local amino acid sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes (alpha helix, beta strand, and coil) from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method to attempt to predict these conventional secondary structure classes. Accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand and coil.

  14. Protein secondary structure prediction using logic-based machine learning.

    PubMed

    Muggleton, S; King, R D; Sternberg, M J

    1992-10-01

    Many attempts have been made to solve the problem of predicting protein secondary structure from the primary sequence but the best performance results are still disappointing. In this paper, the use of a machine learning algorithm which allows relational descriptions is shown to lead to improved performance. The Inductive Logic Programming computer program, Golem, was applied to learning secondary structure prediction rules for alpha/alpha domain type proteins. The input to the program consisted of 12 non-homologous proteins (1612 residues) of known structure, together with a background knowledge describing the chemical and physical properties of the residues. Golem learned a small set of rules that predict which residues are part of the alpha-helices--based on their positional relationships and chemical and physical properties. The rules were tested on four independent non-homologous proteins (416 residues) giving an accuracy of 81% (+/- 2%). This is an improvement, on identical data, over the previously reported result of 73% by King and Sternberg (1990, J. Mol. Biol., 216, 441-457) using the machine learning program PROMIS, and of 72% using the standard Garnier-Osguthorpe-Robson method. The best previously reported result in the literature for the alpha/alpha domain type is 76%, achieved using a neural net approach. Machine learning also has the advantage over neural network and statistical methods in producing more understandable results. PMID:1480619

  15. Prediction of Protein-DNA binding by Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Deng, Yuefan; Eisenberg, Moises; Korobka, Alex

    1997-08-01

    We present an analysis and prediction of protein-DNA binding specificity based on the hydrogen bonding between DNA, protein, and auxiliary clusters of water molecules. Zif268, glucocorticoid receptor, λ-repressor mutant, HIN-recombinase, and tramtrack protein-DNA complexes are studied. Hydrogen bonds are approximated by the Lennard-Jones potential with a cutoff distance between the hydrogen and the acceptor atoms set to 3.2 Å and an angular component based on a dipole-dipole interaction. We use a three-stage docking algorithm: (1) geometric hashing that matches pairs of hydrogen bonding sites; (2) least-squares minimization of pairwise distances to filter out insignificant matches; and (3) Monte Carlo stochastic search to minimize the energy of the system. More information can be obtained from our first paper on this subject [Y. Deng et al., J. Computational Chemistry (1995)]. Results show that the biologically correct base pair is selected preferentially when there are two or more strong hydrogen bonds (with LJ potential lower than -0.20) that bind it to the protein. Predicted sequences are less stable in the case of weaker bonding sites. In general the inclusion of water bridges does increase the number of base pairs for which correct specificity is predicted.
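
    The distance-dependent part of the hydrogen-bond approximation described in this abstract can be sketched as a truncated Lennard-Jones potential. The 3.2 Å cutoff and the -0.20 threshold come from the abstract; the 12-6 functional form, the well depth epsilon, the length scale sigma, and the function names are assumptions, and the angular dipole-dipole component is omitted.

```python
def lennard_jones_energy(
    r: float,
    epsilon: float = 1.0,  # assumed well depth (units not given in the abstract)
    sigma: float = 1.8,    # assumed zero-crossing distance in Angstrom
    cutoff: float = 3.2,   # hydrogen-acceptor cutoff stated in the abstract
) -> float:
    """12-6 Lennard-Jones energy, set to zero beyond the cutoff distance."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if r > cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def is_strong_hydrogen_bond(energy: float, threshold: float = -0.20) -> bool:
    """True when the energy is lower than the -0.20 threshold quoted in the abstract."""
    return energy < threshold
```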

  16. DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Weng, Shunyan; Ma, Jianzhu; Tang, Qingming

    2015-01-01

    Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.
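
    The AUC values quoted in this abstract measure how well per-residue scores rank disordered residues above ordered ones. As a minimal sketch, AUC can be computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half. The function name and the 1/0 label encoding are assumptions, and this quadratic-time version is for illustration only.

```python
def roc_auc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparison (labels: 1 = positive, 0 = negative)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

    Because AUC depends only on the ranking, it is not distorted by the imbalance between ordered and disordered residues in the way raw accuracy would be.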

  17. [Location and functions of secretagogin protein].

    PubMed

    Liu, Qin; Lai, Maode

    2016-01-01

    Secretagogin (SCGN) is a novel member of the EF-hand Ca2+-binding protein family, which was identified in islet β cells by Wagner. SCGN is a six EF-hand Ca2+-binding protein, primarily expressed in the neuroendocrine axis and the central nervous system. The protein has abundant biological functions. A certain concentration of calcium ions can lead to a conformational change of SCGN, resulting in a change of intracellular signal transduction. Preliminary studies showed that SCGN could be used to treat stress reactions, such as mental illness (depression), burns or post-traumatic stress disorder, and chronic stress reactions caused by pain. In Alzheimer's disease, the expression of SCGN in the hippocampus can counteract neurodegeneration. In neuroendocrine tumors, SCGN presents a good consistency with neuroendocrine markers such as CgA, Syn, and NSE, with a higher overall sensitivity and specificity. In addition, SCGN is released into serum after neural damage in cerebral ischemic diseases, suggesting that SCGN can be used as a marker for brain trauma. In this article, we review the recent research progress on secretagogin, focusing on its distribution and functions in various tumorous and non-tumorous diseases, such as Alzheimer's disease. PMID:27045242

  18. Nanostructured functional films from engineered repeat proteins

    PubMed Central

    Grove, Tijana Z.; Regan, Lynne; Cortajarena, Aitziber L.

    2013-01-01

    Fundamental advances in biotechnology, medicine, environment, electronics and energy require methods for precise control of spatial organization at the nanoscale. Assemblies that rely on highly specific biomolecular interactions are an attractive approach to form materials that display novel and useful properties. Here, we report on assembly of films from the designed, rod-shaped, superhelical, consensus tetratricopeptide repeat protein (CTPR). We have designed three peptide-binding sites into the 18 repeat CTPR to allow for further specific and non-covalent functionalization of films through binding of fluorescein labelled peptides. The fluorescence signal from the peptide ligand bound to the protein in the solid film is anisotropic, demonstrating that CTPR films can impose order on otherwise isotropic moieties. Circular dichroism measurements show that the individual protein molecules retain their secondary structure in the film, and X-ray scattering, birefringence and atomic force microscopy experiments confirm macroscopic alignment of CTPR molecules within the film. This work opens the door to the generation of innovative biomaterials with tailored structure and function. PMID:23594813

  19. Early executive function predicts reasoning development.

    PubMed

    Richland, Lindsey E; Burchinal, Margaret R

    2013-01-01

    Analogical reasoning is a core cognitive skill that distinguishes humans from all other species and contributes to general fluid intelligence, creativity, and adaptive learning capacities. Yet its origins are not well understood. In the study reported here, we analyzed large-scale longitudinal data from the Study of Early Child Care and Youth Development to test predictors of growth in analogical-reasoning skill from third grade to adolescence. Our results suggest an integrative resolution to the theoretical debate regarding contributory factors arising from smaller-scale, cross-sectional experiments on analogy development. Children with greater executive-function skills (both composite and inhibitory control) and vocabulary knowledge in early elementary school displayed higher scores on a verbal analogies task at age 15 years, even after adjusting for key covariates. We posit that knowledge is a prerequisite to analogy performance, but strong executive-functioning resources during early childhood are related to long-term gains in fundamental reasoning skills.

  20. Genome-scale prediction of proteins with long intrinsically disordered regions.

    PubMed

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/.
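
    The LDR definition given in this abstract (30 or more consecutive disordered residues) can be checked directly on a per-residue annotation. The sketch below assumes a label string in which "D" marks a disordered residue; the encoding and function names are hypothetical. It illustrates the definition only and is not how SLIDER works, since SLIDER predicts LDR-containing proteins directly from sequence-derived features.

```python
def longest_disordered_run(labels: str, disordered_symbol: str = "D") -> int:
    """Length of the longest stretch of consecutive disordered residues."""
    longest = current = 0
    for symbol in labels:
        if symbol == disordered_symbol:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # an ordered residue breaks the run
    return longest


def has_long_disordered_region(
    labels: str, min_length: int = 30, disordered_symbol: str = "D"
) -> bool:
    """True when the annotation contains a run of at least min_length disordered residues."""
    return longest_disordered_run(labels, disordered_symbol) >= min_length
```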

  1. Protein structure prediction using residue- and fragment-environment potentials in CASP11.

    PubMed

    Kim, Hyungrae; Kihara, Daisuke

    2016-09-01

    An accurate scoring function that can select near-native structure models from a pool of alternative models is key for successful protein structure prediction. For the critical assessment of techniques for protein structure prediction (CASP) 11, we have built a protocol of protein structure prediction that has novel coarse-grained scoring functions for selecting decoys as the heart of its pipeline. The score named PRESCO (Protein Residue Environment SCOre) developed recently by our group evaluates the native-likeness of local structural environment of residues in a structure decoy considering positions and the depth of side-chains of spatially neighboring residues. We also introduced a helix interaction potential as an additional scoring function for selecting decoys. The best models selected by PRESCO and the helix interaction potential underwent structure refinement, which includes side-chain modeling and relaxation with a short molecular dynamics simulation. Our protocol was successful, achieving the top rank in the free modeling category with a significant margin of the accumulated Z-score to the subsequent groups when the top 1 models were considered. Proteins 2016; 84(Suppl 1):105-117. © 2015 Wiley Periodicals, Inc.

  2. A Historical Perspective and Overview of Protein Structure Prediction

    NASA Astrophysics Data System (ADS)

    Wooley, John C.; Ye, Yuzhen

    Carrying on many different biological functions, proteins are all composed of one or more polypeptide chains, each containing from several to hundreds or even thousands of the 20 amino acids. During the 1950s at the dawn of modern biochemistry, an essential question for biochemists was to understand the structure and function of these polypeptide chains. The sequences of proteins, also referred to as their primary structures, determine the different chemical properties for different proteins, and thus continue to captivate much of the attention of biochemists. As an early step in characterizing protein chemistry, British biochemist Frederick Sanger designed an experimental method to identify the sequence of insulin (Sanger et al., 1955). He became the first person to obtain the primary structure of a protein and in 1958 won his first Nobel Prize in Chemistry. This important progress in sequencing did not answer the question of whether a single (individual) protein has a distinctive shape in three dimensions (3D), and if so, what factors determine its 3D architecture. However, during the period when Sanger was studying the primary structure of proteins, American biochemist Christian Anfinsen observed that the active polypeptide chain of a model protein, bovine pancreatic ribonuclease (RNase), could fold spontaneously into a unique 3D structure, which was later called the native conformation of the protein (Anfinsen et al., 1954). Anfinsen also studied the refolding of the RNase enzyme and observed that an enzyme unfolded under an extreme chemical environment could refold spontaneously back into its native conformation upon changing the environment back to natural conditions (Anfinsen et al., 1961). By 1962, Anfinsen had developed his theory of protein folding (which was summarized in his 1972 Nobel acceptance speech): "The native conformation is determined by the totality of interatomic interactions and hence, by the amino acid sequence, in a given environment."

  3. A simple feature construction method for predicting upstream/downstream signal flow in human protein-protein interaction networks

    PubMed Central

    Mei, Suyu; Zhu, Hao

    2015-01-01

    Signaling pathways play important roles in understanding the underlying mechanism of cell growth, cell apoptosis, organismal development and pathways-aberrant diseases. Protein-protein interaction (PPI) networks are commonly-used infrastructure to infer signaling pathways. However, PPI networks generally carry no information on the upstream/downstream relationship between interacting proteins, which hinders inference of the signal flow of signaling pathways. In this work, we propose a simple feature construction method to train an SVM (support vector machine) classifier to predict PPI upstream/downstream relations. The domain based asymmetric feature representation naturally embodies domain-domain upstream/downstream relations, providing an unconventional avenue to predict the directionality between two objects. Moreover, we propose a semantically interpretable decision function and a macro bag-level performance metric to satisfy the need of two-instance depiction of an interacting protein pair. Experimental results show that the proposed method achieves satisfactory cross validation performance and independent test performance. Lastly, we use the trained model to predict the PPIs in HPRD, Reactome and IntAct. Some predictions have been validated against recent literature. PMID:26648121

  4. Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity.

    PubMed

    Bywater, Robert Paul

    2015-01-01

    While the genome for a given organism stores the information necessary for the organism to function and flourish it is the proteins that are encoded by the genome that perhaps more than anything else characterize the phenotype for that organism. It is therefore not surprising that one of the many approaches to understanding and predicting protein folding and properties has come from genomics and more specifically from multiple sequence alignments. In this work I explore ways in which data derived from sequence alignment data can be used to investigate in a predictive way three different aspects of protein structure: secondary structures, inter-residue contacts and the dynamics of switching between different states of the protein. In particular the use of Kolmogorov complexity has identified a novel pathway towards achieving these goals.
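
    The abstract names Shannon entropy and Kolmogorov complexity but does not give the author's formulas. The following Python sketch is therefore only an illustration under stated assumptions: gaps are treated as ordinary symbols when computing per-column entropy, and the zlib-compressed length is used as a rough, commonly used stand-in for Kolmogorov complexity, which cannot be computed exactly. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Sum p * log2(1/p) over the observed symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(aligned_sequences):
    """Per-column entropies for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_sequences)]


def compressed_size(sequence):
    """Compressed length in bytes: a crude upper-bound proxy for complexity."""
    return len(zlib.compress(sequence.encode("ascii"), 9))
```

    Fully conserved columns score 0 bits, and a column split evenly between two residues scores 1 bit; low-complexity sequences compress to fewer bytes than varied ones.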

  5. [Functions of prion protein PrPc].

    PubMed

    Cazaubon, Sylvie; Viegas, Pedro; Couraud, Pierre-Olivier

    2007-01-01

    It is now well established that both normal and pathological (or scrapie) isoforms of prion protein, PrPc and PrPsc respectively, are involved in the development and progression of various forms of neurodegenerative diseases, including scrapie in sheep, bovine spongiform encephalopathy (or "mad cow disease") and Creutzfeldt-Jakob disease in human, collectively known as prion diseases. The protein PrPc is highly expressed in the central nervous system in neurons and glial cells, and also present in non-brain cells, such as immune cells or epithelial and endothelial cells. Identification of the physiological functions of PrPc in these different cell types thus appears crucial for understanding the progression of prion diseases. Recent studies highlighted several major roles for PrPc that may be considered in two major domains : (1) cell survival (protection against oxidative stress and apoptosis) and (2) cell adhesion. In association with cell adhesion, distinct functions of PrPc were observed, depending on cell types : neuronal differentiation, epithelial and endothelial barrier integrity, transendothelial migration of monocytes, T cell activation. These observations suggest that PrPc functions may be particularly relevant to cellular stress, as well as inflammatory or infectious situations. PMID:17875293

  6. Structure-Based Prediction of Protein-Folding Transition Paths.

    PubMed

    Jacobs, William M; Shakhnovich, Eugene I

    2016-09-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions. PMID:27602721

  7. Structure-Based Prediction of Protein-Folding Transition Paths

    NASA Astrophysics Data System (ADS)

    Jacobs, William M.; Shakhnovich, Eugene I.

    2016-09-01

    We propose a general theory to describe the distribution of protein-folding transition paths. We show that transition paths follow a predictable sequence of high-free-energy transient states that are separated by free-energy barriers. Each transient state corresponds to the assembly of one or more discrete, cooperative units, which are determined directly from the native structure. We show that the transition state on a folding pathway is reached when a small number of critical contacts are formed between a specific set of substructures, after which folding proceeds downhill in free energy. This approach suggests a natural resolution for distinguishing parallel folding pathways and provides a simple means to predict the rate-limiting step in a folding reaction. Our theory identifies a common folding mechanism for proteins with diverse native structures and establishes general principles for the self-assembly of polymers with specific interactions.

  8. Probabilistic Prediction of Contacts in Protein-Ligand Complexes

    PubMed Central

    Hakulinen, Riku; Puranen, Santeri; Lehtonen, Jukka V.; Johnson, Mark S.; Corander, Jukka

    2012-01-01

    We introduce a statistical method for evaluating atomic level 3D interaction patterns of protein-ligand contacts. Such patterns can be used for fast separation of likely ligand and ligand binding site combinations out of all those that are geometrically possible. The practical purpose of this probabilistic method is for molecular docking and scoring, as an essential part of a scoring function. Probabilities of interaction patterns are calculated conditional on structural x-ray data and predefined chemical classification of molecular fragment types. Spatial coordinates of atoms are modeled using a Bayesian statistical framework with parametric 3D probability densities. The parameters are given distributions a priori, which provides the possibility to update the densities of model parameters with new structural data and use the parameter estimates to create a contact hierarchy. The contact preferences can be defined for any spatial area around a specified type of fragment. We compared calculated contact point hierarchies with the number of contact atoms found near the contact point in a reference set of x-ray data, and found that these were in general in close agreement. Additionally, using the substrate binding site in catechol-O-methyltransferase and 27 small potential binder molecules, it was demonstrated that these probabilities together with auxiliary parameters separate ligands from decoys well (true positive rate 0.75, false positive rate 0). A particularly useful feature of the proposed Bayesian framework is that it also characterizes predictive uncertainty in terms of probabilities, which have an intuitive interpretation from the applied perspective. PMID:23155467

  9. Using viromes to predict novel immune proteins in non-model organisms.

    PubMed

    Quistad, Steven D; Lim, Yan Wei; Silva, Genivaldo Gueiros Z; Nelson, Craig E; Haas, Andreas F; Kelly, Linda Wegley; Edwards, Robert A; Rohwer, Forest L

    2016-08-31

    Immunity is mostly studied in a few model organisms, leaving the majority of immune systems on the planet unexplored. To characterize the immune systems of non-model organisms alternative approaches are required. Viruses manipulate host cell biology through the expression of proteins that modulate the immune response. We hypothesized that metagenomic sequencing of viral communities would be useful to identify both known and unknown host immune proteins. To test this hypothesis, a mock human virome was generated and compared to the human proteome using tBLASTn, resulting in 36 proteins known to be involved in immunity. This same pipeline was then applied to reef-building coral, a non-model organism that currently lacks traditional molecular tools like transgenic animals, gene-editing capabilities, and in vitro cell cultures. Viromes isolated from corals and compared with the predicted coral proteome resulted in 2503 coral proteins, including many proteins involved with pathogen sensing and apoptosis. There were also 159 coral proteins predicted to be involved with coral immunity but currently lacking any functional annotation. The pipeline described here provides a novel method to rapidly predict host immune components that can be applied to virtually any system with the potential to discover novel immune proteins. PMID:27581878

  10. Using viromes to predict novel immune proteins in non-model organisms

    PubMed Central

    Lim, Yan Wei; Silva, Genivaldo Gueiros Z.; Nelson, Craig E.; Haas, Andreas F.; Kelly, Linda Wegley; Edwards, Robert A.; Rohwer, Forest L.

    2016-01-01

    Immunity is mostly studied in a few model organisms, leaving the majority of immune systems on the planet unexplored. To characterize the immune systems of non-model organisms alternative approaches are required. Viruses manipulate host cell biology through the expression of proteins that modulate the immune response. We hypothesized that metagenomic sequencing of viral communities would be useful to identify both known and unknown host immune proteins. To test this hypothesis, a mock human virome was generated and compared to the human proteome using tBLASTn, resulting in 36 proteins known to be involved in immunity. This same pipeline was then applied to reef-building coral, a non-model organism that currently lacks traditional molecular tools like transgenic animals, gene-editing capabilities, and in vitro cell cultures. Viromes isolated from corals and compared with the predicted coral proteome resulted in 2503 coral proteins, including many proteins involved with pathogen sensing and apoptosis. There were also 159 coral proteins predicted to be involved with coral immunity but currently lacking any functional annotation. The pipeline described here provides a novel method to rapidly predict host immune components that can be applied to virtually any system with the potential to discover novel immune proteins. PMID:27581878

  11. A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins

    PubMed Central

    Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

    2013-01-01

    The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a web tool publicly and freely available to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595
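
    A consensus predictor of this kind can be sketched as a simple vote over residue positions. The abstract does not state how AMYLPRED2 combines its eleven methods, so the voting rule and the min_votes threshold below are assumptions for illustration only, and the function name is hypothetical.

```python
def consensus_positions(predictions, min_votes):
    """Return the sorted residue positions flagged by at least min_votes methods.

    predictions: one collection of flagged residue indices per method.
    """
    votes = {}
    for flagged in predictions:
        # set() guards against a method listing the same position twice.
        for pos in set(flagged):
            votes[pos] = votes.get(pos, 0) + 1
    return sorted(pos for pos, n in votes.items() if n >= min_votes)
```

    Raising min_votes trades coverage for agreement among methods; the appropriate value would need to be tuned against known aggregation-prone regions.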

  12. Prediction of bioluminescent proteins using auto covariance transformation of evolutional profiles.

    PubMed

    Zhao, Xiaowei; Li, Jiakui; Huang, Yanxin; Ma, Zhiqiang; Yin, Minghao

    2012-01-01

    Bioluminescent proteins are important for various cellular processes, such as gene expression analysis, drug discovery, bioluminescent imaging, toxicity determination, and DNA sequencing studies. Hence, the correct identification of bioluminescent proteins is of great importance both for helping genome annotation and providing a supplementary role to experimental research to obtain insight into bioluminescent proteins' functions. However, few computational methods are available for identifying bioluminescent proteins. Therefore, in this paper we develop a new method to predict bioluminescent proteins using a model based on position specific scoring matrix and auto covariance. Tested by 10-fold cross-validation and independent test, the accuracy of the proposed model reaches 85.17% for the training dataset and 90.71% for the testing dataset respectively. These results indicate that our predictor is a useful tool to predict bioluminescent proteins. This is the first study in which evolutionary information and local sequence environment information have been successfully integrated for predicting bioluminescent proteins. A web server (BLPre) that implements the proposed predictor is freely available.
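
    The auto covariance transformation turns a variable-length position specific scoring matrix (PSSM) into a fixed-length feature vector. The abstract does not give the formula, so the sketch below assumes the commonly used definition: for each column j and lag g, average the product of mean-centered scores at positions i and i + g. The function name, the row-per-residue input layout, and the max_lag parameter are hypothetical.

```python
def auto_covariance(pssm, max_lag):
    """Auto covariance features of a PSSM given as a list of rows (one per residue).

    Returns max_lag * n_columns values, ordered by lag and then by column.
    """
    length = len(pssm)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and len(pssm) - 1")
    n_cols = len(pssm[0])
    # Column means over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum((pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                        for i in range(length - lag))
            features.append(total / (length - lag))
    return features
```

    Because the output length depends only on max_lag and the number of columns, proteins of different lengths map to vectors of the same size, which is what a downstream classifier needs.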

  13. Drosophila mechanotransduction--linking proteins and functions.

    PubMed

    Albert, Jörg T; Nadrowski, Björn; Göpfert, Martin C

    2007-01-01

    The sensation of touch, gravity, and sound all rely on dedicated ion channels that transduce mechanical stimulus forces into electrical signals. The functional workings and molecular identities of these mechanotransducer channels are little understood. Recent work shows that the mechanotransducers for fly and vertebrate hearing share equivalent gating mechanisms, whereby this mechanism can be probed non-invasively in the mechanics of the Drosophila ear. Here, we describe how this mechanics can be used to evaluate the roles of identified proteins in the process of mechanosensation and, specifically, their contributions to mechanotransduction. PMID:18820433

  14. Functions and possible provenance of primordial proteins.

    PubMed

    Sommer, Andrei P; Miyake, Norimune; Wickramasinghe, N Chandra; Narlikar, Jayant V; Al-Mufti, Shirwan

    2004-01-01

    Nanobacteria or living nanovesicles are of great interest to the scientific community because of their dual nature: on the one hand, they appear as primal biosystems originating life; on the other hand, they can cause severe diseases. Their survival as well as their pathogenic potential is apparently linked to a self-synthesized protein-based slime, rich in calcium and phosphate (when available). Here, we provide challenging evidence for the occurrence of nanobacteria in the stratosphere, reflecting a possibly primordial provenance of the slime. An analysis of the slime's biological functions may lead to novel strategies suitable to block adhesion modalities in modern bacterial populations. PMID:15595742

  15. Predicting protein structures with a multiplayer online game.

    PubMed

    Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit

    2010-08-01

    People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.

  16. Probing High-density Functional Protein Microarrays to Detect Protein-protein Interactions.

    PubMed

    Fasolo, Joseph; Im, Hogune; Snyder, Michael P

    2015-01-01

    High-density functional protein microarrays containing ~4,200 recombinant yeast proteins are examined for kinase protein-protein interactions using an affinity purified yeast kinase fusion protein containing a V5-epitope tag for read-out. Purified kinase is obtained through culture of a yeast strain optimized for high copy protein production harboring a plasmid containing a Kinase-V5 fusion construct under a GAL inducible promoter. The yeast is grown in restrictive media with a neutral carbon source for 6 hr followed by induction with 2% galactose. Next, the culture is harvested and kinase is purified using standard affinity chromatographic techniques to obtain a highly purified protein kinase for use in the assay. The purified kinase is diluted with kinase buffer to an appropriate range for the assay and the protein microarrays are blocked prior to hybridization with the protein microarray. After the hybridization, the arrays are probed with monoclonal V5 antibody to identify proteins bound by the kinase-V5 protein. Finally, the arrays are scanned using a standard microarray scanner, and data is extracted for downstream informatics analysis to determine a high confidence set of protein interactions for downstream validation in vivo. PMID:26274875

  17. Structure and Function of Microbial Metal-Reduction Proteins

    SciTech Connect

    Xu, Ying; Crawford, Oakly H.; Xu, Dong; Larimer, Frank W.; Uberbacher, Edward C.; Zhou, Jizhong

    2009-09-02

    In this project, we proposed (i) identification of metal-reduction genes, (ii) development of new threading techniques and (iii) fold recognition and structure prediction of metal-reduction proteins. However, due to the reduction of the budget, we revised our plan to focus on two specific aims of (i) developing a new threading-based protein structure prediction method, and (ii) developing an expert system for protein structure prediction.

  18. Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae

    PubMed Central

    Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip

    2015-01-01

    Accurate identification of protein–protein interactions (PPI) is the key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein–protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein–protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent). PMID:26157620

  19. Predicting copper-, iron-, and zinc-binding proteins in pathogenic species of the Paracoccidioides genus

    PubMed Central

    Tristão, Gabriel B.; Assunção, Leandro do Prado; dos Santos, Luiz Paulo A.; Borges, Clayton L.; Silva-Bailão, Mirelle Garcia; Soares, Célia M. de Almeida; Cavallaro, Gabriele; Bailão, Alexandre M.

    2015-01-01

    Approximately one-third of all proteins have been estimated to contain at least one metal cofactor, and these proteins are referred to as metalloproteins. These represent one of the most diverse classes of proteins, containing metal ions that bind to specific sites to perform catalytic, regulatory and structural functions. Bioinformatic tools have been developed to predict metalloproteins encoded by an organism based only on its genome sequence. Their function and the type of metal binder can also be predicted via a bioinformatics approach. The Paracoccidioides complex includes thermodimorphic pathogenic fungi that are found as saprobic mycelia in the environment and as yeast, the parasitic form, in host tissues. They are the etiologic agents of Paracoccidioidomycosis, a prevalent systemic mycosis in Latin America. Many metalloproteins are important for the virulence of several pathogenic microorganisms. Accordingly, the present work aimed to predict the copper, iron and zinc proteins encoded by the genomes of three phylogenetic species of Paracoccidioides (Pb01, Pb03, and Pb18). The metalloproteins were identified using bioinformatics approaches based on structure, annotation and domains. Cu-, Fe-, and Zn-binding proteins represent 7% of the total proteins encoded by Paracoccidioides spp. genomes. Zinc proteins were the most abundant metalloproteins, representing 5.7% of the fungus proteome, whereas copper and iron proteins represent 0.3 and 1.2%, respectively. Functional classification revealed that metalloproteins are related to many cellular processes. Furthermore, it was observed that many of these metalloproteins serve as virulence factors in the biology of the fungus. Thus, it is concluded that the Cu, Fe, and Zn metalloproteomes of the Paracoccidioides spp. are of the utmost importance for the biology and virulence of these particular human pathogens. PMID:25620964

  20. Predicting oligonucleotide-directed mutagenesis failures in protein engineering.

    PubMed

    Wassman, Christopher D; Tam, Phillip Y; Lathrop, Richard H; Weiss, Gregory A

    2004-01-01

    Protein engineering uses oligonucleotide-directed mutagenesis to modify DNA sequences through a two-step process of hybridization and enzymatic synthesis. Inefficient reactions confound attempts to introduce mutations, especially for the construction of vast combinatorial protein libraries. This paper applied computational approaches to the problem of inefficient mutagenesis. Several results implicated oligonucleotide annealing to non-target sites, termed 'cross-hybridization', as a significant contributor to mutagenesis reaction failures. Test oligonucleotides demonstrated control over reaction outcomes. A novel cross-hybridization score, quickly computable for any plasmid and oligonucleotide mixture, directly correlated with yields of deleterious mutagenesis side products. Cross-hybridization was confirmed conclusively by partial incorporation of an oligonucleotide at a predicted cross-hybridization site, and by modification of putative template secondary structure to control cross-hybridization. Even in low concentrations, cross-hybridizing species in mixtures poisoned reactions. These results provide a basis for improved mutagenesis efficiencies and increased diversities of cognate protein libraries.

  1. Turn prediction in proteins using a pattern-matching approach.

    PubMed

    Cohen, F E; Abarbanel, R M; Kuntz, I D; Fletterick, R J

    1986-01-14

    We extend the use of amino acid sequence patterns [Cohen, F.E., Abarbanel, R. M., Kuntz, I. D., & Fletterick, R. J. (1983) Biochemistry 22, 4894-4904] to the identification of turns in globular proteins. The approach uses a conservative strategy, combined with a hierarchical search (strongest patterns first) and length-dependent masking, to achieve high accuracy (95%) on a test set of proteins of known structure. Applying the same procedure to homologous families gives a 90% success rate. Straightforward changes are suggested to improve the predictive power. The computer program, written in Lisp, provides a general pattern-recognition language well suited for a number of investigations of protein and nucleic acid sequences. PMID:3754149

  2. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    SciTech Connect

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.; Gonzalez, Rachel M.; Varnum, Susan M.; Zangar, Richard C.

    2008-07-14

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions, especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting
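
    The idea of inverting a monotone standard curve and using Monte Carlo draws to obtain an asymmetric prediction interval can be sketched in a few lines of Python. This is not the authors' spline/PCLS/MC method: as a stand-in, it assumes a piecewise-linear curve through strictly increasing calibration points and Gaussian noise on the measured intensity only, whereas the abstract combines modeling and measurement errors. All names, parameters, and the noise model are hypothetical.

```python
import random


def invert_monotone_curve(concs, intensities, y):
    """Concentration at intensity y on a piecewise-linear standard curve.

    Both lists must be strictly increasing; y is clamped to the curve's range.
    """
    if y <= intensities[0]:
        return concs[0]
    if y >= intensities[-1]:
        return concs[-1]
    for k in range(1, len(concs)):
        if y <= intensities[k]:
            x0, x1 = concs[k - 1], concs[k]
            y0, y1 = intensities[k - 1], intensities[k]
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0)


def monte_carlo_interval(concs, intensities, observed, noise_sd,
                         n_draws=2000, seed=0):
    """Approximate 95% interval for the concentration behind one observed intensity."""
    rng = random.Random(seed)
    draws = sorted(
        invert_monotone_curve(concs, intensities, rng.gauss(observed, noise_sd))
        for _ in range(n_draws)
    )
    # Empirical 2.5th and 97.5th percentiles of the simulated concentrations.
    return draws[int(0.025 * (n_draws - 1))], draws[int(0.975 * (n_draws - 1))]
```

    Where the curve flattens, equal intensity noise maps to a wider range of concentrations on one side than the other, so the simulated interval comes out asymmetric, in line with the behavior the abstract attributes to Monte Carlo simulation.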

  3. A threading approach to protein structure prediction: Studies on TNF-like molecules, Rev proteins, and protein kinases

    NASA Astrophysics Data System (ADS)

    Ihm, Yungok

    The main focus of this dissertation is the application of the threading approach to specific biological problems. The threading scheme developed in our group targets incorporating important structural features necessary for detecting structural similarity between the target sequence and the template structure. This enables us to use our threading method to solve problems for which sequence-based methods are not very useful. We applied our threading method to predict the three-dimensional structures of lentivirus (EIAV, HIV-1, FIV, SIV) Rev proteins. Predicted structures of Rev proteins suggest that they share a structural similarity among themselves (four-helix bundle). Also, the threading approach has been utilized for screening for potential TNF-like molecules in Arabidopsis. The threading approach identified 35 potential TNF-like proteins in Arabidopsis, six of which are particularly interesting to test for receptor kinase ligand activity. The threading method has also been used to identify potentially new protein kinases, which are not included in the protein kinase database of C. elegans and Arabidopsis. We identified eleven potentially new protein kinases and an additional protein worth investigating for protein kinase activity in C. elegans. Further, we identified ten potentially new protein kinases and four additional proteins worth investigating for protein kinase activity in Arabidopsis.

  4. Protein-spanning water networks and implications for prediction of protein-protein interactions mediated through hydrophobic effects.

    PubMed

    Cui, Di; Ou, Shuching; Patel, Sandeep

    2014-12-01

    Hydrophobic effects, often conflated with hydrophobic forces, are implicated as major determinants in biological association and self-assembly processes. Protein-protein interactions involved in signaling pathways in living systems are a prime example where hydrophobic effects have profound implications. In the context of protein-protein interactions, a priori knowledge of relevant binding interfaces (i.e., clusters of residues involved directly with binding interactions) is difficult to obtain. In the case of hydrophobically mediated interactions, hydropathy-based methods relying on single-residue hydrophobicity properties are routinely and widely used to predict propensities for such residues to be present in hydrophobic interfaces. However, recent studies suggest that consideration of hydrophobicity for single residues on a protein surface requires accounting for the local environment dictated by neighboring residues and local water. In this study, we use a method derived from percolation theory to evaluate spanning water networks in the first hydration shells of a series of small proteins. We use residue-based water density and single-linkage clustering methods to predict hydrophobic regions of proteins; these regions are putatively involved in binding interactions. We find that this simple method is able to predict with sufficient accuracy and coverage the binding interface residues of a series of proteins. The approach is competitive with automated servers. The results of this study highlight the importance of accounting for the local environment in determining the hydrophobic nature of individual residues on protein surfaces.
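
    Single-linkage clustering, the grouping step named in the abstract, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the distance cutoff, the use of raw 3D points in place of hydration-shell water positions, and the helper cluster_extent are all hypothetical, and the abstract does not say how a network is judged to be spanning.

```python
import math


def single_linkage_clusters(points, cutoff):
    """Group points so that any two points closer than `cutoff` share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    # Largest cluster first.
    return sorted(clusters.values(), key=len, reverse=True)


def cluster_extent(points, cluster):
    """Extent of a cluster along x, y and z (hypothetical helper for a spanning check)."""
    return tuple(
        max(points[i][axis] for i in cluster) - min(points[i][axis] for i in cluster)
        for axis in range(3)
    )


# Hypothetical coordinates: three points form a chain, one is isolated.
pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
clusters = single_linkage_clusters(pts, cutoff=1.5)
print(clusters, cluster_extent(pts, clusters[0]))
```

    A percolation-style analysis would compare the extent of the largest cluster with the dimensions of the protein surface; the threshold for calling a network spanning is left open here because the abstract does not state it.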

  5. Gesture Performance in Schizophrenia Predicts Functional Outcome After 6 Months

    PubMed Central

    Walther, Sebastian; Eisenhardt, Sarah; Bohlhalter, Stephan; Vanbellingen, Tim; Müri, René; Strik, Werner; Stegmayer, Katharina

    2016-01-01

    The functional outcome of schizophrenia is heterogeneous and markers of the course are missing. Functional outcome is associated with social cognition and negative symptoms. Gesture performance and nonverbal social perception are critically impaired in schizophrenia. Here, we tested whether gesture performance or nonverbal social perception could predict functional outcome and the ability to adequately perform relevant skills of everyday function (functional capacity) after 6 months. In a naturalistic longitudinal study, 28 patients with schizophrenia completed tests of nonverbal communication at baseline and follow-up. In addition, functional outcome, social and occupational functioning, as well as functional capacity at follow-up were assessed. Gesture performance and nonverbal social perception at baseline predicted negative symptoms, functional outcome, and functional capacity at 6-month follow-up. Gesture performance predicted functional outcome beyond the baseline measure of functioning. Patients with gesture deficits at baseline had stable negative symptoms and experienced a decline in social functioning, whereas in patients without gesture deficits, negative symptom severity decreased and social functioning remained stable. Thus, a simple test of hand gesture performance at baseline may indicate favorable outcomes in short-term follow-up. The results further support the importance of nonverbal communication skills in subjects with schizophrenia. PMID:27566843

  6. Multiple functions of microsomal triglyceride transfer protein

    PubMed Central

    2012-01-01

    Microsomal triglyceride transfer protein (MTP) was first identified as a major cellular protein capable of transferring neutral lipids between membrane vesicles. Its role as an essential chaperone for the biosynthesis of apolipoprotein B (apoB)-containing triglyceride-rich lipoproteins was established after the realization that abetalipoproteinemia patients carry mutations in the MTTP gene resulting in the loss of its lipid transfer activity. Now it is known that it also plays a role in the biosynthesis of CD1, glycolipid presenting molecules, as well as in the regulation of cholesterol ester biosynthesis. In this review, we will provide a historical perspective about the identification, purification and characterization of MTP, describe methods used to measure its lipid transfer activity, and discuss tissue expression and function. Finally, we will review the role MTP plays in the assembly of apoB-lipoprotein, the regulation of cholesterol ester synthesis, biosynthesis of CD1 proteins and propagation of hepatitis C virus. We will also provide a brief overview about the clinical potentials of MTP inhibition. PMID:22353470

  7. Dopamine neurons share common response function for reward prediction error

    PubMed Central

    Eshel, Neir; Tian, Ju; Bukwich, Michael; Uchida, Naoshige

    2016-01-01

    Dopamine neurons are thought to signal reward prediction error, or the difference between actual and predicted reward. How dopamine neurons jointly encode this information, however, remains unclear. One possibility is that different neurons specialize in different aspects of prediction error; another is that each neuron calculates prediction error in the same way. We recorded from optogenetically-identified dopamine neurons in the lateral ventral tegmental area (VTA) while mice performed classical conditioning tasks. Our tasks allowed us to determine the full prediction error functions of dopamine neurons and compare them to each other. We found striking homogeneity among individual dopamine neurons: their responses to both unexpected and expected rewards followed the same function, just scaled up or down. As a result, we could describe both individual and population responses using just two parameters. Such uniformity ensures robust information coding, allowing each dopamine neuron to contribute fully to the prediction error signal. PMID:26854803

  8. Dopamine neurons share common response function for reward prediction error.

    PubMed

    Eshel, Neir; Tian, Ju; Bukwich, Michael; Uchida, Naoshige

    2016-03-01

    Dopamine neurons are thought to signal reward prediction error, or the difference between actual and predicted reward. How dopamine neurons jointly encode this information, however, remains unclear. One possibility is that different neurons specialize in different aspects of prediction error; another is that each neuron calculates prediction error in the same way. We recorded from optogenetically identified dopamine neurons in the lateral ventral tegmental area (VTA) while mice performed classical conditioning tasks. Our tasks allowed us to determine the full prediction error functions of dopamine neurons and compare them to each other. We found marked homogeneity among individual dopamine neurons: their responses to both unexpected and expected rewards followed the same function, just scaled up or down. As a result, we were able to describe both individual and population responses using just two parameters. Such uniformity ensures robust information coding, allowing each dopamine neuron to contribute fully to the prediction error signal. PMID:26854803

  10. Protein structure prediction with local adjust tabu search algorithm

    PubMed Central

    2014-01-01

    Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of realistic protein structures, simplified structure models and computational methods should be adopted in research. The AB off-lattice model is one such simplified model, which considers only two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to find the lowest energy configurations in the 2D and 3D off-lattice models using Fibonacci sequences and real protein sequences. To avoid falling into local minima and to converge faster to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines the simulated annealing algorithm and the tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching of the optimal conformation corresponding to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in the previous literature, and we can be sure that the lowest energy folding states for short Fibonacci sequences have been found. Conclusions Although the off-lattice models are not very realistic, they can reflect some important characteristics of realistic proteins. The 3D off-lattice model is closer to the native folding structure of a realistic protein than the 2D off-lattice model. In addition, compared with some previous research, the proposed hybrid algorithm can search the spatial folding structure of a protein chain more effectively and more quickly. PMID:25474708
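
    The objective that a search method such as SATS minimizes can be made concrete with a short sketch. The abstract does not give the energy function, so the code below assumes the commonly cited 2D AB off-lattice form (unit bond lengths, a bending term of (1 - cos θ)/4 per interior residue, and a Lennard-Jones-like term with coupling 1 for AA, 0.5 for BB and -0.5 for AB pairs) and the usual Fibonacci sequence definition S0 = A, S1 = B, S(i+1) = S(i-1) + S(i). The function names are illustrative, and the search itself is not shown.

```python
import math


def fibonacci_sequence(n):
    """Return the n-th Fibonacci AB sequence: S0 = 'A', S1 = 'B', S(i+1) = S(i-1) + S(i)."""
    prev, cur = "A", "B"
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur


def coupling(a, b):
    """Assumed pair coefficient: AA attracts strongly, BB weakly, AB repels weakly."""
    if a == "A" and b == "A":
        return 1.0
    if a == "B" and b == "B":
        return 0.5
    return -0.5


def coordinates(bend_angles):
    """Build 2D positions from bend angles, using unit-length bonds."""
    pts = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in bend_angles:
        heading += theta
        x, y = pts[-1]
        pts.append((x + math.cos(heading), y + math.sin(heading)))
    return pts


def energy(sequence, bend_angles):
    """Energy of a 2D AB chain; expects len(sequence) - 2 bend angles in radians."""
    if len(bend_angles) != len(sequence) - 2:
        raise ValueError("need one bend angle per interior residue")
    pts = coordinates(bend_angles)
    bend = sum(0.25 * (1.0 - math.cos(t)) for t in bend_angles)
    pair = 0.0
    for i in range(len(sequence) - 2):
        for j in range(i + 2, len(sequence)):  # skip bonded neighbours
            r = math.dist(pts[i], pts[j])
            pair += 4.0 * (r ** -12 - coupling(sequence[i], sequence[j]) * r ** -6)
    return bend + pair


seq = fibonacci_sequence(6)  # the 13-residue Fibonacci sequence
print(seq, energy(seq, [0.0] * (len(seq) - 2)))  # energy of the fully extended chain
```

    A simulated annealing or tabu search would repeatedly perturb the bend angles and call energy on each candidate; only the evaluation step is sketched here.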

  11. Effects of ozone on functional properties of proteins.

    PubMed

    Uzun, Hicran; Ibanoglu, Esra; Catal, Hatice; Ibanoglu, Senol

    2012-09-15

    The present study investigates whether ozone treatment could be an alternative way to improve some functional properties of proteins. Ozone treatment was applied to whey protein isolate and egg white proteins, which have been extensively used in food products to improve textural, functional and sensory attributes. Ozone treatment of proteins was performed either in aqueous solutions or as gas ozonation of pure protein powders. Foam formation and foam stability of proteins were enhanced extensively. The solubility of the proteins was reduced, as influenced by the aqueous and gas ozonation medium. The reduction was more pronounced in egg white proteins. Ozone treatment affected emulsion activity of whey protein isolate negatively and reduced the emulsion stability.

  12. Quality Assessment of Predicted Protein Models Using Energies Calculated by the Fragment Molecular Orbital Method.

    PubMed

    Simoncini, David; Nakata, Hiroya; Ogata, Koji; Nakamura, Shinichiro; Zhang, Kam Yj

    2015-02-01

    Protein structure prediction directly from sequences is a very challenging problem in computational biology. One of the most successful approaches employs stochastic conformational sampling to search an empirically derived energy function landscape for the global energy minimum state. Due to the errors in the empirically derived energy function, the lowest energy conformation may not be the best model. We have evaluated the use of energy calculated by the fragment molecular orbital method (FMO energy) to assess the quality of predicted models and its ability to identify the best model among an ensemble of predicted models. The fragment molecular orbital method implemented in GAMESS was used to calculate the FMO energy of predicted models. When tested on eight protein targets, we found that the model ranking based on FMO energies is better than that based on empirically derived energies when there is sufficient diversity among these models. This model diversity can be estimated prior to the FMO energy calculations. Our result demonstrates that the FMO energy calculated by the fragment molecular orbital method is a practical and promising measure for the assessment of protein model quality and the selection of the best protein model among many generated.

  13. Green fluorescent protein nanopolygons as monodisperse supramolecular assemblies of functional proteins with defined valency

    NASA Astrophysics Data System (ADS)

    Kim, Young Eun; Kim, Yu-Na; Kim, Jung A.; Kim, Ho Min; Jung, Yongwon

    2015-05-01

    Supramolecular protein assemblies offer novel nanoscale architectures with molecular precision and unparalleled functional diversity. A key challenge, however, is to create precise nano-assemblies of functional proteins with both defined structures and a controlled number of protein-building blocks. Here we report a series of supramolecular green fluorescent protein oligomers that are assembled in precise polygonal geometries and prepared in a monodisperse population. Green fluorescent protein is engineered to be self-assembled in cells into oligomeric assemblies that are natively separated in a single-protein resolution by surface charge manipulation, affording monodisperse protein (nano)polygons from dimer to decamer. Several functional proteins are multivalently displayed on the oligomers with controlled orientations. Spatial arrangements of protein oligomers and displayed functional proteins are directly visualized by a transmission electron microscope. By employing our functional protein assemblies, we provide experimental insight into multivalent protein-protein interactions and tools to manipulate receptor clustering on live cell surfaces.

  14. Prediction of protein orientation upon immobilization on biological and nonbiological surfaces

    NASA Astrophysics Data System (ADS)

    Talasaz, Amirali H.; Nemat-Gorgani, Mohsen; Liu, Yang; Ståhl, Patrik; Dutton, Robert W.; Ronaghi, Mostafa; Davis, Ronald W.

    2006-10-01

    We report on a rapid simulation method for predicting protein orientation on a surface based on electrostatic interactions. New methods for predicting protein immobilization are needed because of the increasing use of biosensors and protein microarrays, two technologies that use protein immobilization onto a solid support, and because the orientation of an immobilized protein is important for its function. The proposed simulation model is based on the premise that the protein interacts with the electric field generated by the surface, and this interaction defines the orientation of attachment. Results of this model are in agreement with experimental observations of immobilization of mitochondrial creatine kinase and type I hexokinase on biological membranes. The advantages of our method are that it can be applied to any protein with a known structure; it does not require modeling of the surface at atomic resolution and can be run relatively quickly on readily available computing resources. Finally, we also propose an orientation of membrane-bound cytochrome c, a protein for which the membrane orientation has not been unequivocally determined. electric double layer | electrostatic simulations | orientation flexibility

  15. Optimizing Scoring Function of Protein-Nucleic Acid Interactions with Both Affinity and Specificity

    PubMed Central

    Yan, Zhiqiang; Wang, Jin

    2013-01-01

    Protein-nucleic acid (protein-DNA and protein-RNA) recognition is fundamental to the regulation of gene expression. Determination of the structures of the protein-nucleic acid recognition and insight into their interactions at the molecular level are vital to understanding the regulation function. Recently, quantitative computational approaches have become an alternative to experimental techniques for predicting the structures and interactions of biomolecular recognition. However, the progress of protein-nucleic acid structure prediction, especially protein-RNA, is far behind that of the protein-ligand and protein-protein structure predictions due to the lack of a reliable and accurate scoring function for quantifying the protein-nucleic acid interactions. In this work, we developed an accurate scoring function (named as SPA-PN, SPecificity and Affinity of the Protein-Nucleic acid interactions) for protein-nucleic acid interactions by incorporating both the specificity and affinity into the optimization strategy. Specificity and affinity are two requirements of highly efficient and specific biomolecular recognition. Previous quantitative descriptions of the biomolecular interactions considered the affinity, but often ignored the specificity owing to the challenge of specificity quantification. We applied our concept of intrinsic specificity to connect the conventional specificity, which circumvents the challenge of specificity quantification. In addition to the affinity optimization, we incorporated the quantified intrinsic specificity into the optimization strategy of SPA-PN. The testing results and comparisons with other scoring functions validated that SPA-PN performs well on both the prediction of binding affinity and identification of native conformation. In terms of its performance, SPA-PN can be widely used to predict the protein-nucleic acid structures and quantify their interactions. PMID:24098651

  16. DING proteins: numerous functions, elusive genes, a potential for health.

    PubMed

    Bernier, François

    2013-09-01

    DING proteins, named after their conserved N-terminus, form an overlooked protein family whose members were generally discovered through serendipity. It is characterized by an unusually high sequence conservation, even between distantly related species, and by an outstanding diversity of activities and ligands. They all share a demonstrated capacity to bind phosphate with high affinity or at least a predicted phosphate-binding site. However, DING protein genes are conspicuously absent from databases. The many novel family members identified in recent years have confirmed that DING proteins are ubiquitous not only in animals and plants but probably also in prokaryotes. At the functional level, there is increasing evidence that they participate in many health-related processes such as cancers as well as bacterial (Pseudomonas) and viral (HIV) infections, by mechanisms that are now beginning to be understood. They thus represent potent targets for the development of novel therapeutic approaches, especially against HIV. The few genomic sequences that are now available are starting to give some clues on why DING protein genes and mRNAs are well conserved and difficult to clone. This could open a new era of research, of both fundamental and applied importance. PMID:23743708

  17. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

    PubMed

    Capra, John A; Laskowski, Roman A; Thornton, Janet M; Singh, Mona; Funkhouser, Thomas A

    2009-12-01

    Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

  18. Mapping Plant Interactomes Using Literature Curated and Predicted Protein–Protein Interaction Data Sets

    PubMed Central

    Lee, KiYoung; Thorneycroft, David; Achuthan, Premanand; Hermjakob, Henning; Ideker, Trey

    2010-01-01

    Most cellular processes are enabled by cohorts of interacting proteins that form dynamic networks within the plant proteome. The study of these networks can provide insight into protein function and provide new avenues for research. This article informs the plant science community of the currently available sources of protein interaction data and discusses how they can be useful to researchers. Using our recently curated IntAct Arabidopsis thaliana protein–protein interaction data set as an example, we discuss potentials and limitations of the plant interactomes generated to date. In addition, we present our efforts to add value to the interaction data by using them to seed a proteome-wide map of predicted protein subcellular locations. PMID:20371643

  19. Stringent DDI-based Prediction of H. sapiens-M. tuberculosis H37Rv Protein-Protein Interactions

    PubMed Central

    2013-01-01

    Background H. sapiens-M. tuberculosis H37Rv protein-protein interaction (PPI) data are very important information to illuminate the infection mechanism of M. tuberculosis H37Rv. But current H. sapiens-M. tuberculosis H37Rv PPI data are very scarce. This seriously limits the study of the interaction between this important pathogen and its host H. sapiens. Computational prediction of H. sapiens-M. tuberculosis H37Rv PPIs is an important strategy to fill in the gap. Domain-domain interaction (DDI) based prediction is one of the frequently used computational approaches in predicting both intra-species and inter-species PPIs. However, the performance of DDI-based host-pathogen PPI prediction has been rather limited. Results We develop a stringent DDI-based prediction approach with emphasis on (i) differences between the specific domain sequences on annotated regions of proteins under the same domain ID and (ii) calculation of the interaction strength of predicted PPIs based on the interacting residues in their interaction interfaces. We compare our stringent DDI-based approach to a conventional DDI-based approach for predicting PPIs based on gold standard intra-species PPIs and coherent informative Gene Ontology terms assessment. The assessment results show that our stringent DDI-based approach achieves much better performance in predicting PPIs than the conventional approach. Using our stringent DDI-based approach, we have predicted a small set of reliable H. sapiens-M. tuberculosis H37Rv PPIs which could be very useful for a variety of related studies. We also analyze the H. sapiens-M. tuberculosis H37Rv PPIs predicted by our stringent DDI-based approach using cellular compartment distribution analysis, functional category enrichment analysis and pathway enrichment analysis. The analyses support the validity of our prediction result. Also, based on an analysis of the H. sapiens-M. tuberculosis H37Rv PPI network predicted by our stringent DDI-based approach, we have
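
    The conventional DDI-based baseline mentioned in the abstract can be sketched as follows: a host protein and a pathogen protein are predicted to interact when they carry a pair of domains already known to interact. This sketch does not implement the stringent variant (sequence differences within a domain ID and interface-based interaction strength are not modeled), and every protein name and domain ID in it is a hypothetical placeholder.

```python
def predict_ppis(host_domains, pathogen_domains, known_ddis):
    """Predict host-pathogen PPIs supported by at least one known domain-domain interaction.

    host_domains / pathogen_domains map protein name -> set of domain IDs.
    known_ddis is an iterable of (domain, domain) pairs; order does not matter.
    """
    ddis = {frozenset(pair) for pair in known_ddis}
    predictions = []
    for host, h_doms in host_domains.items():
        for pathogen, p_doms in pathogen_domains.items():
            # Collect every domain pair that supports this candidate interaction.
            support = sorted(
                (h, p) for h in h_doms for p in p_doms if frozenset((h, p)) in ddis
            )
            if support:
                predictions.append((host, pathogen, support))
    return predictions


# Hypothetical annotations, for illustration only.
host = {"HostProt1": {"DomA", "DomB"}, "HostProt2": {"DomC"}}
pathogen = {"PathProt1": {"DomX"}, "PathProt2": {"DomY"}}
known = [("DomX", "DomA")]
print(predict_ppis(host, pathogen, known))
```

    Returning the supporting domain pairs alongside each prediction leaves room for a later filtering step, such as the stricter criteria the abstract describes.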

  20. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction

    PubMed Central

    Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin

    2016-01-01

    Motivation: Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. Method: We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence–structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. Results: We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM–HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Availability and implementation: Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307635

  1. Characterization of the functional properties of carob germ proteins

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Proteins from the carob germ were identified as having gluten-like proteins in 1935. While some biochemical characterization of carob germ proteins and their functionality has been carried out, relatively little has been done when compared to proteins such as gluten. Carob germ proteins were separ...

  2. Novel semantic similarity measure improves an integrative approach to predicting gene functional associations

    PubMed Central

    2013-01-01

    Background Elucidation of the direct/indirect protein interactions and gene associations is required to fully understand the workings of the cell. This can be achieved through the use of both low- and high-throughput biological experiments and in silico methods. We present GAP (Gene functional Association Predictor), an integrative method for predicting and characterizing gene functional associations. GAP integrates different biological features using a novel taxonomy-based semantic similarity measure in predicting and prioritizing high-quality putative gene associations. The proposed similarity measure increases information gain from the available gene annotations. The annotation information is incorporated from several public pathway databases, Gene Ontology annotations as well as drug and disease associations from the scientific literature. Results We evaluated GAP by comparing its prediction performance with several other well-known functional interaction prediction tools over a comprehensive dataset of known direct and indirect interactions, and observed significantly better prediction performance. We also selected a small set of GAP’s highly-scored novel predicted pairs (i.e., currently not found in any known database or dataset), and by manually searching the literature for experimental evidence accessible in the public domain, we confirmed different categories of predicted functional associations with available evidence of interaction. We also provided extra supporting evidence for subset of the predicted</