Structure based alignment and clustering of proteins (STRALCP)
Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.
2013-06-18
Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
2014-01-01
Background Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. Conclusions This work opens exciting venues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools. PMID:25080993
Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine
2010-08-01
The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
Loving, Kathryn A.; Lin, Andy; Cheng, Alan C.
2014-01-01
Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets. PMID:25079060
Template-based structure modeling of protein-protein interactions
Szilagyi, Andras; Zhang, Yang
2014-01-01
The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the proteinprotein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.
Li, Haiou; Lyu, Qiang; Cheng, Jianlin
2016-12-01
Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Structure-sequence based analysis for identification of conserved regions in proteins
Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth
2013-05-28
Disclosed are computational methods, and associated hardware and software products for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures in also disclosed herein.
Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok
2017-03-01
Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
Adjusting protein graphs based on graph entropy.
Peng, Sheng-Lung; Tsay, Yu-Wei
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.
Adjusting protein graphs based on graph entropy
2014-01-01
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid. PMID:25474347
From laptop to benchtop to bedside: Structure-based Drug Design on Protein Targets
Chen, Lu; Morrow, John K.; Tran, Hoang T.; Phatak, Sharangdhar S.; Du-Cuny, Lei; Zhang, Shuxing
2013-01-01
As an important aspect of computer-aided drug design, structure-based drug design brought a new horizon to pharmaceutical development. This in silico method permeates all aspects of drug discovery today, including lead identification, lead optimization, ADMET prediction and drug repurposing. Structure-based drug design has resulted in fruitful successes drug discovery targeting protein-ligand and protein-protein interactions. Meanwhile, challenges, noted by low accuracy and combinatoric issues, may also cause failures. In this review, state-of-the-art techniques for protein modeling (e.g. structure prediction, modeling protein flexibility, etc.), hit identification/optimization (e.g. molecular docking, focused library design, fragment-based design, molecular dynamic, etc.), and polypharmacology design will be discussed. We will explore how structure-based techniques can facilitate the drug discovery process and interplay with other experimental approaches. PMID:22316152
CHENG, JIANLIN; EICKHOLT, JESSE; WANG, ZHENG; DENG, XIN
2013-01-01
After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques — template-based modeling and template-free modeling — have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can signicantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction. PMID:22809379
Modelling and enhanced molecular dynamics to steer structure-based drug discovery.
Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe
2014-05-01
The ever-increasing gap between the availabilities of the genome sequences and the crystal structures of proteins remains one of the significant challenges to the modern drug discovery efforts. The knowledge of structure-dynamics-functionalities of proteins is important in order to understand several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms and protein-protein interactions. This review presents a brief overview on the different state of the art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We give an essence of how different enhanced sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering the structure based drug discovery processes. Copyright © 2013 Elsevier Ltd. All rights reserved.
Fourier-based classification of protein secondary structures.
Shu, Jian-Jun; Yong, Kian Yan
2017-04-15
The correct prediction of protein secondary structures is one of the key issues in predicting the correct protein folded shape, which is used for determining gene function. Existing methods make use of amino acids properties as indices to classify protein secondary structures, but are faced with a significant number of misclassifications. The paper presents a technique for the classification of protein secondary structures based on protein "signal-plotting" and the use of the Fourier technique for digital signal processing. New indices are proposed to classify protein secondary structures by analyzing hydrophobicity profiles. The approach is simple and straightforward. Results show that the more types of protein secondary structures can be classified by means of these newly-proposed indices. Copyright © 2017 Elsevier Inc. All rights reserved.
Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A
2005-05-24
Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with C(alpha) rms deviation (rmsd) <6 A]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-A C(alpha) rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 A rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.
Modeling complexes of modeled proteins.
Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A
2017-03-01
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å C α RMSD. Many template-based docking predictions fall into acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Structure and function of seed storage proteins in faba bean (Vicia faba L.).
Liu, Yujiao; Wu, Xuexia; Hou, Wanwei; Li, Ping; Sha, Weichao; Tian, Yingying
2017-05-01
The protein subunit is the most important basic unit of protein, and its study can unravel the structure and function of seed storage proteins in faba bean. In this study, we identified six specific protein subunits in Faba bean (cv. Qinghai 13) combining liquid chromatography (LC), liquid chromatography-electronic spray ionization mass (LC-ESI-MS/MS) and bio-information technology. The results suggested a diversity of seed storage proteins in faba bean, and a total of 16 proteins (four GroEL molecular chaperones and 12 plant-specific proteins) were identified from 97-, 96-, 64-, 47-, 42-, and 38-kD-specific protein subunits in faba bean based on the peptide sequence. We also analyzed the composition and abundance of the amino acids, the physicochemical characteristics, secondary structure, three-dimensional structure, transmembrane domain, and possible subcellular localization of these identified proteins in faba bean seed, and finally predicted function and structure. The three-dimensional structures were generated based on homologous modeling, and the protein function was analyzed based on the annotation from the non-redundant protein database (NR database, NCBI) and function analysis of optimal modeling. The objective of this study was to identify the seed storage proteins in faba bean and confirm the structure and function of these proteins. Our results can be useful for the study of protein nutrition and achieve breeding goals for optimal protein quality in faba bean.
DOE Office of Scientific and Technical Information (OSTI.GOV)
P Yu
Unlike traditional 'wet' analytical methods which during processing for analysis often result in destruction or alteration of the intrinsic protein structures, advanced synchrotron radiation-based Fourier transform infrared microspectroscopy has been developed as a rapid and nondestructive and bioanalytical technique. This cutting-edge synchrotron-based bioanalytical technology, taking advantages of synchrotron light brightness (million times brighter than sun), is capable of exploring the molecular chemistry or structure of a biological tissue without destruction inherent structures at ultra-spatial resolutions. In this article, a novel approach is introduced to show the potential of the advanced synchrotron-based analytical technology, which can be used to study plant-basedmore » food or feed protein molecular structure in relation to nutrient utilization and availability. Recent progress was reported on using synchrotron-based bioanalytical technique synchrotron radiation-based Fourier transform infrared microspectroscopy and diffused reflectance infrared Fourier transform spectroscopy to detect the effects of gene-transformation (Application 1), autoclaving (Application 2), and bio-ethanol processing (Application 3) on plant-based food and feed protein structure changes on a molecular basis. The synchrotron-based technology provides a new approach for plant-based protein structure research at ultra-spatial resolutions at cellular and molecular levels.« less
Cui, Xuefeng; Lu, Zhiwu; Wang, Sheng; Jing-Yan Wang, Jim; Gao, Xin
2016-06-15
Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx : xin.gao@kaust.edu.sa Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi
2017-10-12
Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted the short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulationto predict protein structures, we calculated seven short test protein (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure by 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and 200-ns simulations are frequently sufficient for estimating the secondary structures of protein (approximately 20 residues). Structural prediction method using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
Structure refinement of membrane proteins via molecular dynamics simulations.
Dutagaci, Bercem; Heo, Lim; Feig, Michael
2018-07-01
A refinement protocol based on physics-based techniques established for water soluble proteins is tested for membrane protein structures. Initial structures were generated by homology modeling and sampled via molecular dynamics simulations in explicit lipid bilayer and aqueous solvent systems. Snapshots from the simulations were selected based on scoring with either knowledge-based or implicit membrane-based scoring functions and averaged to obtain refined models. The protocol resulted in consistent and significant refinement of the membrane protein structures similar to the performance of refinement methods for soluble proteins. Refinement success was similar between sampling in the presence of lipid bilayers and aqueous solvent but the presence of lipid bilayers may benefit the improvement of lipid-facing residues. Scoring with knowledge-based functions (DFIRE and RWplus) was found to be as good as scoring using implicit membrane-based scoring functions suggesting that differences in internal packing is more important than orientations relative to the membrane during the refinement of membrane protein homology models. © 2018 Wiley Periodicals, Inc.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Protein docking by the interface structure similarity: how much structure is needed?
Sinha, Rohita; Kundrotas, Petras J; Vakser, Ilya A
2012-01-01
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.
Protein-based hydrogels for tissue engineering
Schloss, Ashley C.; Williams, Danielle M.; Regan, Lynne J.
2017-01-01
The tunable mechanical and structural properties of protein-based hydrogels make them excellent scaffolds for tissue engineering and repair. Moreover, using protein-based components provides the option to insert sequences associated with the promoting both cellular adhesion to the substrate and overall cell growth. Protein-based hydrogel components are appealing for their structural designability, specific biological functionality, and stimuli-responsiveness. Here we present highlights in the field of protein-based hydrogels for tissue engineering applications including design requirements, components, and gel types. PMID:27677513
Sinz, Andrea
2018-05-28
Structural mass spectrometry (MS) is gaining increasing importance for deriving valuable three-dimensional structural information on proteins and protein complexes, and it complements existing techniques, such as NMR spectroscopy and X-ray crystallography. Structural MS unites different MS-based techniques, such as hydrogen/deuterium exchange, native MS, ion-mobility MS, protein footprinting, and chemical cross-linking/MS, and it allows fundamental questions in structural biology to be addressed. In this Minireview, I will focus on the cross-linking/MS strategy. This method not only delivers tertiary structural information on proteins, but is also increasingly being used to decipher protein interaction networks, both in vitro and in vivo. Cross-linking/MS is currently one of the most promising MS-based approaches to derive structural information on very large and transient protein assemblies and intrinsically disordered proteins. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Structural Mass Spectrometry of Proteins Using Hydroxyl Radical Based Protein Footprinting
Wang, Liwen; Chance, Mark R.
2011-01-01
Structural MS is a rapidly growing field with many applications in basic research and pharmaceutical drug development. In this feature article the overall technology is described and several examples of how hydroxyl radical based footprinting MS can be used to map interfaces, evaluate protein structure, and identify ligand dependent conformational changes in proteins are described. PMID:21770468
Structure-based barcoding of proteins.
Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin
2014-01-01
A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from 3D coordinate file. The molecular structure of a protein coordinate file from Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to same fold family, and across different folds. In addition to this, we have attempted to provide an illustration to (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) Modifications in overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and that of proteins with annotated function in high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
A structural-alphabet-based strategy for finding structural motifs across protein families
Wu, Chih Yuan; Chen, Yao Chi; Lim, Carmay
2010-01-01
Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign. PMID:20525797
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang
2018-03-10
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
Quality assessment of protein model-structures based on structural and functional similarities.
Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata
2012-09-21
Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang
2011-01-01
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036
Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E
2018-05-16
Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium, however the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.
HBNG: Graph theory based visualization of hydrogen bond networks in protein structures.
Tiwari, Abhishek; Tiwari, Vivek
2007-07-09
HBNG is a graph theory based tool for visualization of hydrogen bond network in 2D. Digraphs generated by HBNG facilitate visualization of cooperativity and anticooperativity chains and rings in protein structures. HBNG takes hydrogen bonds list files (output from HBAT, HBEXPLORE, HBPLUS and STRIDE) as input and generates a DOT language script and constructs digraphs using freeware AT and T Graphviz tool. HBNG is useful in the enumeration of favorable topologies of hydrogen bond networks in protein structures and determining the effect of cooperativity and anticooperativity on protein stability and folding. HBNG can be applied to protein structure comparison and in the identification of secondary structural regions in protein structures. Program is available from the authors for non-commercial purposes.
Martin, Juliette; Regad, Leslie; Etchebest, Catherine; Camproux, Anne-Claude
2008-11-15
Interresidue protein contacts in proteins structures and at protein-protein interface are classically described by the amino acid types of interacting residues and the local structural context of the contact, if any, is described using secondary structures. In this study, we present an alternate analysis of interresidue contact using local structures defined by the structural alphabet introduced by Camproux et al. This structural alphabet allows to describe a 3D structure as a sequence of prototype fragments called structural letters, of 27 different types. Each residue can then be assigned to a particular local structure, even in loop regions. The analysis of interresidue contacts within protein structures defined using Voronoï tessellations reveals that pairwise contact specificity is greater in terms of structural letters than amino acids. Using a simple heuristic based on specificity score comparison, we find that 74% of the long-range contacts within protein structures are better described using structural letters than amino acid types. The investigation is extended to a set of protein-protein complexes, showing that the similar global rules apply as for intraprotein contacts, with 64% of the interprotein contacts best described by local structures. We then present an evaluation of pairing functions integrating structural letters to decoy scoring and show that some complexes could benefit from the use of structural letter-based pairing functions.
Protein Structure and Function Prediction Using I-TASSER
Yang, Jianyi; Zhang, Yang
2016-01-01
I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Jelínek, Jan; Škoda, Petr; Hoksza, David
2017-12-06
Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
Park, Hahnbeom; Lee, Gyu Rie; Heo, Lim; Seok, Chaok
2014-01-01
Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras
2014-03-11
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations
Seeliger, Daniel; de Groot, Bert L.
2010-01-01
Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034
RaptorX server: a resource for template-based protein structure modeling.
Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo
2014-01-01
Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling.Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
PASS2: an automated database of protein alignments organised as structural superfamilies.
Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan
2004-04-02
The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Template-Based Modeling of Protein-RNA Interactions.
Zheng, Jinfang; Kundrotas, Petras J; Vakser, Ilya A; Liu, Shiyong
2016-09-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes.
Blind test of physics-based prediction of protein structures.
Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A
2009-02-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.
Blind Test of Physics-Based Prediction of Protein Structures
Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.
2009-01-01
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130
PDB2Graph: A toolbox for identifying critical amino acids map in proteins based on graph theory.
Niknam, Niloofar; Khakzad, Hamed; Arab, Seyed Shahriar; Naderi-Manesh, Hossein
2016-05-01
The integrative and cooperative nature of protein structure involves the assessment of topological and global features of constituent parts. Network concept takes complete advantage of both of these properties in the analysis concomitantly. High compatibility to structural concepts or physicochemical properties in addition to exploiting a remarkable simplification in the system has made network an ideal tool to explore biological systems. There are numerous examples in which different protein structural and functional characteristics have been clarified by the network approach. Here, we present an interactive and user-friendly Matlab-based toolbox, PDB2Graph, devoted to protein structure network construction, visualization, and analysis. Moreover, PDB2Graph is an appropriate tool for identifying critical nodes involved in protein structural robustness and function based on centrality indices. It maps critical amino acids in protein networks and can greatly aid structural biologists in selecting proper amino acid candidates for manipulating protein structures in a more reasonable and rational manner. To introduce the capability and efficiency of PDB2Graph in detail, the structural modification of Calmodulin through allosteric binding of Ca(2+) is considered. In addition, a mutational analysis for three well-identified model proteins including Phage T4 lysozyme, Barnase and Ribonuclease HI, was performed to inspect the influence of mutating important central residues on protein activity. Copyright © 2016 Elsevier Ltd. All rights reserved.
Quality assessment of protein model-structures based on structural and functional similarities
2012-01-01
Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498
Yan, Yumeng; Wen, Zeyu; Wang, Xinxiang; Huang, Sheng-You
2017-03-01
Protein-protein docking is an important computational tool for predicting protein-protein interactions. With the rapid development of proteomics projects, more and more experimental binding information ranging from mutagenesis data to three-dimensional structures of protein complexes are becoming available. Therefore, how to appropriately incorporate the biological information into traditional ab initio docking has been an important issue and challenge in the field of protein-protein docking. To address these challenges, we have developed a Hybrid DOCKing protocol of template-based and template-free approaches, referred to as HDOCK. The basic procedure of HDOCK is to model the structures of individual components based on the template complex by a template-based method if a template is available; otherwise, the component structures will be modeled based on monomer proteins by regular homology modeling. Then, the complex structure of the component models is predicted by traditional protein-protein docking. With the HDOCK protocol, we have participated in the CPARI experiment for rounds 28-35. Out of the 25 CASP-CAPRI targets for oligomer modeling, our HDOCK protocol predicted correct models for 16 targets, ranking one of the top algorithms in this challenge. Our docking method also made correct predictions on other CAPRI challenges such as protein-peptide binding for 6 out of 8 targets and water predictions for 2 out of 2 targets. The advantage of our hybrid docking approach over pure template-based docking was further confirmed by a comparative evaluation on 20 CASP-CAPRI targets. Proteins 2017; 85:497-512. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei
2016-01-01
Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851
Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam
2018-04-30
A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, coined a name UNRES server, is presented. In contrast to most of the protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes protein sequence and, optionally, restraints from secondary-structure prediction or small x-ray scattering data, and simulation type and parameters which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.
Modularity in protein structures: study on all-alpha proteins.
Khan, Taushif; Ghosh, Indira
2015-01-01
Modularity is known as one of the most important features of protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing necessary robust scaffolds to support organism's growth and survival in constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method, using protein's secondary structures and their arrangements (i.e. patterns) in order to investigate its structural space. Our result on all-alpha protein shows that the known structural space is highly populated with limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to be distinctly occurring in diverse functions e.g. in enzymatic classes and reactions. In this study, we are introducing a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, analysis indicates that protein function seems to be the driving force that shapes the known structure space.
Synchrotron IR microspectroscopy for protein structure analysis: Potential and questions
Yu, Peiqiang
2006-01-01
Synchrotron radiation-based Fourier transform infrared microspectroscopy (S-FTIR) has been developed as a rapid, direct, non-destructive, bioanalytical technique. This technique takes advantage of synchrotron light brightness and small effective source size and is capable of exploring the molecular chemical make-up within microstructures of a biological tissue without destruction of inherent structures at ultra-spatial resolutions within cellular dimension. To date there has been very little application of this advanced technique to the study of pure protein inherent structure at a cellular level in biological tissues. In this review, a novel approach was introduced to show the potential of the newly developed, advancedmore » synchrotron-based analytical technology, which can be used to localize relatively “pure“ protein in the plant tissues and relatively reveal protein inherent structure and protein molecular chemical make-up within intact tissue at cellular and subcellular levels. Several complex protein IR spectra data analytical techniques (Gaussian and Lorentzian multi-component peak modeling, univariate and multivariate analysis, principal component analysis (PCA), and hierarchical cluster analysis (CLA) are employed to relatively reveal features of protein inherent structure and distinguish protein inherent structure differences between varieties/species and treatments in plant tissues. By using a multi-peak modeling procedure, RELATIVE estimates (but not EXACT determinations) for protein secondary structure analysis can be made for comparison purpose. The issues of pro- and anti-multi-peaking modeling/fitting procedure for relative estimation of protein structure were discussed. By using the PCA and CLA analyses, the plant molecular structure can be qualitatively separate one group from another, statistically, even though the spectral assignments are not known. The synchrotron-based technology provides a new approach for protein structure research in biological tissues at ultraspatial resolutions.« less
Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan
2018-05-01
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.
The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis
NASA Astrophysics Data System (ADS)
Suzuki, Tomonori; Miyazaki, Satoru
2011-01-01
Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.
Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin
2017-01-01
Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.
Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming
2016-07-01
Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
González-Díaz, Humberto; Munteanu, Cristian R; Postelnicu, Lucian; Prado-Prado, Francisco; Gestal, Marcos; Pazos, Alejandro
2012-03-01
Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, the computational methods that can predict LIBPs based on 3D structure parameters became a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000+ protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a very interesting source to discover relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software and they correspond to the entire protein or to specific protein regions named core, inner, middle, and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier to discriminate 3D structure of LIBPs from other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at , along with other important web servers of the Bio-AIMS portal. The users can carry out an automatic retrieval of protein structures from PDB or upload their custom protein structural models from their disk created with LOMETS server. We demonstrated the PDB mining option performing a predictive study of 2000+ proteins with unknown function. Interesting results regarding the discovery of new Cancer Biomarkers in humans or drug targets in parasites have been discussed here in this sense.
CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area.
Terashi, Genki; Takeda-Shitaka, Mayuko
2015-01-01
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue-residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue-residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras
2014-01-01
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three states (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of protein backbone including the coil. According to de Brevern (2000) the Protein Blocks has 16 structural fragments and each one has 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and it has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using the local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at Protein Block Export server.
Protein structure refinement using a quantum mechanics-based chemical shielding predictor.
Bratholm, Lars A; Jensen, Jan H
2017-03-01
The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ , 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift.
Fast protein tertiary structure retrieval based on global surface shape similarity.
Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke
2008-09-01
Characterization and identification of similar tertiary structure of proteins provides rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, the search speed against a few thousand structures takes less than a minute. To investigate the agreement between surface representation defined by 3D Zernike descriptor and conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibility of large-scale global and local protein surface shape comparison. 2008 Wiley-Liss, Inc.
2013-01-01
Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482
Unraveling the meaning of chemical shifts in protein NMR.
Berjanskii, Mark V; Wishart, David S
2017-11-01
Chemical shifts are among the most informative parameters in protein NMR. They provide wealth of information about protein secondary and tertiary structure, protein flexibility, and protein-ligand binding. In this report, we review the progress in interpreting and utilizing protein chemical shifts that has occurred over the past 25years, with a particular focus on the large body of work arising from our group and other Canadian NMR laboratories. More specifically, this review focuses on describing, assessing, and providing some historical context for various chemical shift-based methods to: (1) determine protein secondary and super-secondary structure; (2) derive protein torsion angles; (3) assess protein flexibility; (4) predict residue accessible surface area; (5) refine 3D protein structures; (6) determine 3D protein structures and (7) characterize intrinsically disordered proteins. This review also briefly covers some of the methods that we previously developed to predict chemical shifts from 3D protein structures and/or protein sequence data. It is hoped that this review will help to increase awareness of the considerable utility of NMR chemical shifts in structural biology and facilitate more widespread adoption of chemical-shift based methods by the NMR spectroscopists, structural biologists, protein biophysicists, and biochemists worldwide. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.
Characterizing protein domain associations by Small-molecule ligand binding
Li, Qingliang; Cheng, Tiejun; Wang, Yanli; Bryant, Stephen H.
2012-01-01
Background Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to system biology research, drug target identification, as well as drug repurposing. Methods We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. Results We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. Conclusions A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing. PMID:23745168
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. http://pdb2cd.cryst.bbk.ac.uk CONTACT: r.w.janes@qmul.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Thermostabilisation of membrane proteins for structural studies
Magnani, Francesca; Serrano-Vega, Maria J.; Shibata, Yoko; Abdul-Hussein, Saba; Lebon, Guillaume; Miller-Gallacher, Jennifer; Singhal, Ankita; Strege, Annette; Thomas, Jennifer A.; Tate, Christopher G.
2017-01-01
The thermostability of an integral membrane protein in detergent solution is a key parameter that dictates the likelihood of obtaining well-diffracting crystals suitable for structure determination. However, many mammalian membrane proteins are too unstable for crystallisation. We developed a thermostabilisation strategy based on systematic mutagenesis coupled to a radioligand-binding thermostability assay that can be applied to receptors, ion channels and transporters. It takes approximately 6-12 months to thermostabilise a G protein-coupled receptor (GPCR) containing 300 amino acid residues. The resulting thermostabilised membrane proteins are more easily crystallised and result in high-quality structures. This methodology has facilitated structure-based drug design applied to GPCRs, because it is possible to determine multiple structures of the thermostabilised receptors bound to low affinity ligands. Protocols and advice are given on how to develop thermostability assays for membrane proteins and how to combine mutations to make an optimally stable mutant suitable for structural studies. PMID:27466713
Segmented molecular design of self-healing proteinaceous materials
NASA Astrophysics Data System (ADS)
Sariola, Veikko; Pena-Francesch, Abdon; Jung, Huihun; Çetinkaya, Murat; Pacheco, Carlos; Sitti, Metin; Demirel, Melik C.
2015-09-01
Hierarchical assembly of self-healing adhesive proteins creates strong and robust structural and interfacial materials, but understanding of the molecular design and structure-property relationships of structural proteins remains unclear. Elucidating this relationship would allow rational design of next generation genetically engineered self-healing structural proteins. Here we report a general self-healing and -assembly strategy based on a multiphase recombinant protein based material. Segmented structure of the protein shows soft glycine- and tyrosine-rich segments with self-healing capability and hard beta-sheet segments. The soft segments are strongly plasticized by water, lowering the self-healing temperature close to body temperature. The hard segments self-assemble into nanoconfined domains to reinforce the material. The healing strength scales sublinearly with contact time, which associates with diffusion and wetting of autohesion. The finding suggests that recombinant structural proteins from heterologous expression have potential as strong and repairable engineering materials.
Segmented molecular design of self-healing proteinaceous materials.
Sariola, Veikko; Pena-Francesch, Abdon; Jung, Huihun; Çetinkaya, Murat; Pacheco, Carlos; Sitti, Metin; Demirel, Melik C
2015-09-01
Hierarchical assembly of self-healing adhesive proteins creates strong and robust structural and interfacial materials, but understanding of the molecular design and structure-property relationships of structural proteins remains unclear. Elucidating this relationship would allow rational design of next generation genetically engineered self-healing structural proteins. Here we report a general self-healing and -assembly strategy based on a multiphase recombinant protein based material. Segmented structure of the protein shows soft glycine- and tyrosine-rich segments with self-healing capability and hard beta-sheet segments. The soft segments are strongly plasticized by water, lowering the self-healing temperature close to body temperature. The hard segments self-assemble into nanoconfined domains to reinforce the material. The healing strength scales sublinearly with contact time, which associates with diffusion and wetting of autohesion. The finding suggests that recombinant structural proteins from heterologous expression have potential as strong and repairable engineering materials.
NASA Astrophysics Data System (ADS)
Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin
2017-08-01
The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search a receptor structure with a bound ligand sharing high similarity with the query ligand for the docking use. The strategy was applied to the two datasets (HSP90 and MAP4K4) in recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions comparing with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.
Template-Based Modeling of Protein-RNA Interactions
Zheng, Jinfang; Kundrotas, Petras J.; Vakser, Ilya A.
2016-01-01
Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes. PMID:27662342
von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I
2006-02-06
The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Structure-based analysis of catalysis and substrate definition in the HIT protein family.
Lima, C D; Klein, M G; Hendrickson, W A
1997-10-10
The histidine triad (HIT) protein family is among the most ubiquitous and highly conserved in nature, but a biological activity has not yet been identified for any member of the HIT family. Fragile histidine triad protein (FHIT) and protein kinase C interacting protein (PKCI) were used in a structure-based approach to elucidate characteristics of in vivo ligands and reactions. Crystallographic structures of apo, substrate analog, pentacovalent transition-state analog, and product states of both enzymes reveal a catalytic mechanism and define substrate characteristics required for catalysis, thus unifying the HIT family as nucleotidyl hydrolases, transferases, or both. The approach described here may be useful in identifying structure-function relations between protein families identified through genomics.
T-RMSD: a web server for automated fine-grained protein structural classification.
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-07-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.
T-RMSD: a web server for automated fine-grained protein structural classification
Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric
2013-01-01
This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642
Representing and comparing protein structures as paths in three-dimensional space
Zhi, Degui; Krishna, S Sri; Cao, Haibo; Pevzner, Pavel; Godzik, Adam
2006-01-01
Background Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. Results We propose a structure comparison approach based on a simplified representation of proteins that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. Conclusion Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it can surprisingly well recover the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformation changes that are beyond the ability of traditional structure alignment programs. We demonstrate the applications of procedure to several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver. PMID:17052359
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu,P.
2007-01-01
Studying the secondary structure of proteins leads to an understanding of the components that make up a whole protein, and such an understanding of the structure of the whole protein is often vital to understanding its digestive behaviour and nutritive value in animals. The main protein secondary structures are the {alpha}-helix and {beta}-sheet. The percentage of these two structures in protein secondary structures influences protein nutritive value, quality and digestive behaviour. A high percentage of {beta}-sheet structure may partly cause a low access to gastrointestinal digestive enzymes, which results in a low protein value. The objectives of the present studymore » were to use advanced synchrotron-based Fourier transform IR (S-FTIR) microspectroscopy as a new approach to reveal the molecular chemistry of the protein secondary structures of feed tissues affected by heat-processing within intact tissue at a cellular level, and to quantify protein secondary structures using multicomponent peak modelling Gaussian and Lorentzian methods, in relation to protein digestive behaviours and nutritive value in the rumen, which was determined using the Cornell Net Carbohydrate Protein System. The synchrotron-based molecular chemistry research experiment was performed at the National Synchrotron Light Source at Brookhaven National Laboratory, US Department of Energy. The results showed that, with S-FTIR microspectroscopy, the molecular chemistry, ultrastructural chemical make-up and nutritive characteristics could be revealed at a high ultraspatial resolution ({approx}10 {mu}m). S-FTIR microspectroscopy revealed that the secondary structure of protein differed between raw and roasted golden flaxseeds in terms of the percentages and ratio of {alpha}-helixes and {beta}-sheets in the mid-IR range at the cellular level. By using multicomponent peak modelling, the results show that the roasting reduced (P <0.05) the percentage of {alpha}-helixes (from 47.1% to 36.1%: S-FTIR absorption intensity), increased the percentage of {beta}-sheets (from 37.2% to 49.8%: S-FTIR absorption intensity) and reduced the {alpha}-helix to {beta}-sheet ratio (from 0.3 to 0.7) in the golden flaxseeds, which indicated a negative effect of the roasting on protein values, utilisation and bioavailability. These results were proved by the Cornell Net Carbohydrate Protein System in situ animal trial, which also revealed that roasting increased the amount of protein bound to lignin, and well as of the Maillard reaction protein (both of which are poorly used by ruminants), and increased the level of indigestible and undegradable protein in ruminants. The present results demonstrate the potential of highly spatially resolved synchrotron-based infrared microspectroscopy to locate 'pure' protein in feed tissues, and reveal protein secondary structures and digestive behaviour, making a significant step forward in and an important contribution to protein nutritional research. Further study is needed to determine the sensitivities of protein secondary structures to various heat-processing conditions, and to quantify the relationship between protein secondary structures and the nutrient availability and digestive behaviour of various protein sources. Information from the present study arising from the synchrotron-based IR probing of the protein secondary structures of protein sources at the cellular level will be valuable as a guide to maintaining protein quality and predicting digestive behaviours.« less
Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities
Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert
2014-01-01
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327
Functional evolution of PLP-dependent enzymes based on active-site structural similarities.
Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert
2014-10-01
Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional-fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.
Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen
2008-10-01
In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.
Protein asparagine deamidation prediction based on structures with machine learning methods.
Jia, Lei; Sun, Yaxiong
2017-01-01
Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. Sequence-based prediction method simply identifies the NG motif (amino acid asparagine followed by a glycine) to be liable to deamidation. It still dominates deamidation evaluation process in most pharmaceutical setup due to its convenience. However, the simple sequence-based method is less accurate and often causes over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminated false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.
Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe
2010-11-26
Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Structure-Based Characterization of Multiprotein Complexes
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.
2014-01-01
Summary Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
An object programming based environment for protein secondary structure prediction.
Giacomini, M; Ruggiero, C; Sacile, R
1996-01-01
The most frequently used methods for protein secondary structure prediction are empirical statistical methods and rule based methods. A consensus system based on object-oriented programming is presented, which integrates the two approaches with the aim of improving the prediction quality. This system uses an object-oriented knowledge representation based on the concepts of conformation, residue and protein, where the conformation class is the basis, the residue class derives from it and the protein class derives from the residue class. The system has been tested with satisfactory results on several proteins of the Brookhaven Protein Data Bank. Its results have been compared with the results of the most widely used prediction methods, and they show a higher prediction capability and greater stability. Moreover, the system itself provides an index of the reliability of its current prediction. This system can also be regarded as a basis structure for programs of this kind.
Why Is There a Glass Ceiling for Threading Based Protein Structure Prediction Methods?
Skolnick, Jeffrey; Zhou, Hongyi
2017-04-20
Despite their different implementations, comparison of the best threading approaches to the prediction of evolutionary distant protein structures reveals that they tend to succeed or fail on the same protein targets. This is true despite the fact that the structural template library has good templates for all cases. Thus, a key question is why are certain protein structures threadable while others are not. Comparison with threading results on a set of artificial sequences selected for stability further argues that the failure of threading is due to the nature of the protein structures themselves. Using a new contact map based alignment algorithm, we demonstrate that certain folds are highly degenerate in that they can have very similar coarse grained fractions of native contacts aligned and yet differ significantly from the native structure. For threadable proteins, this is not the case. Thus, contemporary threading approaches appear to have reached a plateau, and new approaches to structure prediction are required.
Topological characteristics of helical repeat proteins.
Groves, M R; Barford, D
1999-06-01
The recent elucidation of protein structures based upon repeating amino acid motifs, including the armadillo motif, the HEAT motif and tetratricopeptide repeats, reveals that they belong to the class of helical repeat proteins. These proteins share the common property of being assembled from tandem repeats of an alpha-helical structural unit, creating extended superhelical structures that are ideally suited to create a protein recognition interface.
Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
Whitmore, Lee; Mavridis, Lazaros; Wallace, B A; Janes, Robert W
2018-01-01
Circular dichroism spectroscopy is a well-used, but simple method in structural biology for providing information on the secondary structure and folds of proteins. DichroMatch (DM@PCDDB) is an online tool that is newly available in the Protein Circular Dichroism Data Bank (PCDDB), which takes advantage of the wealth of spectral and metadata deposited therein, to enable identification of spectral nearest neighbors of a query protein based on four different methods of spectral matching. DM@PCDDB can potentially provide novel information about structural relationships between proteins and can be used in comparison studies of protein homologs and orthologs. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Fukunishi, Yoshifumi
2010-01-01
For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.
ERIC Educational Resources Information Center
Robic, Srebrenka
2010-01-01
To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative…
Bhagavat, Raghu; Sankar, Santhosh; Srinivasan, Narayanaswamy; Chandra, Nagasuma
2018-03-06
Protein-ligand interactions form the basis of most cellular events. Identifying ligand binding pockets in proteins will greatly facilitate rationalizing and predicting protein function. Ligand binding sites are unknown for many proteins of known three-dimensional (3D) structure, creating a gap in our understanding of protein structure-function relationships. To bridge this gap, we detect pockets in proteins of known 3D structures, using computational techniques. This augmented pocketome (PocketDB) consists of 249,096 pockets, which is about seven times larger than what is currently known. We deduce possible ligand associations for about 46% of the newly identified pockets. The augmented pocketome, when subjected to clustering based on similarities among pockets, yielded 2,161 site types, which are associated with 1,037 ligand types, together providing fold-site-type-ligand-type associations. The PocketDB resource facilitates a structure-based function annotation, delineation of the structural basis of ligand recognition, and provides functional clues for domains of unknown functions, allosteric proteins, and druggable pockets. Copyright © 2018 Elsevier Ltd. All rights reserved.
Nagata, Koji
2010-01-01
Peptides and proteins with similar amino acid sequences can have different biological functions. Knowledge of their three-dimensional molecular structures is critically important in identifying their functional determinants. In this review, I describe the results of our and other groups' structure-based functional characterization of insect insulin-like peptides, a crustacean hyperglycemic hormone-family peptide, a mammalian epidermal growth factor-family protein, and an intracellular signaling domain that recognizes proline-rich sequence.
Protein-Protein Docking in Drug Design and Discovery.
Kaczor, Agnieszka A; Bartuzi, Damian; Stępniewski, Tomasz Maciej; Matosiuk, Dariusz; Selent, Jana
2018-01-01
Protein-protein interactions (PPIs) are responsible for a number of key physiological processes in the living cells and underlie the pathomechanism of many diseases. Nowadays, along with the concept of so-called "hot spots" in protein-protein interactions, which are well-defined interface regions responsible for most of the binding energy, these interfaces can be targeted with modulators. In order to apply structure-based design techniques to design PPIs modulators, a three-dimensional structure of protein complex has to be available. In this context in silico approaches, in particular protein-protein docking, are a valuable complement to experimental methods for elucidating 3D structure of protein complexes. Protein-protein docking is easy to use and does not require significant computer resources and time (in contrast to molecular dynamics) and it results in 3D structure of a protein complex (in contrast to sequence-based methods of predicting binding interfaces). However, protein-protein docking cannot address all the aspects of protein dynamics, in particular the global conformational changes during protein complex formation. In spite of this fact, protein-protein docking is widely used to model complexes of water-soluble proteins and less commonly to predict structures of transmembrane protein assemblies, including dimers and oligomers of G protein-coupled receptors (GPCRs). In this chapter we review the principles of protein-protein docking, available algorithms and software and discuss the recent examples, benefits, and drawbacks of protein-protein docking application to water-soluble proteins, membrane anchoring and transmembrane proteins, including GPCRs.
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander
2009-11-01
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
Classification of proteins: available structural space for molecular modeling.
Andreeva, Antonina
2012-01-01
The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas other utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
Su, Qiudong; Guo, Minzhuo; Jia, Zhiyuan; Qiu, Feng; Lu, Xuexin; Gao, Yan; Meng, Qingling; Tian, Ruiguang; Bi, Shengli; Yi, Yao
2016-07-01
Hepatitis A virus (HAV) infection can stimulate the production of antibodies to structural and non-structural proteins of the virus. However, vaccination with an inactivated or attenuated HAV vaccine produces antibodies mainly against structural proteins, whereas no or very limited antibodies are produced against the non-structural proteins. Current diagnostic assays to determine exposure to HAV, such as the Abbott HAV AB test, detect antibodies only to the structural proteins and so are not able to distinguish a natural infection from vaccination with an inactivated or attenuated virus. Here, we constructed a recombinant tandem multi-epitope diagnostic antigen (designated 'H1') based on the immune-dominant epitopes of the non-structural proteins of HAV to distinguish the two situations. H1 protein expressed in Escherichia coli and purified by affinity and anion exchange chromatography was applied in a double-antigen sandwich ELISA for the detection of anti-non-structural HAV proteins, which was confirmed to distinguish a natural infection from vaccination with an inactivated or attenuated HAV vaccine. Copyright © 2016 Elsevier B.V. All rights reserved.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Predicting protein interactions by Brownian dynamics simulations.
Meng, Xuan-Yu; Xu, Yu; Zhang, Hong-Xing; Mezei, Mihaly; Cui, Meng
2012-01-01
We present a newly adapted Brownian-Dynamics (BD)-based protein docking method for predicting native protein complexes. The approach includes global BD conformational sampling, compact complex selection, and local energy minimization. In order to reduce the computational costs for energy evaluations, a shell-based grid force field was developed to represent the receptor protein and solvation effects. The performance of this BD protein docking approach has been evaluated on a test set of 24 crystal protein complexes. Reproduction of experimental structures in the test set indicates the adequate conformational sampling and accurate scoring of this BD protein docking approach. Furthermore, we have developed an approach to account for the flexibility of proteins, which has been successfully applied to reproduce the experimental complex structure from the structure of two unbounded proteins. These results indicate that this adapted BD protein docking approach can be useful for the prediction of protein-protein interactions.
Deoxycholate-Based Glycosides (DCGs) for Membrane Protein Stabilisation.
Bae, Hyoung Eun; Gotfryd, Kamil; Thomas, Jennifer; Hussain, Hazrat; Ehsan, Muhammad; Go, Juyeon; Loland, Claus J; Byrne, Bernadette; Chae, Pil Seok
2015-07-06
Detergents are an absolute requirement for studying the structure of membrane proteins. However, many conventional detergents fail to stabilise denaturation-sensitive membrane proteins, such as eukaryotic proteins and membrane protein complexes. New amphipathic agents with enhanced efficacy in stabilising membrane proteins will be helpful in overcoming the barriers to studying membrane protein structures. We have prepared a number of deoxycholate-based amphiphiles with carbohydrate head groups, designated deoxycholate-based glycosides (DCGs). These DCGs are the hydrophilic variants of previously reported deoxycholate-based N-oxides (DCAOs). Membrane proteins in these agents, particularly the branched diglucoside-bearing amphiphiles DCG-1 and DCG-2, displayed favourable behaviour compared to previously reported parent compounds (DCAOs) and conventional detergents (LDAO and DDM). Given their excellent properties, these agents should have significant potential for membrane protein studies. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kaur, Parminder; Kiselar, Janna; Yang, Sichun; Chance, Mark R.
2015-01-01
Hydroxyl radical footprinting based MS for protein structure assessment has the goal of understanding ligand induced conformational changes and macromolecular interactions, for example, protein tertiary and quaternary structure, but the structural resolution provided by typical peptide-level quantification is limiting. In this work, we present experimental strategies using tandem-MS fragmentation to increase the spatial resolution of the technique to the single residue level to provide a high precision tool for molecular biophysics research. Overall, in this study we demonstrated an eightfold increase in structural resolution compared with peptide level assessments. In addition, to provide a quantitative analysis of residue based solvent accessibility and protein topography as a basis for high-resolution structure prediction; we illustrate strategies of data transformation using the relative reactivity of side chains as a normalization strategy and predict side-chain surface area from the footprinting data. We tested the methods by examination of Ca+2-calmodulin showing highly significant correlations between surface area and side-chain contact predictions for individual side chains and the crystal structure. Tandem ion based hydroxyl radical footprinting-MS provides quantitative high-resolution protein topology information in solution that can fill existing gaps in structure determination for large proteins and macromolecular complexes. PMID:25687570
BAYESIAN PROTEIN STRUCTURE ALIGNMENT.
Rodriguez, Abel; Schmidler, Scott C
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.
Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras
2015-08-01
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly require template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
2017-01-01
The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift. PMID:28451325
A thermodynamic definition of protein domains.
Porter, Lauren L; Rose, George D
2012-06-12
Protein domains are conspicuous structural units in globular proteins, and their identification has been a topic of intense biochemical interest dating back to the earliest crystal structures. Numerous disparate domain identification algorithms have been proposed, all involving some combination of visual intuition and/or structure-based decomposition. Instead, we present a rigorous, thermodynamically-based approach that redefines domains as cooperative chain segments. In greater detail, most small proteins fold with high cooperativity, meaning that the equilibrium population is dominated by completely folded and completely unfolded molecules, with a negligible subpopulation of partially folded intermediates. Here, we redefine structural domains in thermodynamic terms as cooperative folding units, based on m-values, which measure the cooperativity of a protein or its substructures. In our analysis, a domain is equated to a contiguous segment of the folded protein whose m-value is largely unaffected when that segment is excised from its parent structure. Defined in this way, a domain is a self-contained cooperative unit; i.e., its cooperativity depends primarily upon intrasegment interactions, not intersegment interactions. Implementing this concept computationally, the domains in a large representative set of proteins were identified; all exhibit consistency with experimental findings. Specifically, our domain divisions correspond to the experimentally determined equilibrium folding intermediates in a set of nine proteins. The approach was also proofed against a representative set of 71 additional proteins, again with confirmatory results. Our reframed interpretation of a protein domain transforms an indeterminate structural phenomenon into a quantifiable molecular property grounded in solution thermodynamics.
Protein based Block Copolymers
Rabotyagova, Olena S.; Cebe, Peggy; Kaplan, David L.
2011-01-01
Advances in genetic engineering have led to the synthesis of protein-based block copolymers with control of chemistry and molecular weight, resulting in unique physical and biological properties. The benefits from incorporating peptide blocks into copolymer designs arise from the fundamental properties of proteins to adopt ordered conformations and to undergo self-assembly, providing control over structure formation at various length scales when compared to conventional block copolymers. This review covers the synthesis, structure, assembly, properties, and applications of protein-based block copolymers. PMID:21235251
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe
2016-01-01
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297
Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe
2016-06-20
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Segmented molecular design of self-healing proteinaceous materials
Sariola, Veikko; Pena-Francesch, Abdon; Jung, Huihun; Çetinkaya, Murat; Pacheco, Carlos; Sitti, Metin; Demirel, Melik C.
2015-01-01
Hierarchical assembly of self-healing adhesive proteins creates strong and robust structural and interfacial materials, but understanding of the molecular design and structure–property relationships of structural proteins remains unclear. Elucidating this relationship would allow rational design of next generation genetically engineered self-healing structural proteins. Here we report a general self-healing and -assembly strategy based on a multiphase recombinant protein based material. Segmented structure of the protein shows soft glycine- and tyrosine-rich segments with self-healing capability and hard beta-sheet segments. The soft segments are strongly plasticized by water, lowering the self-healing temperature close to body temperature. The hard segments self-assemble into nanoconfined domains to reinforce the material. The healing strength scales sublinearly with contact time, which associates with diffusion and wetting of autohesion. The finding suggests that recombinant structural proteins from heterologous expression have potential as strong and repairable engineering materials. PMID:26323335
Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L
2011-06-02
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Structure-based characterization of multiprotein complexes.
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J
2014-07-08
Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
HDAPD: a web tool for searching the disease-associated protein structures
2010-01-01
Background The protein structures of the disease-associated proteins are important for proceeding with the structure-based drug design to against a particular disease. Up until now, proteins structures are usually searched through a PDB id or some sequence information. However, in the HDAPD database presented here the protein structure of a disease-associated protein can be directly searched through the associated disease name keyed in. Description The search in HDAPD can be easily initiated by keying some key words of a disease, protein name, protein type, or PDB id. The protein sequence can be presented in FASTA format and directly copied for a BLAST search. HDAPD is also interfaced with Jmol so that users can observe and operate a protein structure with Jmol. The gene ontological data such as cellular components, molecular functions, and biological processes are provided once a hyperlink to Gene Ontology (GO) is clicked. Further, HDAPD provides a link to the KEGG map such that where the protein is placed and its relationship with other proteins in a metabolic pathway can be found from the map. The latest literatures namely titles, journals, authors, and abstracts searched from PubMed for the protein are also presented as a length controllable list. Conclusions Since the HDAPD data content can be routinely updated through a PHP-MySQL web page built, the new database presented is useful for searching the structures for some disease-associated proteins that may play important roles in the disease developing process for performing the structure-based drug design to against the diseases. PMID:20158919
Xu, Dong; Zhang, Yang
2012-07-01
Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.
Li, Yang; Yang, Jianyi
2017-04-24
The prediction of protein-ligand binding affinity has recently been improved remarkably by machine-learning-based scoring functions. For example, using a set of simple descriptors representing the atomic distance counts, the RF-Score improves the Pearson correlation coefficient to about 0.8 on the core set of the PDBbind 2007 database, which is significantly higher than the performance of any conventional scoring function on the same benchmark. A few studies have been made to discuss the performance of machine-learning-based methods, but the reason for this improvement remains unclear. In this study, by systemically controlling the structural and sequence similarity between the training and test proteins of the PDBbind benchmark, we demonstrate that protein structural and sequence similarity makes a significant impact on machine-learning-based methods. After removal of training proteins that are highly similar to the test proteins identified by structure alignment and sequence alignment, machine-learning-based methods trained on the new training sets do not outperform the conventional scoring functions any more. On the contrary, the performance of conventional functions like X-Score is relatively stable no matter what training data are used to fit the weights of its energy terms.
SDSL-ESR-based protein structure characterization.
Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A
2010-03-01
As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.
Comparative Protein Structure Modeling Using MODELLER
Webb, Benjamin; Sali, Andrej
2016-01-01
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. PMID:27322406
Optimization of protein-protein docking for predicting Fc-protein interactions.
Agostino, Mark; Mancera, Ricardo L; Ramsland, Paul A; Fernández-Recio, Juan
2016-11-01
The antibody crystallizable fragment (Fc) is recognized by effector proteins as part of the immune system. Pathogens produce proteins that bind Fc in order to subvert or evade the immune response. The structural characterization of the determinants of Fc-protein association is essential to improve our understanding of the immune system at the molecular level and to develop new therapeutic agents. Furthermore, Fc-binding peptides and proteins are frequently used to purify therapeutic antibodies. Although several structures of Fc-protein complexes are available, numerous others have not yet been determined. Protein-protein docking could be used to investigate Fc-protein complexes; however, improved approaches are necessary to efficiently model such cases. In this study, a docking-based structural bioinformatics approach is developed for predicting the structures of Fc-protein complexes. Based on the available set of X-ray structures of Fc-protein complexes, three regions of the Fc, loosely corresponding to three turns within the structure, were defined as containing the essential features for protein recognition and used as restraints to filter the initial docking search. Rescoring the filtered poses with an optimal scoring strategy provided a success rate of approximately 80% of the test cases examined within the top ranked 20 poses, compared to approximately 20% by the initial unrestrained docking. The developed docking protocol provides a significant improvement over the initial unrestrained docking and will be valuable for predicting the structures of currently undetermined Fc-protein complexes, as well as in the design of peptides and proteins that target Fc. Copyright © 2016 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Giron, Maria D.; Salto, Rafael
2011-01-01
Structure-function relationship studies in proteins are essential in modern Cell Biology. Laboratory exercises that allow students to familiarize themselves with basic mutagenesis techniques are essential in all Genetic Engineering courses to teach the relevance of protein structure. We have implemented a laboratory course based on the…
Structural determination of intact proteins using mass spectrometry
Kruppa, Gary [San Francisco, CA; Schoeniger, Joseph S [Oakland, CA; Young, Malin M [Livermore, CA
2008-05-06
The present invention relates to novel methods of determining the sequence and structure of proteins. Specifically, the present invention allows for the analysis of intact proteins within a mass spectrometer. Therefore, preparatory separations need not be performed prior to introducing a protein sample into the mass spectrometer. Also disclosed herein are new instrumental developments for enhancing the signal from the desired modified proteins, methods for producing controlled protein fragments in the mass spectrometer, eliminating complex microseparations, and protein preparatory chemical steps necessary for cross-linking based protein structure determination.Additionally, the preferred method of the present invention involves the determination of protein structures utilizing a top-down analysis of protein structures to search for covalent modifications. In the preferred method, intact proteins are ionized and fragmented within the mass spectrometer.
ZifBASE: a database of zinc finger proteins and associated resources.
Jayakanthan, Mannu; Muthukumaran, Jayaraman; Chandrasekar, Sanniyasi; Chawla, Konika; Punetha, Ankita; Sundar, Durai
2009-09-09
Information on the occurrence of zinc finger protein motifs in genomes is crucial to the developing field of molecular genome engineering. The knowledge of their target DNA-binding sequences is vital to develop chimeric proteins for targeted genome engineering and site-specific gene correction. There is a need to develop a computational resource of zinc finger proteins (ZFP) to identify the potential binding sites and its location, which reduce the time of in vivo task, and overcome the difficulties in selecting the specific type of zinc finger protein and the target site in the DNA sequence. ZifBASE provides an extensive collection of various natural and engineered ZFP. It uses standard names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, Protein Data Bank, ModBase, Protein Model Portal and the literature. It also incorporates specialized features of ZFP including finger sequences and positions, number of fingers, physiochemical properties, classes, framework, PubMed citations with links to experimental structures (PDB, if available) and modeled structures of natural zinc finger proteins. ZifBASE provides information on zinc finger proteins (both natural and engineered ones), the number of finger units in each of the zinc finger proteins (with multiple fingers), the synergy between the adjacent fingers and their positions. Additionally, it gives the individual finger sequence and their target DNA site to which it binds for better and clear understanding on the interactions of adjacent fingers. The current version of ZifBASE contains 139 entries of which 89 are engineered ZFPs, containing 3-7F totaling to 296 fingers. There are 50 natural zinc finger protein entries ranging from 2-13F, totaling to 307 fingers. It has sequences and structures from literature, Protein Data Bank, ModBase and Protein Model Portal. The interface is cross linked to other public databases like UniprotKB, PDB, ModBase and Protein Model Portal and PubMed for making it more informative. A database is established to maintain the information of the sequence features, including the class, framework, number of fingers, residues, position, recognition site and physio-chemical properties (molecular weight, isoelectric point) of both natural and engineered zinc finger proteins and dissociation constant of few. ZifBASE can provide more effective and efficient way of accessing the zinc finger protein sequences and their target binding sites with the links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.
Performance of protein-structure predictions with the physics-based UNRES force field in CASP11.
Krupa, Paweł; Mozolewska, Magdalena A; Wiśniewska, Marta; Yin, Yanping; He, Yi; Sieradzan, Adam K; Ganzynkowicz, Robert; Lipska, Agnieszka G; Karczyńska, Agnieszka; Ślusarz, Magdalena; Ślusarz, Rafał; Giełdoń, Artur; Czaplewski, Cezary; Jagieła, Dawid; Zaborowski, Bartłomiej; Scheraga, Harold A; Liwo, Adam
2016-11-01
Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulation process except for weak restraints on secondary structure to enable us to finish each prediction within the allowed 3-week time window. Because of simplified UNRES representation of polypeptide chains, use of enhanced sampling methods, code optimization and parallelization and sufficient computational resources, we were able to treat, for the first time, all 55 human prediction targets with sizes from 44 to 595 amino acid residues, the average size being 251 residues. Complete structures of six single-domain proteins were predicted accurately, with the highest accuracy being attained for the T0769, for which the CαRMSD was 3.8 Å for 97 residues of the experimental structure. Correct structures were also predicted for 13 domains of multi-domain proteins with accuracy comparable to that of the best template-based modeling methods. With further improvements of the UNRES force field that are now underway, our physics-based coarse-grained approach to protein-structure prediction will eventually reach global prediction capacity and, consequently, reliability in simulating protein structure and dynamics that are important in biochemical processes. Freely available on the web at http://www.unres.pl/ CONTACT: has5@cornell.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Secure web book to store structural genomics research data.
Manjasetty, Babu A; Höppner, Klaus; Mueller, Uwe; Heinemann, Udo
2003-01-01
Recently established collaborative structural genomics programs aim at significantly accelerating the crystal structure analysis of proteins. These large-scale projects require efficient data management systems to ensure seamless collaboration between different groups of scientists working towards the same goal. Within the Berlin-based Protein Structure Factory, the synchrotron X-ray data collection and the subsequent crystal structure analysis tasks are located at BESSY, a third-generation synchrotron source. To organize file-based communication and data transfer at the BESSY site of the Protein Structure Factory, we have developed the web-based BCLIMS, the BESSY Crystallography Laboratory Information Management System. BCLIMS is a relational data management system which is powered by MySQL as the database engine and Apache HTTP as the web server. The database interface routines are written in Python programing language. The software is freely available to academic users. Here we describe the storage, retrieval and manipulation of laboratory information, mainly pertaining to the synchrotron X-ray diffraction experiments and the subsequent protein structure analysis, using BCLIMS.
Impact of computational structure-based methods on drug discovery.
Reynolds, Charles H
2014-01-01
Structure-based drug design has become an indispensible tool in drug discovery. The emergence of structure-based design is due to gains in structural biology that have provided exponential growth in the number of protein crystal structures, new computational algorithms and approaches for modeling protein-ligand interactions, and the tremendous growth of raw computer power in the last 30 years. Computer modeling and simulation have made major contributions to the discovery of many groundbreaking drugs in recent years. Examples are presented that highlight the evolution of computational structure-based design methodology, and the impact of that methodology on drug discovery.
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.
Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei
2017-10-01
The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
Jafari, Rahim; Sadeghi, Mehdi; Mirzaie, Mehdi
2016-05-01
The approaches taken to represent and describe structural features of the macromolecules are of major importance when developing computational methods for studying and predicting their structures and interactions. This study attempts to explore the significance of Delaunay tessellation for the definition of atomic interactions by evaluating its impact on the performance of scoring protein-protein docking prediction. Two sets of knowledge-based scoring potentials are extracted from a training dataset of native protein-protein complexes. The potential of the first set is derived using atomic interactions extracted from Delaunay tessellated structures. The potential of the second set is calculated conventionally, that is, using atom pairs whose interactions were determined by their separation distances. The scoring potentials were tested against two different docking decoy sets and their performances were compared. The results show that, if properly optimized, the Delaunay-based scoring potentials can achieve higher success rate than the usual scoring potentials. These results and the results of a previous study on the use of Delaunay-based potentials in protein fold recognition, all point to the fact that Delaunay tessellation of protein structure can provide a more realistic definition of atomic interaction, and therefore, if appropriately utilized, may be able to improve the accuracy of pair potentials. Copyright © 2016 Elsevier Inc. All rights reserved.
Towards fully automated structure-based function prediction in structural genomics: a case study.
Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M
2007-04-13
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method
Park, In-Hee; Gangupomu, Vamshi; Wagner, Jeffrey; Jain, Abhinandan; Vaidehi, Nagara-jan
2012-01-01
The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement. PMID:22260550
Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S
2015-09-01
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Knowledge-based model building of proteins: concepts and examples.
Bajorath, J.; Stenkamp, R.; Aruffo, A.
1993-01-01
We describe how to build protein models from structural templates. Methods to identify structural similarities between proteins in cases of significant, moderate to low, or virtually absent sequence similarity are discussed. The detection and evaluation of structural relationships is emphasized as a central aspect of protein modeling, distinct from the more technical aspects of model building. Computational techniques to generate and complement comparative protein models are also reviewed. Two examples, P-selectin and gp39, are presented to illustrate the derivation of protein model structures and their use in experimental studies. PMID:7505680
Overcoming barriers to membrane protein structure determination.
Bill, Roslyn M; Henderson, Peter J F; Iwata, So; Kunji, Edmund R S; Michel, Hartmut; Neutze, Richard; Newstead, Simon; Poolman, Bert; Tate, Christopher G; Vogel, Horst
2011-04-01
After decades of slow progress, the pace of research on membrane protein structures is beginning to quicken thanks to various improvements in technology, including protein engineering and microfocus X-ray diffraction. Here we review these developments and, where possible, highlight generic new approaches to solving membrane protein structures based on recent technological advances. Rational approaches to overcoming the bottlenecks in the field are urgently required as membrane proteins, which typically comprise ~30% of the proteomes of organisms, are dramatically under-represented in the structural database of the Protein Data Bank.
Li, Jinyu; Rossetti, Giulia; Dreyer, Jens; Raugei, Simone; Ippoliti, Emiliano; Lüscher, Bernhard; Carloni, Paolo
2014-01-01
Protein electrospray ionization (ESI) mass spectrometry (MS)-based techniques are widely used to provide insight into structural proteomics under the assumption that non-covalent protein complexes being transferred into the gas phase preserve basically the same intermolecular interactions as in solution. Here we investigate the applicability of this assumption by extending our previous structural prediction protocol for single proteins in ESI-MS to protein complexes. We apply our protocol to the human insulin dimer (hIns2) as a test case. Our calculations reproduce the main charge and the collision cross section (CCS) measured in ESI-MS experiments. Molecular dynamics simulations for 0.075 ms show that the complex maximizes intermolecular non-bonded interactions relative to the structure in water, without affecting the cross section. The overall gas-phase structure of hIns2 does exhibit differences with the one in aqueous solution, not inferable from a comparison with calculated CCS. Hence, care should be exerted when interpreting ESI-MS proteomics data based solely on NMR and/or X-ray structural information. PMID:25210764
Efficient protein structure search using indexing methods
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.
Kim, Sungchul; Sael, Lee; Yu, Hwanjo
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize an reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points less than θ to the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced 69.6%, 77%, 77.4% and 87.9%, respectively using iDistance, iKernel, the extended iDistance, and the extended iKernel. In θ-based nearest neighbor serach, the searching time is reduced 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein
Umate, Pavan; Tuteja, Renu
2010-01-01
Homology-based three-dimensional model for Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisomes) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of forisomes proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisomes proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566
Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie
2016-06-15
Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
Cell-free protein synthesis for structure determination by X-ray crystallography.
Watanabe, Miki; Miyazono, Ken-ichi; Tanokura, Masaru; Sawasaki, Tatsuya; Endo, Yaeta; Kobayashi, Ichizo
2010-01-01
Structure determination has been difficult for those proteins that are toxic to the cells and cannot be prepared in a large amount in vivo. These proteins, even when biologically very interesting, tend to be left uncharacterized in the structural genomics projects. Their cell-free synthesis can bypass the toxicity problem. Among the various cell-free systems, the wheat-germ-based system is of special interest due to the following points: (1) Because the gene is placed under a plant translational signal, its toxic expression in a bacterial host is reduced. (2) It has only little codon preference and, especially, little discrimination between methionine and selenomethionine (SeMet), which allows easy preparation of selenomethionylated proteins for crystal structure determination by SAD and MAD methods. (3) Translation is uncoupled from transcription, so that the toxicity of the translation product on DNA and its transcription, if any, can be bypassed. We have shown that the wheat-germ-based cell-free protein synthesis is useful for X-ray crystallography of one of the 4-bp cutter restriction enzymes, which are expected to be very toxic to all forms of cells retaining the genome. Our report on its structure represents the first report of structure determination by X-ray crystallography using protein overexpressed with the wheat-germ-based cell-free protein expression system. This will be a method of choice for cytotoxic proteins when its cost is not a problem. Its use will become popular when the crystal structure determination technology has evolved to require only a tiny amount of protein.
NASA Astrophysics Data System (ADS)
Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio
2012-12-01
We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.
Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio
2012-12-07
We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.
Grating-based X-ray tomography of 3D food structures
NASA Astrophysics Data System (ADS)
Miklos, Rikke; Nielsen, Mikkel Schou; Einarsdottir, Hildur; Lametsch, René
2016-10-01
A novel grating based X-ray phase-contrast tomographic method has been used to study how partly substitution of meat proteins with two different types of soy proteins affect the structure of the formed protein gel in meat emulsions. The measurements were performed at the Swiss synchrotron radiation light source using a grating interferometric set-up.
A topological approach for protein classification
Cang, Zixuan; Mu, Lin; Wu, Kedi; ...
2015-11-04
Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
Knowledge-based prediction of protein backbone conformation using a structural alphabet.
Vetrivel, Iyanar; Mahajan, Swapnil; Tyagi, Manoj; Hoffmann, Lionel; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; de Brevern, Alexandre G; Cadet, Frédéric; Offmann, Bernard
2017-01-01
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analyses and predictions. One such library, Protein Blocks, is composed of 16 standard 5-residues long structural prototypes. This form of analyzing proteins involves drafting its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of protein blocks is the general objective of this work. A new approach, PB-kPRED is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles, to scan this database and predict most probable backbone conformations for the protein local structures. Though PB-kPRED uses the structural information from homologues in preference, if available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed. This score estimates very well the accuracy of the algorithm (R2 of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
Zhou, Wengang; Dickerson, Julie A
2012-01-01
Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes new features: sequence-based: Hybrid Amino Acid Pair (HAAP) and two structure-based: Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show comparable results to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece. iastate.edu/StruLocPred/.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cang, Zixuan; Mu, Lin; Wu, Kedi
Here, protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics.
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Wood, Christopher W; Bruning, Marc; Ibarra, Amaurys Á; Bartlett, Gail J; Thomson, Andrew R; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N
2014-11-01
The ability to accurately model protein structures at the atomistic level underpins efforts to understand protein folding, to engineer natural proteins predictably and to design proteins de novo. Homology-based methods are well established and produce impressive results. However, these are limited to structures presented by and resolved for natural proteins. Addressing this problem more widely and deriving truly ab initio models requires mathematical descriptions for protein folds; the means to decorate these with natural, engineered or de novo sequences; and methods to score the resulting models. We present CCBuilder, a web-based application that tackles the problem for a defined but large class of protein structure, the α-helical coiled coils. CCBuilder generates coiled-coil backbones, builds side chains onto these frameworks and provides a range of metrics to measure the quality of the models. Its straightforward graphical user interface provides broad functionality that allows users to build and assess models, in which helix geometry, coiled-coil architecture and topology and protein sequence can be varied rapidly. We demonstrate the utility of CCBuilder by assembling models for 653 coiled-coil structures from the PDB, which cover >96% of the known coiled-coil types, and by generating models for rarer and de novo coiled-coil structures. CCBuilder is freely available, without registration, at http://coiledcoils.chm.bris.ac.uk/app/cc_builder/. © The Author 2014. Published by Oxford University Press.
Xiaodan, Chen; Xiurong, Zhan; Xinyu, Wu; Chunyan, Zhao; Wanghong, Zhao
2015-04-01
The aim of this study is to analyze the three-dimensional crystal structure of SMU.2055 protein, a putative acetyltransferase from the major caries pathogen Streptococcus mutans (S. mutans). The design and selection of the structure-based small molecule inhibitors are also studied. The three-dimensional crystal structure of SMU.2055 protein was obtained by structural genomics research methods of gene cloning and expression, protein purification with Ni²⁺-chelating affinity chromatography, crystal screening, and X-ray diffraction data collection. An inhibitor virtual model matching with its target protein structure was set up using computer-aided drug design methods, virtual screening and fine docking, and Libdock and Autodock procedures. The crystal of SMU.2055 protein was obtained, and its three-dimensional crystal structure was analyzed. This crystal was diffracted to a resolution of 0.23 nm. It belongs to orthorhombic space group C222(1), with unit cell parameters of a = 9.20 nm, b = 9.46 nm, and c = 19.39 nm. The asymmetric unit contained four molecules, with a solvent content of 56.7%. Moreover, five small molecule compounds, whose structure matched with that of the target protein in high degree, were designed and selected. Protein crystallography research of S. mutans SMU.2055 helps to understand the structures and functions of proteins from S. mutans at the atomic level. These five compounds may be considered as effective inhibitors to SMU.2055. The virtual model of small molecule inhibitors we built will lay a foundation to the anticaries research based on the crystal structure of proteins.
Christensen, Anders S.; Linnet, Troels E.; Borg, Mikael; Boomsma, Wouter; Lindorff-Larsen, Kresten; Hamelryck, Thomas; Jensen, Jan H.
2013-01-01
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts - sensitive probes of the geometry of key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond (h3 JNC') spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy with many potential applications such as protein flexibility in ligand binding. PMID:24391900
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.
Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju
2015-01-01
Lipocalins constitute a superfamily of extracellular proteins that are found in all three kingdoms of life. Although very divergent in their sequences and functions, they show remarkable similarity in 3-D structures. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted that they have a long evolutionary history. However the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand functional diversity of the lipocalins using a structure-based phylogenetic approach. The present study with 39 protein domains from the lipocalin superfamily suggests that the clusters of lipocalins obtained by structure-based phylogeny correspond well with the functional diversity. The detailed analysis on each of the clusters and sub-clusters reveals that the 39 lipocalin domains cluster based on their mode of ligand binding though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single member families. Also structure-based phylogenetic approach has provided pointers to assign putative function for the domains of unknown function in lipocalin family. The approach employed in the present study can be used in the future for the functional identification of new lipocalin proteins and may be extended to other protein families where members show poor sequence similarity but high structural similarity.
Computational approaches for drug discovery.
Hung, Che-Lun; Chen, Chi-Chun
2014-09-01
Cellular proteins are the mediators of multiple organism functions being involved in physiological mechanisms and disease. By discovering lead compounds that affect the function of target proteins, the target diseases or physiological mechanisms can be modulated. Based on knowledge of the ligand-receptor interaction, the chemical structures of leads can be modified to improve efficacy, selectivity and reduce side effects. One rational drug design technology, which enables drug discovery based on knowledge of target structures, functional properties and mechanisms, is computer-aided drug design (CADD). The application of CADD can be cost-effective using experiments to compare predicted and actual drug activity, the results from which can used iteratively to improve compound properties. The two major CADD-based approaches are structure-based drug design, where protein structures are required, and ligand-based drug design, where ligand and ligand activities can be used to design compounds interacting with the protein structure. Approaches in structure-based drug design include docking, de novo design, fragment-based drug discovery and structure-based pharmacophore modeling. Approaches in ligand-based drug design include quantitative structure-affinity relationship and pharmacophore modeling based on ligand properties. Based on whether the structure of the receptor and its interaction with the ligand are known, different design strategies can be seed. After lead compounds are generated, the rule of five can be used to assess whether these have drug-like properties. Several quality validation methods, such as cost function analysis, Fisher's cross-validation analysis and goodness of hit test, can be used to estimate the metrics of different drug design strategies. To further improve CADD performance, multi-computers and graphics processing units may be applied to reduce costs. © 2014 Wiley Periodicals, Inc.
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity
Lee, Hui Sun; Im, Wonpil
2013-01-01
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is a room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement on prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
Rapid and reliable protein structure determination via chemical shift threading.
Hafsa, Noor E; Berjanskii, Mark V; Arndt, David; Wishart, David S
2018-01-01
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy)-with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca .
General overview on structure prediction of twilight-zone proteins.
Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew
2015-09-04
Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years showed by critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, the predictive structures may serve as starting points to study a protein. If the target protein consists of homologous region, high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (also known as twilight-zone protein, sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins can be predicted via threading or ab initio method. Based on the current trend, combination of different methods brings an improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progresses and challenges for the prediction of twilight-zone proteins were discussed.
Protein flexibility in the light of structural alphabets
Craveur, Pierrick; Joseph, Agnel P.; Esque, Jeremy; Narwani, Tarun J.; Noël, Floriane; Shinada, Nicolas; Goguet, Matthieu; Leonard, Sylvain; Poulain, Pierre; Bertrand, Olivier; Faure, Guilhem; Rebehmed, Joseph; Ghozlane, Amine; Swapna, Lakshmipuram S.; Bhaskara, Ramachandra M.; Barnoud, Jonathan; Téletchéa, Stéphane; Jallu, Vincent; Cerny, Jiri; Schneider, Bohdan; Etchebest, Catherine; Srinivasan, Narayanaswamy; Gelly, Jean-Christophe; de Brevern, Alexandre G.
2015-01-01
Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. More precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SAs description. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., Molecular Dynamic simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large set of structures in complexes, to identify order/disorder transition. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases. PMID:26075209
Cao, Hu; Lu, Yonggang
2017-01-01
With the rapid growth of known protein 3D structures in number, how to efficiently compare protein structures becomes an essential and challenging problem in computational structural biology. At present, many protein structure alignment methods have been developed. Among all these methods, flexible structure alignment methods are shown to be superior to rigid structure alignment methods in identifying structure similarities between proteins, which have gone through conformational changes. It is also found that the methods based on aligned fragment pairs (AFPs) have a special advantage over other approaches in balancing global structure similarities and local structure similarities. Accordingly, we propose a new flexible protein structure alignment method based on variable-length AFPs. Compared with other methods, the proposed method possesses three main advantages. First, it is based on variable-length AFPs. The length of each AFP is separately determined to maximally represent a local similar structure fragment, which reduces the number of AFPs. Second, it uses local coordinate systems, which simplify the computation at each step of the expansion of AFPs during the AFP identification. Third, it decreases the number of twists by rewarding the situation where nonconsecutive AFPs share the same transformation in the alignment, which is realized by dynamic programming with an improved transition function. The experimental data show that compared with FlexProt, FATCAT, and FlexSnap, the proposed method can achieve comparable results by introducing fewer twists. Meanwhile, it can generate results similar to those of the FATCAT method in much less running time due to the reduced number of AFPs.
VoroMQA: Assessment of protein structure quality using interatomic contact areas.
Olechnovič, Kliment; Venclovas, Česlovas
2017-06-01
In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Xu, Dong; Zhang, Yang
2012-01-01
Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field. PMID:22411565
Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij
2012-03-01
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.
Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke
2014-01-01
The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE (csc) . The server is available at http://kiharalab.org/3d-surfer/.
Mapping monomeric threading to protein-protein structure prediction.
Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang
2013-03-25
The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING is controlled with ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.
PDBsum: Structural summaries of PDB entries.
Laskowski, Roman A; Jabłońska, Jagoda; Pravda, Lukáš; Vařeková, Radka Svobodová; Thornton, Janet M
2018-01-01
PDBsum is a web server providing structural information on the entries in the Protein Data Bank (PDB). The analyses are primarily image-based and include protein secondary structure, protein-ligand and protein-DNA interactions, PROCHECK analyses of structural quality, and many others. The 3D structures can be viewed interactively in RasMol, PyMOL, and a JavaScript viewer called 3Dmol.js. Users can upload their own PDB files and obtain a set of password-protected PDBsum analyses for each. The server is freely accessible to all at: http://www.ebi.ac.uk/pdbsum. © 2017 The Protein Society.
Protein single-model quality assessment by feature-based probability density functions.
Cao, Renzhi; Cheng, Jianlin
2016-04-04
Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method-Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.
PDB@: an offline toolkit for exploration and analysis of PDB files.
Mani, Udayakumar; Ravisankar, Sadhana; Ramakrishnan, Sai Mukund
2013-12-01
Protein Data Bank (PDB) is a freely accessible archive of the 3-D structural data of biological molecules. Structure based studies offers a unique vantage point in inferring the properties of a protein molecule from structural data. This is too big a task to be done manually. Moreover, there is no single tool, software or server that comprehensively analyses all structure-based properties. The objective of the present work is to develop an offline computational toolkit, PDB@ containing in-built algorithms that help categorizing the structural properties of a protein molecule. The user has the facility to view and edit the PDB file to his need. Some features of the present work are unique in itself and others are an improvement over existing tools. Also, the representation of protein properties in both graphical and textual formats helps in predicting all the necessary details of a protein molecule on a single platform.
Refinement of protein termini in template-based modeling using conformational space annealing.
Park, Hahnbeom; Ko, Junsu; Joo, Keehyoung; Lee, Julian; Seok, Chaok; Lee, Jooyoung
2011-09-01
The rapid increase in the number of experimentally determined protein structures in recent years enables us to obtain more reliable protein tertiary structure models than ever by template-based modeling. However, refinement of template-based models beyond the limit available from the best templates is still needed for understanding protein function in atomic detail. In this work, we develop a new method for protein terminus modeling that can be applied to refinement of models with unreliable terminus structures. The energy function for terminus modeling consists of both physics-based and knowledge-based potential terms with carefully optimized relative weights. Effective sampling of both the framework and terminus is performed using the conformational space annealing technique. This method has been tested on a set of termini derived from a nonredundant structure database and two sets of termini from the CASP8 targets. The performance of the terminus modeling method is significantly improved over our previous method that does not employ terminus refinement. It is also comparable or superior to the best server methods tested in CASP8. The success of the current approach suggests that similar strategy may be applied to other types of refinement problems such as loop modeling or secondary structure rearrangement. Copyright © 2011 Wiley-Liss, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zemla, A; Lang, D; Kostova, T
2010-11-29
Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitatemore » the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.« less
Fast iodide-SAD phasing for high-throughput membrane protein structure determination.
Melnikov, Igor; Polovinkin, Vitaly; Kovalev, Kirill; Gushchin, Ivan; Shevtsov, Mikhail; Shevchenko, Vitaly; Mishin, Alexey; Alekseev, Alexey; Rodriguez-Valera, Francisco; Borshchevskiy, Valentin; Cherezov, Vadim; Leonard, Gordon A; Gordeliy, Valentin; Popov, Alexander
2017-05-01
We describe a fast, easy, and potentially universal method for the de novo solution of the crystal structures of membrane proteins via iodide-single-wavelength anomalous diffraction (I-SAD). The potential universality of the method is based on a common feature of membrane proteins-the availability at the hydrophobic-hydrophilic interface of positively charged amino acid residues with which iodide strongly interacts. We demonstrate the solution using I-SAD of four crystal structures representing different classes of membrane proteins, including a human G protein-coupled receptor (GPCR), and we show that I-SAD can be applied using data collection strategies based on either standard or serial x-ray crystallography techniques.
Kumar, Avishek; Campitelli, Paul; Thorpe, M F; Ozkan, S Banu
2015-12-01
The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics based approach to the protein refinement problem by mimicking the mechanism of chaperons that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometric based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native-like structures from a template and to provide a set of persistent contacts to be employed during re-folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low resolution models where certain of the portions of the structure are incorrectly modeled. © 2015 Wiley Periodicals, Inc.
Thompson, Jared J; Tabatabaei Ghomi, Hamed; Lill, Markus A
2014-12-01
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure. © 2014 Wiley Periodicals, Inc.
Tertiary structural propensities reveal fundamental sequence/structure relationships.
Zheng, Fan; Zhang, Jian; Grigoryan, Gevorg
2015-05-05
Extracting useful generalizations from the continually growing Protein Data Bank (PDB) is of central importance. We hypothesize that the PDB contains valuable quantitative information on the level of local tertiary structural motifs (TERMs). We show that by breaking a protein structure into its constituent TERMs, and querying the PDB to characterize the natural ensemble matching each, we can estimate the compatibility of the structure with a given amino acid sequence through a metric we term "structure score." Considering submissions from recent Critical Assessment of Structure Prediction (CASP) experiments, we found a strong correlation (R = 0.69) between structure score and model accuracy, with poorly predicted regions readily identifiable. This performance exceeds that of leading atomistic statistical energy functions. Furthermore, TERM-based analysis of two prototypical multi-state proteins rapidly produced structural insights fully consistent with prior extensive experimental studies. We thus find that TERM-based analysis should have considerable utility for protein structural biology. Copyright © 2015 Elsevier Ltd. All rights reserved.
Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G
2012-09-01
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation as a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underlines that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicate that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
Schindler, Christina E M; de Vries, Sjoerd J; Zacharias, Martin
2015-02-01
Protein-protein interactions are abundant in the cell but to date structural data for a large number of complexes is lacking. Computational docking methods can complement experiments by providing structural models of complexes based on structures of the individual partners. A major caveat for docking success is accounting for protein flexibility. Especially, interface residues undergo significant conformational changes upon binding. This limits the performance of docking methods that keep partner structures rigid or allow limited flexibility. A new docking refinement approach, iATTRACT, has been developed which combines simultaneous full interface flexibility and rigid body optimizations during docking energy minimization. It employs an atomistic molecular mechanics force field for intermolecular interface interactions and a structure-based force field for intramolecular contributions. The approach was systematically evaluated on a large protein-protein docking benchmark, starting from an enriched decoy set of rigidly docked protein-protein complexes deviating by up to 15 Å from the native structure at the interface. Large improvements in sampling and slight but significant improvements in scoring/discrimination of near native docking solutions were observed. Complexes with initial deviations at the interface of up to 5.5 Å were refined to significantly better agreement with the native structure. Improvements in the fraction of native contacts were especially favorable, yielding increases of up to 70%. © 2014 Wiley Periodicals, Inc.
Dehzangi, Abdollah; Paliwal, Kuldip; Sharma, Alok; Dehzangi, Omid; Sattar, Abdul
2013-01-01
Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
SeqRate: sequence-based protein folding type classification and rates prediction
2010-01-01
Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647
Density functional study of molecular interactions in secondary structures of proteins.
Takano, Yu; Kusaka, Ayumi; Nakamura, Haruki
2016-01-01
Proteins play diverse and vital roles in biology, which are dominated by their three-dimensional structures. The three-dimensional structure of a protein determines its functions and chemical properties. Protein secondary structures, including α-helices and β-sheets, are key components of the protein architecture. Molecular interactions, in particular hydrogen bonds, play significant roles in the formation of protein secondary structures. Precise and quantitative estimations of these interactions are required to understand the principles underlying the formation of three-dimensional protein structures. In the present study, we have investigated the molecular interactions in α-helices and β-sheets, using ab initio wave function-based methods, the Hartree-Fock method (HF) and the second-order Møller-Plesset perturbation theory (MP2), density functional theory, and molecular mechanics. The characteristic interactions essential for forming the secondary structures are discussed quantitatively.
KoBaMIN: a knowledge-based minimization web server for protein structure refinement.
Rodrigues, João P G L M; Levitt, Michael; Chopra, Gaurav
2012-07-01
The KoBaMIN web server provides an online interface to a simple, consistent and computationally efficient protein structure refinement protocol based on minimization of a knowledge-based potential of mean force. The server can be used to refine either a single protein structure or an ensemble of proteins starting from their unrefined coordinates in PDB format. The refinement method is particularly fast and accurate due to the underlying knowledge-based potential derived from structures deposited in the PDB; as such, the energy function implicitly includes the effects of solvent and the crystal environment. Our server allows for an optional but recommended step that optimizes stereochemistry using the MESHI software. The KoBaMIN server also allows comparison of the refined structures with a provided reference structure to assess the changes brought about by the refinement protocol. The performance of KoBaMIN has been benchmarked widely on a large set of decoys, all models generated at the seventh worldwide experiments on critical assessment of techniques for protein structure prediction (CASP7) and it was also shown to produce top-ranking predictions in the refinement category at both CASP8 and CASP9, yielding consistently good results across a broad range of model quality values. The web server is fully functional and freely available at http://csb.stanford.edu/kobamin.
A tool for calculating binding-site residues on proteins from PDB structures.
Hu, Jing; Yan, Changhui
2009-08-03
In the research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Thus, this requires researchers to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. Especially, combing binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit. For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that consist of the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The developed tool is very useful for the research on protein binding site analysis and prediction.
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provide a new and relevant knowledge about the conformation of amino acids in proteins. Only a few of probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimental-determined protein templates. This information is useful, for example, to characterize folds and local motifs in proteins, molecular folding, and can help the solution of complex problems such as protein structure prediction, protein design, among others. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias .
Alpha-Helical Protein Networks Are Self-Protective and Flaw-Tolerant
Ackbarow, Theodor; Sen, Dipanjan; Thaulow, Christian; Buehler, Markus J.
2009-01-01
Alpha-helix based protein networks as they appear in intermediate filaments in the cell’s cytoskeleton and the nuclear membrane robustly withstand large deformation of up to several hundred percent strain, despite the presence of structural imperfections or flaws. This performance is not achieved by most synthetic materials, which typically fail at much smaller deformation and show a great sensitivity to the existence of structural flaws. Here we report a series of molecular dynamics simulations with a simple coarse-grained multi-scale model of alpha-helical protein domains, explaining the structural and mechanistic basis for this observed behavior. We find that the characteristic properties of alpha-helix based protein networks are due to the particular nanomechanical properties of their protein constituents, enabling the formation of large dissipative yield regions around structural flaws, effectively protecting the protein network against catastrophic failure. We show that the key for these self protecting properties is a geometric transformation of the crack shape that significantly reduces the stress concentration at corners. Specifically, our analysis demonstrates that the failure strain of alpha-helix based protein networks is insensitive to the presence of structural flaws in the protein network, only marginally affecting their overall strength. Our findings may help to explain the ability of cells to undergo large deformation without catastrophic failure while providing significant mechanical resistance. PMID:19547709
Scheraga, H A; Paine, G H
1986-01-01
We are using a variety of theoretical and computational techniques to study protein structure, protein folding, and higher-order structures. Our earlier work involved treatments of liquid water and aqueous solutions of nonpolar and polar solutes, computations of the stabilities of the fundamental structures of proteins and their packing arrangements, conformations of small cyclic and open-chain peptides, structures of fibrous proteins (collagen), structures of homologous globular proteins, introduction of special procedures as constraints during energy minimization of globular proteins, and structures of enzyme-substrate complexes. Recently, we presented a new methodology for predicting polypeptide structure (described here); the method is based on the calculation of the probable and average conformation of a polypeptide chain by the application of equilibrium statistical mechanics in conjunction with an adaptive, importance sampling Monte Carlo algorithm. As a test, it was applied to Met-enkephalin.
Modeling Structure and Dynamics of Protein Complexes with SAXS Profiles
Schneidman-Duhovny, Dina; Hammel, Michal
2018-01-01
Small-angle X-ray scattering (SAXS) is an increasingly common and useful technique for structural characterization of molecules in solution. A SAXS experiment determines the scattering intensity of a molecule as a function of spatial frequency, termed SAXS profile. SAXS profiles can be utilized in a variety of molecular modeling applications, such as comparing solution and crystal structures, structural characterization of flexible proteins, assembly of multi-protein complexes, and modeling of missing regions in the high-resolution structure. Here, we describe protocols for modeling atomic structures based on SAXS profiles. The first protocol is for comparing solution and crystal structures including modeling of missing regions and determination of the oligomeric state. The second protocol performs multi-state modeling by finding a set of conformations and their weights that fit the SAXS profile starting from a single-input structure. The third protocol is for protein-protein docking based on the SAXS profile of the complex. We describe the underlying software, followed by demonstrating their application on interleukin 33 (IL33) with its primary receptor ST2 and DNA ligase IV-XRCC4 complex. PMID:29605933
The impact of CRISPR repeat sequence on structures of a Cas6 protein-RNA complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Ruiying; Zheng, Han; Preamplume, Gan
The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of amore » noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.« less
Gupta, Sayan; Feng, Jun; Chance, Mark; Ralston, Corie
2016-01-01
Synchrotron X-ray Footprinting is a powerful in situ hydroxyl radical labeling method for analysis of protein structure, interactions, folding and conformation change in solution. In this method, water is ionized by high flux density broad band synchrotron X-rays to produce a steady-state concentration of hydroxyl radicals, which then react with solvent accessible side-chains. The resulting stable modification products are analyzed by liquid chromatography coupled to mass spectrometry. A comparative reactivity rate between known and unknown states of a protein provides local as well as global information on structural changes, which is then used to develop structural models for protein function and dynamics. In this review we describe the XF-MS method, its unique capabilities and its recent technical advances at the Advanced Light Source. We provide a comparison of other hydroxyl radical and mass spectrometry based methods with XFMS. We also discuss some of the latest developments in its usage for studying bound water, transmembrane proteins and photosynthetic protein components, and the synergy of the method with other synchrotron based structural biology methods.
Knowledge-based fragment binding prediction.
Tang, Grace W; Altman, Russ B
2014-04-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
Knowledge-based Fragment Binding Prediction
Tang, Grace W.; Altman, Russ B.
2014-01-01
Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable. At the same time, computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low-molecular weight and serve as a proxy for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the ligand bound with 74% precision and 82% recall on average. For many protein targets, it identifies high scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or serve as refinement criteria for creating target-specific compound libraries for experimental or computational screening. PMID:24762971
Template-based protein structure modeling using the RaptorX web server.
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2012-07-19
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.
Template-based protein structure modeling using the RaptorX web server
Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo
2016-01-01
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. PMID:22814390
Understanding protein evolution: from protein physics to Darwinian selection.
Zeldovich, Konstantin B; Shakhnovich, Eugene I
2008-01-01
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Decomposition of Proteins into Dynamic Units from Atomic Cross-Correlation Functions.
Calligari, Paolo; Gerolin, Marco; Abergel, Daniel; Polimeno, Antonino
2017-01-10
In this article, we present a clustering method of atoms in proteins based on the analysis of the correlation times of interatomic distance correlation functions computed from MD simulations. The goal is to provide a coarse-grained description of the protein in terms of fewer elements that can be treated as dynamically independent subunits. Importantly, this domain decomposition method does not take into account structural properties of the protein. Instead, the clustering of protein residues in terms of networks of dynamically correlated domains is defined on the basis of the effective correlation times of the pair distance correlation functions. For these properties, our method stands as a complementary analysis to the customary protein decomposition in terms of quasi-rigid, structure-based domains. Results obtained for a prototypal protein structure illustrate the approach proposed.
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2016-09-01
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ecale Zhou, C L; Zemla, A T; Roe, D
2005-01-29
Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides, aptamers, or other small molecules require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set ofmore » ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted from this analysis the residue-residue correspondence, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context, and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique based on our structure- and sequence-based methods. In applying this combined informatics approach to ricin A we identified a conserved/unique pocket in close proximity (but not overlapping) the active site that is suitable for bi-dentate ligand development. These methods are generally applicable to identification of surface epitopes and binding pockets for development of diagnostic reagents, therapeutics, and vaccines.« less
Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka
2018-05-08
Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .
Jeong, Jae-Hee; Kim, Yi-Seul; Rojviriya, Catleya; Cha, Hyung Jin; Ha, Sung-Chul; Kim, Yeon-Gil
2013-10-01
The members of the ARM/HEAT repeat-containing protein superfamily in eukaryotes have been known to mediate protein-protein interactions by using their concave surface. However, little is known about the ARM/HEAT repeat proteins in prokaryotes. Here we report the crystal structure of TON1937, a hypothetical protein from the hyperthermophilic archaeon Thermococcus onnurineus NA1. The structure reveals a crescent-shaped molecule composed of a double layer of α-helices with seven anti-parallel α-helical repeats. A structure-based sequence alignment of the α-helical repeats identified a conserved pattern of hydrophobic or aliphatic residues reminiscent of the consensus sequence of eukaryotic HEAT repeats. The individual repeats of TON1937 also share high structural similarity with the canonical eukaryotic HEAT repeats. In addition, the concave surface of TON1937 is proposed to be its potential binding interface based on this structural comparison and its surface properties. These observations lead us to speculate that the archaeal HEAT-like repeats of TON1937 have evolved to engage in protein-protein interactions in the same manner as eukaryotic HEAT repeats. Copyright © 2013 Elsevier B.V. All rights reserved.
Structure-based design of combinatorial mutagenesis libraries.
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-05-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called "Structure-based Optimization of Combinatorial Mutagenesis" (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. © 2015 The Protein Society.
Computational 3D structures of drug-targeting proteins in the 2009-H1N1 influenza A virus
NASA Astrophysics Data System (ADS)
Du, Qi-Shi; Wang, Shu-Qing; Huang, Ri-Bo; Chou, Kuo-Chen
2010-01-01
The neuraminidase (NA) and M2 proton channel of influenza virus are the drug-targeting proteins, based on which several drugs were developed. However these once powerful drugs encountered drug-resistant problem to the H5N1 and H1N1 flu. To address this problem, the computational 3D structures of NA and M2 proteins of 2009-H1N1 influenza virus were built using the molecular modeling technique and computational chemistry method. Based on the models the structure features of NA and M2 proteins were analyzed, the docking structures of drug-protein complexes were computed, and the residue mutations were annotated. The results may help to solve the drug-resistant problem and stimulate designing more effective drugs against 2009-H1N1 influenza pandemic.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs. PMID:24688703
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan
2016-10-07
RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises of 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features like, multiple structure-based sequence alignment (MSSA) of the query with all others members of that family is provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, like overall function annotations, are provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .
Assessing the applicability of template-based protein docking in the twilight zone.
Negroni, Jacopo; Mosca, Roberto; Aloy, Patrick
2014-09-02
The structural modeling of protein interactions in the absence of close homologous templates is a challenging task. Recently, template-based docking methods have emerged to exploit local structural similarities to help ab-initio protocols provide reliable 3D models for protein interactions. In this work, we critically assess the performance of template-based docking in the twilight zone. Our results show that, while it is possible to find templates for nearly all known interactions, the quality of the obtained models is rather limited. We can increase the precision of the models at expenses of coverage, but it drastically reduces the potential applicability of the method, as illustrated by the whole-interactome modeling of nine organisms. Template-based docking is likely to play an important role in the structural characterization of the interaction space, but we still need to improve the repertoire of structural templates onto which we can reliably model protein complexes. Copyright © 2014 Elsevier Ltd. All rights reserved.
Hati, Sanchita; Bhattacharyya, Sudeep
2016-01-01
A project-based biophysical chemistry laboratory course, which is offered to the biochemistry and molecular biology majors in their senior year, is described. In this course, the classroom study of the structure-function of biomolecules is integrated with the discovery-guided laboratory study of these molecules using computer modeling and simulations. In particular, modern computational tools are employed to elucidate the relationship between structure, dynamics, and function in proteins. Computer-based laboratory protocols that we introduced in three modules allow students to visualize the secondary, super-secondary, and tertiary structures of proteins, analyze non-covalent interactions in protein-ligand complexes, develop three-dimensional structural models (homology model) for new protein sequences and evaluate their structural qualities, and study proteins' intrinsic dynamics to understand their functions. In the fourth module, students are assigned to an authentic research problem, where they apply their laboratory skills (acquired in modules 1-3) to answer conceptual biophysical questions. Through this process, students gain in-depth understanding of protein dynamics-the missing link between structure and function. Additionally, the requirement of term papers sharpens students' writing and communication skills. Finally, these projects result in new findings that are communicated in peer-reviewed journals. © 2016 The International Union of Biochemistry and Molecular Biology.
Statistical mechanics of protein structural transitions: Insights from the island model
Kobayashi, Yukio
2016-01-01
The so-called island model of protein structural transition holds that hydrophobic interactions are the key to both the folding and function of proteins. Herein, the genesis and statistical mechanical basis of the island model of transitions are reviewed, by presenting the results of simulations of such transitions. Elucidating the physicochemical mechanism of protein structural formation is the foundation for understanding the hierarchical structure of life at the microscopic level. Based on the results obtained to date using the island model, remaining problems and future work in the field of protein structures are discussed, referencing Professor Saitô’s views on the hierarchic structure of science. PMID:28409078
Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2012-05-10
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
Rysavy, Steven J; Beck, David A C; Daggett, Valerie
2014-11-01
Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼ 25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.
The Functional Curli Amyloid Is Not Based on In-register Parallel β-Sheet Structure*
Shewmaker, Frank; McGlinchey, Ryan P.; Thurber, Kent R.; McPhie, Peter; Dyda, Fred; Tycko, Robert; Wickner, Reed B.
2009-01-01
The extracellular curli proteins of Enterobacteriaceae form fibrous structures that are involved in biofilm formation and adhesion to host cells. These curli fibrils are considered a functional amyloid because they are not a consequence of misfolding, but they have many of the properties of protein amyloid. We confirm that fibrils formed by CsgA and CsgB, the primary curli proteins of Escherichia coli, possess many of the hallmarks typical of amyloid. Moreover we demonstrate that curli fibrils possess the cross-β structure that distinguishes protein amyloid. However, solid state NMR experiments indicate that curli structure is not based on an in-register parallel β-sheet architecture, which is common to many human disease-associated amyloids and the yeast prion amyloids. Solid state NMR and electron microscopy data are consistent with a β-helix-like structure but are not sufficient to establish such a structure definitively. PMID:19574225
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
A prior knowledge of protein structural classes can provide useful information about its overall structure, so it is very important for quick and accurate determination of protein structural class with computation method in protein science. One of the key for computation method is accurate protein sample representation. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract new feature vector based on wavelet power spectrum (WPS), which contains more abundant information of sequence order in frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was further formed to represent primary sequence by coupling AAC vector with a set of new feature vector of WPS in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc.
Merckel, Michael C; Huiskonen, Juha T; Bamford, Dennis H; Goldman, Adrian; Tuma, Roman
2005-04-15
Comparisons of bacteriophage PRD1 and adenovirus protein structures and virion architectures have been instrumental in unraveling an evolutionary relationship and have led to a proposal of a phylogeny-based virus classification. The structure of the PRD1 spike protein P5 provides further insight into the evolution of viral proteins. The crystallized P5 fragment comprises two structural domains: a globular knob and a fibrous shaft. The head folds into a ten-stranded jelly roll beta barrel, which is structurally related to the tumor necrosis factor (TNF) and the PRD1 coat protein domains. The shaft domain is a structural counterpart to the adenovirus spike shaft. The structural relationships between PRD1, TNF, and adenovirus proteins suggest that the vertex proteins may have originated from an ancestral TNF-like jelly roll coat protein via a combination of gene duplication and deletion.
Implementation of a parallel protein structure alignment service on cloud.
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
Implementation of a Parallel Protein Structure Alignment Service on Cloud
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
A discrete search algorithm for finding the structure of protein backbones and side chains.
Sallaume, Silas; Martins, Simone de Lima; Ochi, Luiz Satoru; Da Silva, Warley Gramacho; Lavor, Carlile; Liberti, Leo
2013-01-01
Some information about protein structure can be obtained by using Nuclear Magnetic Resonance (NMR) techniques, but they provide only a sparse set of distances between atoms in a protein. The Molecular Distance Geometry Problem (MDGP) consists in determining the three-dimensional structure of a molecule using a set of known distances between some atoms. Recently, a Branch and Prune (BP) algorithm was proposed to calculate the backbone of a protein, based on a discrete formulation for the MDGP. We present an extension of the BP algorithm that can calculate not only the protein backbone, but the whole three-dimensional structure of proteins.
Objective identification of residue ranges for the superposition of protein structures
2011-01-01
Background The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely-used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remain commonplace. Results We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain that increasing their number would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures. PMID:21592348
Synthesis and characterization of recombinant abductin-based proteins.
Su, Renay S-C; Renner, Julie N; Liu, Julie C
2013-12-09
Recombinant proteins are promising tools for tissue engineering and drug delivery applications. Protein-based biomaterials have several advantages over natural and synthetic polymers, including precise control over amino acid composition and molecular weight, modular swapping of functional domains, and tunable mechanical and physical properties. In this work, we describe recombinant proteins based on abductin, an elastomeric protein that is found in the inner hinge of bivalves and functions as a coil spring to keep shells open. We illustrate, for the first time, the design, cloning, expression, and purification of a recombinant protein based on consensus abductin sequences derived from Argopecten irradians . The molecular weight of the protein was confirmed by mass spectrometry, and the protein was 94% pure. Circular dichroism studies showed that the dominant structures of abductin-based proteins were polyproline II helix structures in aqueous solution and type II β-turns in trifluoroethanol. Dynamic light scattering studies illustrated that the abductin-based proteins exhibit reversible upper critical solution temperature behavior and irreversible aggregation behavior at high temperatures. A LIVE/DEAD assay revealed that human umbilical vein endothelial cells had a viability of 98 ± 4% after being cultured for two days on the abductin-based protein. Initial cell spreading on the abductin-based protein was similar to that on bovine serum albumin. These studies thus demonstrate the potential of abductin-based proteins in tissue engineering and drug delivery applications due to the cytocompatibility and its response to temperature.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Matthew T.; Higgin, Joshua J.; Hall, Traci M.Tanaka
2008-06-06
Pumilio/FBF (PUF) family proteins are found in eukaryotic organisms and regulate gene expression post-transcriptionally by binding to sequences in the 3' untranslated region of target transcripts. PUF proteins contain an RNA binding domain that typically comprises eight {alpha}-helical repeats, each of which recognizes one RNA base. Some PUF proteins, including yeast Puf4p, have altered RNA binding specificity and use their eight repeats to bind to RNA sequences with nine or ten bases. Here we report the crystal structures of Puf4p alone and in complex with a 9-nucleotide (nt) target RNA sequence, revealing that Puf4p accommodates an 'extra' nucleotide by modestmore » adaptations allowing one base to be turned away from the RNA binding surface. Using structural information and sequence comparisons, we created a mutant Puf4p protein that preferentially binds to an 8-nt target RNA sequence over a 9-nt sequence and restores binding of each protein repeat to one RNA base.« less
Tactile Teaching: Exploring Protein Structure/Function Using Physical Models
ERIC Educational Resources Information Center
Herman, Tim; Morris, Jennifer; Colton, Shannon; Batiza, Ann; Patrick, Michael; Franzen, Margaret; Goodsell, David S.
2006-01-01
The technology now exists to construct physical models of proteins based on atomic coordinates of solved structures. We review here our recent experiences in using physical models to teach concepts of protein structure and function at both the high school and the undergraduate levels. At the high school level, physical models are used in a…
ERIC Educational Resources Information Center
Terrell, Cassidy R.; Listenberger, Laura L.
2017-01-01
Recognizing that undergraduate students can benefit from analysis of 3D protein structure and function, we have developed a multiweek, inquiry-based molecular visualization project for Biochemistry I students. This project uses a virtual model of cyclooxygenase-1 (COX-1) to guide students through multiple levels of protein structure analysis. The…
Mapping of ligand-binding cavities in proteins.
Andersson, C David; Chen, Brian Y; Linusson, Anna
2010-05-01
The complex interactions between proteins and small organic molecules (ligands) are intensively studied because they play key roles in biological processes and drug activities. Here, we present a novel approach to characterize and map the ligand-binding cavities of proteins without direct geometric comparison of structures, based on Principal Component Analysis of cavity properties (related mainly to size, polarity, and charge). This approach can provide valuable information on the similarities and dissimilarities, of binding cavities due to mutations, between-species differences and flexibility upon ligand-binding. The presented results show that information on ligand-binding cavity variations can complement information on protein similarity obtained from sequence comparisons. The predictive aspect of the method is exemplified by successful predictions of serine proteases that were not included in the model construction. The presented strategy to compare ligand-binding cavities of related and unrelated proteins has many potential applications within protein and medicinal chemistry, for example in the characterization and mapping of "orphan structures", selection of protein structures for docking studies in structure-based design, and identification of proteins for selectivity screens in drug design programs. 2009 Wiley-Liss, Inc.
Structure-based design of broadly protective group a streptococcal M protein-based vaccines.
Dale, James B; Smeesters, Pierre R; Courtney, Harry S; Penfound, Thomas A; Hohn, Claudia M; Smith, Jeremy C; Baudry, Jerome Y
2017-01-03
A major obstacle to the development of broadly protective M protein-based group A streptococcal (GAS) vaccines is the variability within the N-terminal epitopes that evoke potent bactericidal antibodies. The concept of M type-specific protective immune responses has recently been challenged based on the observation that multivalent M protein vaccines elicited cross-reactive bactericidal antibodies against a number of non-vaccine M types of GAS. Additionally, a new "cluster-based" typing system of 175M proteins identified a limited number of clusters containing closely related M proteins. In the current study, we used the emm cluster typing system, in combination with computational structure-based peptide modeling, as a novel approach to the design of potentially broadly protective M protein-based vaccines. M protein sequences (AA 16-50) from the E4 cluster containing 17 emm types of GAS were analyzed using de novo 3-D structure prediction tools and the resulting structures subjected to chemical diversity analysis to identify sequences that were the most representative of the 3-D physicochemical properties of the M peptides in the cluster. Five peptides that spanned the range of physicochemical attributes of all 17 peptides were used to formulate synthetic and recombinant vaccines. Rabbit antisera were assayed for antibodies that cross-reacted with E4 peptides and whole bacteria by ELISA and for bactericidal activity against all E4GAS. The synthetic vaccine rabbit antisera reacted with all 17 E4M peptides and demonstrated bactericidal activity against 15/17 E4GAS. A recombinant hybrid vaccine containing the same E4 peptides also elicited antibodies that cross-reacted with all E4M peptides. Comprehensive studies using structure-based design may result in a broadly protective M peptide vaccine that will elicit cluster-specific and emm type-specific antibody responses against the majority of clinically relevant emm types of GAS. Copyright © 2016 Elsevier Ltd. All rights reserved.
Modulation of electronic structures of bases through DNA recognition of protein.
Hagiwara, Yohsuke; Kino, Hiori; Tateno, Masaru
2010-04-21
The effects of environmental structures on the electronic states of functional regions in a fully solvated DNA·protein complex were investigated using combined ab initio quantum mechanics/molecular mechanics calculations. A complex of a transcriptional factor, PU.1, and the target DNA was used for the calculations. The effects of solvent on the energies of molecular orbitals (MOs) of some DNA bases strongly correlate with the magnitude of masking of the DNA bases from the solvent by the protein. In the complex, PU.1 causes a variation in the magnitude among DNA bases by means of directly recognizing the DNA bases through hydrogen bonds and inducing structural changes of the DNA structure from the canonical one. Thus, the strong correlation found in this study is the first evidence showing the close quantitative relationship between recognition modes of DNA bases and the energy levels of the corresponding MOs. Thus, it has been revealed that the electronic state of each base is highly regulated and organized by the DNA recognition of the protein. Other biological macromolecular systems can be expected to also possess similar modulation mechanisms, suggesting that this finding provides a novel basis for the understanding for the regulation functions of biological macromolecular systems.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa
2006-03-01
The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD)more » using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R{sub free} = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.« less
Tan, Yen Hock; Huang, He; Kihara, Daisuke
2006-08-15
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2015-01-01
Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. PMID:26369671
The simulation study of protein-protein interfaces based on the 4-helix bundle structure
NASA Astrophysics Data System (ADS)
Fukuda, Masaki; Komatsu, Yu; Morikawa, Ryota; Miyakawa, Takeshi; Takasu, Masako; Akanuma, Satoshi; Yamagishi, Akihiko
2013-02-01
Docking of two protein molecules is induced by intermolecular interactions. Our purposes in this study are: designing binding interfaces on the two proteins, which specifically interact to each other; and inducing intermolecular interactions between the two proteins by mixing them. A 4-helix bundle structure was chosen as a scaffold on which binding interfaces were created. Based on this scaffold, we designed binding interfaces involving charged and nonpolar amino acid residues. We performed molecular dynamics (MD) simulation to identify suitable amino acid residues for the interfaces. We chose YciF protein as the scaffold for the protein-protein docking simulation. We observed the structure of two YciF protein molecules (I and II), and we calculated the distance between centroids (center of gravity) of the interfaces' surface planes of the molecules I and II. We found that the docking of the two protein molecules can be controlled by the number of hydrophobic and charged amino acid residues involved in the interfaces. Existence of six hydrophobic and five charged amino acid residues within an interface were most suitable for the protein-protein docking.
Xu, Dong; Zhang, Yang
2013-01-01
Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
Building a Better Fragment Library for De Novo Protein Structure Prediction
de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.
2015-01-01
Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595
Racemic & quasi-racemic protein crystallography enabled by chemical protein synthesis.
Kent, Stephen Bh
2018-04-04
A racemic protein mixture can be used to form centrosymmetric crystals for structure determination by X-ray diffraction. Both the unnatural d-protein and the corresponding natural l-protein are made by total chemical synthesis based on native chemical ligation-chemoselective condensation of unprotected synthetic peptide segments. Racemic protein crystallography is important for structure determination of the many natural protein molecules that are refractory to crystallization. Racemic mixtures facilitate the crystallization of recalcitrant proteins, and give diffraction-quality crystals. Quasi-racemic crystallization, using a single d-protein molecule, can facilitate the determination of the structures of a series of l-protein analog molecules. Copyright © 2018 Elsevier Ltd. All rights reserved.
A structural informatics approach to mine kinase knowledge bases.
Brooijmans, Natasja; Mobilio, Dominick; Walker, Gary; Nilakantan, Ramaswamy; Denny, Rajiah A; Feyfant, Eric; Diller, David; Bikker, Jack; Humblet, Christine
2010-03-01
In this paper, we describe a combination of structural informatics approaches developed to mine data extracted from existing structure knowledge bases (Protein Data Bank and the GVK database) with a focus on kinase ATP-binding site data. In contrast to existing systems that retrieve and analyze protein structures, our techniques are centered on a database of ligand-bound geometries in relation to residues lining the binding site and transparent access to ligand-based SAR data. We illustrate the systems in the context of the Abelson kinase and related inhibitor structures. 2009 Elsevier Ltd. All rights reserved.
Li, Yaohang; Liu, Hui; Rata, Ionel; Jakobsson, Eric
2013-02-25
The rapidly increasing number of protein crystal structures available in the Protein Data Bank (PDB) has naturally made statistical analyses feasible in studying complex high-order inter-residue correlations. In this paper, we report a context-based secondary structure potential (CSSP) for assessing the quality of predicted protein secondary structures generated by various prediction servers. CSSP is a sequence-position-specific knowledge-based potential generated based on the potentials of mean force approach, where high-order inter-residue interactions are taken into consideration. The CSSP potential is effective in identifying secondary structure predictions with good quality. In 56% of the targets in the CB513 benchmark, the optimal CSSP potential is able to recognize the native secondary structure or a prediction with Q3 accuracy higher than 90% as best scored in the predicted secondary structures generated by 10 popularly used secondary structure prediction servers. In more than 80% of the CB513 targets, the predicted secondary structures with the lowest CSSP potential values yield higher than 80% Q3 accuracy. Similar performance of CSSP is found on the CASP9 targets as well. Moreover, our computational results also show that the CSSP potential using triplets outperforms the CSSP potential using doublets and is currently better than the CSSP potential using quartets.
Structure-Based Design of Highly Selective Inhibitors of the CREB Binding Protein Bromodomain.
Denny, R Aldrin; Flick, Andrew C; Coe, Jotham; Langille, Jonathan; Basak, Arindrajit; Liu, Shenping; Stock, Ingrid; Sahasrabudhe, Parag; Bonin, Paul; Hay, Duncan A; Brennan, Paul E; Pletcher, Mathew; Jones, Lyn H; Chekler, Eugene L Piatnitski
2017-07-13
Chemical probes are required for preclinical target validation to interrogate novel biological targets and pathways. Selective inhibitors of the CREB binding protein (CREBBP)/EP300 bromodomains are required to facilitate the elucidation of biology associated with these important epigenetic targets. Medicinal chemistry optimization that paid particular attention to physiochemical properties delivered chemical probes with desirable potency, selectivity, and permeability attributes. An important feature of the optimization process was the successful application of rational structure-based drug design to address bromodomain selectivity issues (particularly against the structurally related BRD4 protein).
ProTSAV: A protein tertiary structure analysis and validation server.
Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B
2016-01-01
Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88%and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2Å.The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.
Physics and evolution of thermophilic adaptation.
Berezovsky, Igor N; Shakhnovich, Eugene I
2005-09-06
Analysis of structures and sequences of several hyperthermostable proteins from various sources reveals two major physical mechanisms of their thermostabilization. The first mechanism is "structure-based," whereby some hyperthermostable proteins are significantly more compact than their mesophilic homologues, while no particular interaction type appears to cause stabilization; rather, a sheer number of interactions is responsible for thermostability. Other hyperthermostable proteins employ an alternative, "sequence-based" mechanism of their thermal stabilization. They do not show pronounced structural differences from mesophilic homologues. Rather, a small number of apparently strong interactions is responsible for high thermal stability of these proteins. High-throughput comparative analysis of structures and complete genomes of several hyperthermophilic archaea and bacteria revealed that organisms develop diverse strategies of thermophilic adaptation by using, to a varying degree, two fundamental physical mechanisms of thermostability. The choice of a particular strategy depends on the evolutionary history of an organism. Proteins from organisms that originated in an extreme environment, such as hyperthermophilic archaea (Pyrococcus furiosus), are significantly more compact and more hydrophobic than their mesophilic counterparts. Alternatively, organisms that evolved as mesophiles but later recolonized a hot environment (Thermotoga maritima) relied in their evolutionary strategy of thermophilic adaptation on "sequence-based" mechanism of thermostability. We propose an evolutionary explanation of these differences based on physical concepts of protein designability.
Gold, Nicola D; Jackson, Richard M
2006-02-03
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that given a known protein-ligand binding site allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamiles is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi
2014-01-01
Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
Hensen, Ulf; Meyer, Tim; Haas, Jürgen; Rex, René; Vriend, Gert; Grubmüller, Helmut
2012-01-01
Proteins are usually described and classified according to amino acid sequence, structure or function. Here, we develop a minimally biased scheme to compare and classify proteins according to their internal mobility patterns. This approach is based on the notion that proteins not only fold into recurring structural motifs but might also be carrying out only a limited set of recurring mobility motifs. The complete set of these patterns, which we tentatively call the dynasome, spans a multi-dimensional space with axes, the dynasome descriptors, characterizing different aspects of protein dynamics. The unique dynamic fingerprint of each protein is represented as a vector in the dynasome space. The difference between any two vectors, consequently, gives a reliable measure of the difference between the corresponding protein dynamics. We characterize the properties of the dynasome by comparing the dynamics fingerprints obtained from molecular dynamics simulations of 112 proteins but our approach is, in principle, not restricted to any specific source of data of protein dynamics. We conclude that: 1. the dynasome consists of a continuum of proteins, rather than well separated classes. 2. For the majority of proteins we observe strong correlations between structure and dynamics. 3. Proteins with similar function carry out similar dynamics, which suggests a new method to improve protein function annotation based on protein dynamics. PMID:22606222
MODBASE, a database of annotated comparative protein structure models
Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej
2002-01-01
MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10–4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-02-28
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study
Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji
2006-01-01
Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
ERIC Educational Resources Information Center
National Institute of General Medical Sciences (NIGMS), 2007
2007-01-01
This booklet reveals how structural biology provides insight into health and disease and is useful in developing new medications. It contains a general introduction to proteins, coverage of the techniques used to determine protein structures, and a chapter on structure-based drug design. The booklet features "Student Snapshots," designed to…
Columba: an integrated database of proteins, structures, and annotations.
Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf
2005-03-31
Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
Eggimann, Becky L.; Vostrikov, Vitaly V.; Veglia, Gianluigi; Siepmann, J. Ilja
2013-01-01
We present a fast and simple protocol to obtain moderate-resolution backbone structures of helical proteins. This approach utilizes a combination of sparse backbone NMR data (residual dipolar couplings and paramagnetic relaxation enhancements) or EPR data with a residue-based force field and Monte Carlo/simulated annealing protocol to explore the folding energy landscape of helical proteins. By using only backbone NMR data, which are relatively easy to collect and analyze, and strategically placed spin relaxation probes, we show that it is possible to obtain protein structures with correct helical topology and backbone RMS deviations well below 4 Å. This approach offers promising alternatives for the structural determination of proteins in which nuclear Overha-user effect data are difficult or impossible to assign and produces initial models that will speed up the high-resolution structure determination by NMR spectroscopy. PMID:24639619
Protein structure similarity from Principle Component Correlation analysis.
Zhou, Xiaobo; Chou, James; Wong, Stephen T C
2006-01-25
Owing to rapid expansion of protein structure databases in recent years, methods of structure comparison are becoming increasingly effective and important in revealing novel information on functional properties of proteins and their roles in the grand scheme of evolutionary biology. Currently, the structural similarity between two proteins is measured by the root-mean-square-deviation (RMSD) in their best-superimposed atomic coordinates. RMSD is the golden rule of measuring structural similarity when the structures are nearly identical; it, however, fails to detect the higher order topological similarities in proteins evolved into different shapes. We propose new algorithms for extracting geometrical invariants of proteins that can be effectively used to identify homologous protein structures or topologies in order to quantify both close and remote structural similarities. We measure structural similarity between proteins by correlating the principle components of their secondary structure interaction matrix. In our approach, the Principle Component Correlation (PCC) analysis, a symmetric interaction matrix for a protein structure is constructed with relationship parameters between secondary elements that can take the form of distance, orientation, or other relevant structural invariants. When using a distance-based construction in the presence or absence of encoded N to C terminal sense, there are strong correlations between the principle components of interaction matrices of structurally or topologically similar proteins. The PCC method is extensively tested for protein structures that belong to the same topological class but are significantly different by RMSD measure. The PCC analysis can also differentiate proteins having similar shapes but different topological arrangements. Additionally, we demonstrate that when using two independently defined interaction matrices, comparison of their maximum eigenvalues can be highly effective in clustering structurally or topologically similar proteins. We believe that the PCC analysis of interaction matrix is highly flexible in adopting various structural parameters for protein structure comparison.
Protein domain assignment from the recurrence of locally similar structures
Tai, Chin-Hsien; Sam, Vichetra; Gibrat, Jean-Francois; Garnier, Jean; Munson, Peter J.
2010-01-01
Domains are basic units of protein structure and essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Databank (PDB) is increasing dramatically and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6,373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results that are comparable to several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds as suggested by recent studies. PMID:21287617
Whitford, Paul C; Noel, Jeffrey K; Gosavi, Shachi; Schug, Alexander; Sanbonmatsu, Kevin Y; Onuchic, José N
2009-05-01
Protein dynamics take place on many time and length scales. Coarse-grained structure-based (Go) models utilize the funneled energy landscape theory of protein folding to provide an understanding of both long time and long length scale dynamics. All-atom empirical forcefields with explicit solvent can elucidate our understanding of short time dynamics with high energetic and structural resolution. Thus, structure-based models with atomic details included can be used to bridge our understanding between these two approaches. We report on the robustness of folding mechanisms in one such all-atom model. Results for the B domain of Protein A, the SH3 domain of C-Src Kinase, and Chymotrypsin Inhibitor 2 are reported. The interplay between side chain packing and backbone folding is explored. We also compare this model to a C(alpha) structure-based model and an all-atom empirical forcefield. Key findings include: (1) backbone collapse is accompanied by partial side chain packing in a cooperative transition and residual side chain packing occurs gradually with decreasing temperature, (2) folding mechanisms are robust to variations of the energetic parameters, (3) protein folding free-energy barriers can be manipulated through parametric modifications, (4) the global folding mechanisms in a C(alpha) model and the all-atom model agree, although differences can be attributed to energetic heterogeneity in the all-atom model, and (5) proline residues have significant effects on folding mechanisms, independent of isomerization effects. Because this structure-based model has atomic resolution, this work lays the foundation for future studies to probe the contributions of specific energetic factors on protein folding and function.
Whitford, Paul C.; Noel, Jeffrey K.; Gosavi, Shachi; Schug, Alexander; Sanbonmatsu, Kevin Y.; Onuchic, José N.
2012-01-01
Protein dynamics take place on many time and length scales. Coarse-grained structure-based (Gō) models utilize the funneled energy landscape theory of protein folding to provide an understanding of both long time and long length scale dynamics. All-atom empirical forcefields with explicit solvent can elucidate our understanding of short time dynamics with high energetic and structural resolution. Thus, structure-based models with atomic details included can be used to bridge our understanding between these two approaches. We report on the robustness of folding mechanisms in one such all-atom model. Results for the B domain of Protein A, the SH3 domain of C-Src Kinase and Chymotrypsin Inhibitor 2 are reported. The interplay between side chain packing and backbone folding is explored. We also compare this model to a Cα structure-based model and an all-atom empirical forcefield. Key findings include 1) backbone collapse is accompanied by partial side chain packing in a cooperative transition and residual side chain packing occurs gradually with decreasing temperature 2) folding mechanisms are robust to variations of the energetic parameters 3) protein folding free energy barriers can be manipulated through parametric modifications 4) the global folding mechanisms in a Cα model and the all-atom model agree, although differences can be attributed to energetic heterogeneity in the all-atom model 5) proline residues have significant effects on folding mechanisms, independent of isomerization effects. Since this structure-based model has atomic resolution, this work lays the foundation for future studies to probe the contributions of specific energetic factors on protein folding and function. PMID:18837035
Discriminative structural approaches for enzyme active-site prediction.
Kato, Tsuyoshi; Nagano, Nozomi
2011-02-15
Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.
Web3DMol: interactive protein structure visualization based on WebGL.
Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q
2017-07-03
A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program-Web3DMol-a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Elman RNN based classification of proteins sequences on account of their mutual information.
Mishra, Pooja; Nath Pandey, Paras
2012-10-21
In the present work we have employed the method of estimating residue correlation within the protein sequences, by using the mutual information (MI) of adjacent residues, based on structural and solvent accessibility properties of amino acids. The long range correlation between nonadjacent residues is improved by constructing a mutual information vector (MIV) for a single protein sequence, like this each protein sequence is associated with its corresponding MIVs. These MIVs are given to Elman RNN to obtain the classification of protein sequences. The modeling power of MIV was shown to be significantly better, giving a new approach towards alignment free classification of protein sequences. We also conclude that sequence structural and solvent accessible property based MIVs are better predictor. Copyright © 2012 Elsevier Ltd. All rights reserved.
Struct2Net: a web service to predict protein–protein interactions using a structure-based approach
Singh, Rohit; Park, Daniel; Xu, Jinbo; Hosur, Raghavendra; Berger, Bonnie
2010-01-01
Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach. Prediction of protein–protein interactions (PPIs) is a central area of interest and successful prediction would provide leads for experiments and drug design; however, the experimental coverage of the PPI interactome remains inadequate. We believe that Struct2Net is the first community-wide resource to provide structure-based PPI predictions that go beyond homology modeling. Also, most web-resources for predicting PPIs currently rely on functional genomic data (e.g. GO annotation, gene expression, cellular localization, etc.). Our structure-based approach is independent of such methods and only requires the sequence information of the proteins being queried. The web service allows multiple querying options, aimed at maximizing flexibility. For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously. For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed. The web service is freely available at http://struct2net.csail.mit.edu. PMID:20513650
Overcoming bottlenecks in the membrane protein structural biology pipeline.
Hardy, David; Bill, Roslyn M; Jawhari, Anass; Rothnie, Alice J
2016-06-15
Membrane proteins account for a third of the eukaryotic proteome, but are greatly under-represented in the Protein Data Bank. Unfortunately, recent technological advances in X-ray crystallography and EM cannot account for the poor solubility and stability of membrane protein samples. A limitation of conventional detergent-based methods is that detergent molecules destabilize membrane proteins, leading to their aggregation. The use of orthologues, mutants and fusion tags has helped improve protein stability, but at the expense of not working with the sequence of interest. Novel detergents such as glucose neopentyl glycol (GNG), maltose neopentyl glycol (MNG) and calixarene-based detergents can improve protein stability without compromising their solubilizing properties. Styrene maleic acid lipid particles (SMALPs) focus on retaining the native lipid bilayer of a membrane protein during purification and biophysical analysis. Overcoming bottlenecks in the membrane protein structural biology pipeline, primarily by maintaining protein stability, will facilitate the elucidation of many more membrane protein structures in the near future. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Automatic Classification of Protein Structure Using the Maximum Contact Map Overlap Metric
Andonov, Rumen; Djidjev, Hristo Nikolov; Klau, Gunnar W.; ...
2015-10-09
In this paper, we propose a new distance measure for comparing two protein structures based on their contact map representations. We show that our novel measure, which we refer to as the maximum contact map overlap (max-CMO) metric, satisfies all properties of a metric on the space of protein representations. Having a metric in that space allows one to avoid pairwise comparisons on the entire database and, thus, to significantly accelerate exploring the protein space compared to no-metric spaces. We show on a gold standard superfamily classification benchmark set of 6759 proteins that our exact k-nearest neighbor (k-NN) scheme classifiesmore » up to 224 out of 236 queries correctly and on a larger, extended version of the benchmark with 60; 850 additional structures, up to 1361 out of 1369 queries. Finally, our k-NN classification thus provides a promising approach for the automatic classification of protein structures based on flexible contact map overlap alignments.« less
Structure-based drug design for G protein-coupled receptors.
Congreve, Miles; Dias, João M; Marshall, Fiona H
2014-01-01
Our understanding of the structural biology of G protein-coupled receptors has undergone a transformation over the past 5 years. New protein-ligand complexes are described almost monthly in high profile journals. Appreciation of how small molecules and natural ligands bind to their receptors has the potential to impact enormously how medicinal chemists approach this major class of receptor targets. An outline of the key topics in this field and some recent examples of structure- and fragment-based drug design are described. A table is presented with example views of each G protein-coupled receptor for which there is a published X-ray structure, including interactions with small molecule antagonists, partial and full agonists. The possible implications of these new data for drug design are discussed. © 2014 Elsevier B.V. All rights reserved.
Khvostichenko, Daria S.; Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.; ...
2014-08-21
Crystallization from lipidic mesophase matrices is a promising route to diffraction-quality crystals and structures of membrane proteins. The microfluidic approach reported here eliminates two bottlenecks of the standard mesophase-based crystallization protocols: (i) manual preparation of viscous mesophases and (ii) manual harvesting of often small and fragile protein crystals. In the approach reported here, protein-loaded mesophases are formulated in an X-ray transparent microfluidic chip using only 60 nL of the protein solution per crystallization trial. The X-ray transparency of the chip enables diffraction data collection from multiple crystals residing in microfluidic wells, eliminating the normally required manual harvesting and mounting ofmore » individual crystals. In addition, we validated our approach by on-chip crystallization of photosynthetic reaction center, a membrane protein from Rhodobacter sphaeroides, followed by solving its structure to a resolution of 2.5 Å using X-ray diffraction data collected on-chip under ambient conditions. A moderate conformational change in hydrophilic chains of the protein was observed when comparing the on-chip, room temperature structure with known structures for which data were acquired under cryogenic conditions.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khvostichenko, Daria S.; Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.
2014-10-01
Crystallization from lipidic mesophase matrices is a promising route to diffraction-quality crystals and structures of membrane proteins. The microfluidic approach reported here eliminates two bottlenecks of the standard mesophase-based crystallization protocols: (i) manual preparation of viscous mesophases and (ii) manual harvesting of often small and fragile protein crystals. In the approach reported here, protein-loaded mesophases are formulated in an X-ray transparent microfluidic chip using only 60 nL of the protein solution per crystallization trial. The X-ray transparency of the chip enables diffraction data collection from multiple crystals residing in microfluidic wells, eliminating the normally required manual harvesting and mounting ofmore » individual crystals. We validated our approach by on-chip crystallization of photosynthetic reaction center, a membrane protein from Rhodobacter sphaeroides, followed by solving its structure to a resolution of 2.5 Å using X-ray diffraction data collected on-chip under ambient conditions. A moderate conformational change in hydrophilic chains of the protein was observed when comparing the on-chip, room temperature structure with known structures for which data were acquired under cryogenic conditions.« less
Devine, Paul W A; Fisher, Henry C; Calabrese, Antonio N; Whelan, Fiona; Higazi, Daniel R; Potts, Jennifer R; Lowe, David C; Radford, Sheena E; Ashcroft, Alison E
2017-09-01
Collision cross-section (CCS) measurements obtained from ion mobility spectrometry-mass spectrometry (IMS-MS) analyses often provide useful information concerning a protein's size and shape and can be complemented by modeling procedures. However, there have been some concerns about the extent to which certain proteins maintain a native-like conformation during the gas-phase analysis, especially proteins with dynamic or extended regions. Here we have measured the CCSs of a range of biomolecules including non-globular proteins and RNAs of different sequence, size, and stability. Using traveling wave IMS-MS, we show that for the proteins studied, the measured CCS deviates significantly from predicted CCS values based upon currently available structures. The results presented indicate that these proteins collapse to different extents varying on their elongated structures upon transition into the gas-phase. Comparing two RNAs of similar mass but different solution structures, we show that these biomolecules may also be susceptible to gas-phase compaction. Together, the results suggest that caution is needed when predicting structural models based on CCS data for RNAs as well as proteins with non-globular folds. Graphical Abstract ᅟ.
Estrada-Ortiz, Natalia; Neochoritis, Constantinos G; Dömling, Alexander
2016-04-19
A recent therapeutic strategy in oncology is based on blocking the protein-protein interaction between the murine double minute (MDM) homologues MDM2/X and the tumor-suppressor protein p53. Inhibiting the binding between wild-type (WT) p53 and its negative regulators MDM2 and/or MDMX has become an important target in oncology to restore the antitumor activity of p53, the so-called guardian of our genome. Interestingly, based on the multiple disclosed compound classes and structural analysis of small-molecule-MDM2 adducts, the p53-MDM2 complex is perhaps the best studied and most targeted protein-protein interaction. Several classes of small molecules have been identified as potent, selective, and efficient inhibitors of the p53-MDM2/X interaction, and many co-crystal structures with the protein are available. Herein we review the properties as well as preclinical and clinical studies of these small molecules and peptides, categorized by scaffold type. A particular emphasis is made on crystallographic structures and the observed binding modes of these compounds, including conserved water molecules present. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Kang, Wen-Bin; He, Chuan; Liu, Zhen-Xing; Wang, Jun; Wang, Wei
2018-05-16
Previous studies based on bioinformatics showed that there is a sharp distinction of structural features and residue composition between the intrinsically disordered proteins and the folded proteins. What induces such a composition-related structural transition? How do various kinds of interactions work in such processes? In this work, we investigate these problems based on a survey on peptides randomly composed of charged residues (including glutamic acids and lysines) and the residues with different hydrophobicity, such as alanines, glycines, or phenylalanines. Based on simulations using all-atom model and replica-exchange Monte Carlo method, a coil-globule transition is observed for each peptide. The corresponding transition temperature is found to be dependent on the contents of the hydrophobic and charged residues. For several cases, when the mean hydrophobicity is larger than a certain threshold, the transition temperature is higher than the room temperature, and vise versa. These thresholds of hydrophobicity and net charge are quantitatively consistent with the border line observed from the study of bioinformatics. These results outline the basic physical reasons for the compositional distinction between the intrinsically disordered proteins and the folded proteins. Furthermore, the contributions of various interactions to the structural variation of peptides are analyzed based on the contact statistics and the charge-pattern dependence of the gyration radii of the peptides. Our observations imply that the hydrophobicity contributes essentially to such composition-related transitions. Thus, we achieve a better understanding on composition-structure relation of the natural proteins and the underlying physics.
CCBuilder 2.0: Powerful and accessible coiled-coil modeling.
Wood, Christopher W; Woolfson, Derek N
2018-01-01
The increased availability of user-friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α-helical coiled coil provides one such example, which represents ≈ 3-5% of all known protein-encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy-to-use web application, called CCBuilder 2.0, for modeling and optimizing both α-helical coiled coils and polyproline-based collagen triple helices. This has many applications from providing models to aid molecular replacement for X-ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo "dark matter" protein structures. CCBuilder 2.0 is available as a web-based application, the code for which is open-source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. We have created CCBuilder 2.0, an easy to use web-based application that can model structures for a whole class of proteins, the α-helical coiled coil, which is estimated to account for 3-5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more-applied research including designing and engineering novel proteins that have potential applications in biotechnology. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Bio-Organic Nanotechnology: Using Proteins and Synthetic Polymers for Nanoscale Devices
NASA Technical Reports Server (NTRS)
Molnar, Linda K.; Xu, Ting; Trent, Jonathan D.; Russell, Thomas P.
2003-01-01
While the ability of proteins to self-assemble makes them powerful tools in nanotechnology, in biological systems protein-based structures ultimately depend on the context in which they form. We combine the self-assembling properties of synthetic diblock copolymers and proteins to construct intricately ordered, three-dimensional polymer protein structures with the ultimate goal of forming nano-scale devices. This hybrid approach takes advantage of the capabilities of organic polymer chemistry to build ordered structures and the capabilities of genetic engineering to create proteins that are selective for inorganic or organic substrates. Here, microphase-separated block copolymers coupled with genetically engineered heat shock proteins are used to produce nano-scale patterning that maximizes the potential for both increased structural complexity and integrity.
Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E
2015-01-01
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Evaluation of variability in high-resolution protein structures by global distance scoring.
Anzai, Risa; Asami, Yoshiki; Inoue, Waka; Ueno, Hina; Yamada, Koya; Okada, Tetsuji
2018-01-01
Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and showed that the method is comprehensively used to overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provide new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.
Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies
Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.
2014-01-01
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586
Brylinski, Michal; Skolnick, Jeffrey
2010-01-01
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this paper, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal binding sites are detected with the best predicted binding site at rank 1 and within the top 2 ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium and magnesium ions, the binding metal can be predicted with high, typically 70-90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/. PMID:21287609
Structure solution of DNA-binding proteins and complexes with ARCIMBOLDO libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pröpper, Kevin; Instituto de Biologia Molecular de Barcelona; Meindl, Kathrin
2014-06-01
The structure solution of DNA-binding protein structures and complexes based on the combination of location of DNA-binding protein motif fragments with density modification in a multi-solution frame is described. Protein–DNA interactions play a major role in all aspects of genetic activity within an organism, such as transcription, packaging, rearrangement, replication and repair. The molecular detail of protein–DNA interactions can be best visualized through crystallography, and structures emphasizing insight into the principles of binding and base-sequence recognition are essential to understanding the subtleties of the underlying mechanisms. An increasing number of high-quality DNA-binding protein structure determinations have been witnessed despite themore » fact that the crystallographic particularities of nucleic acids tend to pose specific challenges to methods primarily developed for proteins. Crystallographic structure solution of protein–DNA complexes therefore remains a challenging area that is in need of optimized experimental and computational methods. The potential of the structure-solution program ARCIMBOLDO for the solution of protein–DNA complexes has therefore been assessed. The method is based on the combination of locating small, very accurate fragments using the program Phaser and density modification with the program SHELXE. Whereas for typical proteins main-chain α-helices provide the ideal, almost ubiquitous, small fragments to start searches, in the case of DNA complexes the binding motifs and DNA double helix constitute suitable search fragments. The aim of this work is to provide an effective library of search fragments as well as to determine the optimal ARCIMBOLDO strategy for the solution of this class of structures.« less
Automatic classification of protein structures relying on similarities between alignments
2012-01-01
Background Identification of protein structural cores requires isolation of sets of proteins all sharing a same subset of structural motifs. In the context of an ever growing number of available 3D protein structures, standard and automatic clustering algorithms require adaptations so as to allow for efficient identification of such sets of proteins. Results When considering a pair of 3D structures, they are stated as similar or not according to the local similarities of their matching substructures in a structural alignment. This binary relation can be represented in a graph of similarities where a node represents a 3D protein structure and an edge states that two 3D protein structures are similar. Therefore, classifying proteins into structural families can be viewed as a graph clustering task. Unfortunately, because such a graph encodes only pairwise similarity information, clustering algorithms may include in the same cluster a subset of 3D structures that do not share a common substructure. In order to overcome this drawback we first define a ternary similarity on a triple of 3D structures as a constraint to be satisfied by the graph of similarities. Such a ternary constraint takes into account similarities between pairwise alignments, so as to ensure that the three involved protein structures do have some common substructure. We propose hereunder a modification algorithm that eliminates edges from the original graph of similarities and gives a reduced graph in which no ternary constraints are violated. Our approach is then first to build a graph of similarities, then to reduce the graph according to the modification algorithm, and finally to apply to the reduced graph a standard graph clustering algorithm. Such method was used for classifying ASTRAL-40 non-redundant protein domains, identifying significant pairwise similarities with Yakusa, a program devised for rapid 3D structure alignments. Conclusions We show that filtering similarities prior to standard graph based clustering process by applying ternary similarity constraints i) improves the separation of proteins of different classes and consequently ii) improves the classification quality of standard graph based clustering algorithms according to the reference classification SCOP. PMID:22974051
Protein classification using sequential pattern mining.
Exarchos, Themis P; Papaloukas, Costas; Lampros, Christos; Fotiadis, Dimitrios I
2006-01-01
Protein classification in terms of fold recognition can be employed to determine the structural and functional properties of a newly discovered protein. In this work sequential pattern mining (SPM) is utilized for sequence-based fold recognition. One of the most efficient SPM algorithms, cSPADE, is employed for protein primary structure analysis. Then a classifier uses the extracted sequential patterns for classifying proteins of unknown structure in the appropriate fold category. The proposed methodology exhibited an overall accuracy of 36% in a multi-class problem of 17 candidate categories. The classification performance reaches up to 65% when the three most probable protein folds are considered.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anna Johnston, SNL 9215
2002-09-01
PDB to AMPL Conversion was written to convert protein data base files to AMPL files. The protein data bases on the internet contain a wealth of information about the structue and makeup of proteins. Each file contains information derived by one or more experiments and contains information on how the experiment waw performed, the amino acid building blocks of each chain, and often the three-dimensional structure of the protein extracted from the experiments. The way a protein folds determines much about its function. Thus, studying the three-dimensional structure of the protein is of great interest. Analysing the contact maps ismore » one way to examine the structure. A contact map is a graph which has a linear back bone of amino acids for nodes (i.e., adjacent amino acids are always connected) and vertices between non-adjacent nodes if they are close enough to be considered in contact. If the graphs are similar then the folds of the protein and their function should also be similar. This software extracts the contact maps from a protein data base file and puts in into AMPL data format. This format is designed for use in AMPL, a programming language for simplifying linear programming formulations.« less
Principles of assembly reveal a periodic table of protein complexes.
Ahnert, Sebastian E; Marsh, Joseph A; Hernández, Helena; Robinson, Carol V; Teichmann, Sarah A
2015-12-11
Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering. Copyright © 2015, American Association for the Advancement of Science.
Ryu, Hyojung; Lim, GyuTae; Sung, Bong Hyun; Lee, Jinhyuk
2016-02-15
Protein structure refinement is a necessary step for the study of protein function. In particular, some nuclear magnetic resonance (NMR) structures are of lower quality than X-ray crystallographic structures. Here, we present NMRe, a web-based server for NMR structure refinement. The previously developed knowledge-based energy function STAP (Statistical Torsion Angle Potential) was used for NMRe refinement. With STAP, NMRe provides two refinement protocols using two types of distance restraints. If a user provides NOE (Nuclear Overhauser Effect) data, the refinement is performed with the NOE distance restraints as a conventional NMR structure refinement. Additionally, NMRe generates NOE-like distance restraints based on the inter-hydrogen distances derived from the input structure. The efficiency of NMRe refinement was validated on 20 NMR structures. Most of the quality assessment scores of the refined NMR structures were better than those of the original structures. The refinement results are provided as a three-dimensional structure view, a secondary structure scheme, and numerical and graphical structure validation scores. NMRe is available at http://psb.kobic.re.kr/nmre/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Goonesekere, Nalin Cw
2009-01-01
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between proteins sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-blast, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Chemical cross-linking and native mass spectrometry: A fruitful combination for structural biology
Sinz, Andrea; Arlt, Christian; Chorev, Dror; Sharon, Michal
2015-01-01
Mass spectrometry (MS) is becoming increasingly popular in the field of structural biology for analyzing protein three-dimensional-structures and for mapping protein–protein interactions. In this review, the specific contributions of chemical crosslinking and native MS are outlined to reveal the structural features of proteins and protein assemblies. Both strategies are illustrated based on the examples of the tetrameric tumor suppressor protein p53 and multisubunit vinculin-Arp2/3 hybrid complexes. We describe the distinct advantages and limitations of each technique and highlight synergistic effects when both techniques are combined. Integrating both methods is especially useful for characterizing large protein assemblies and for capturing transient interactions. We also point out the future directions we foresee for a combination of in vivo crosslinking and native MS for structural investigation of intact protein assemblies. PMID:25970732
Rysavy, Steven J; Beck, David AC; Daggett, Valerie
2014-01-01
Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412
WIWS: a protein structure bioinformatics Web service collection.
Hekkelman, M L; Te Beek, T A H; Pettifer, S R; Thorne, D; Attwood, T K; Vriend, G
2010-07-01
The WHAT IF molecular-modelling and drug design program is widely distributed in the world of protein structure bioinformatics. Although originally designed as an interactive application, its highly modular design and inbuilt control language have recently enabled its deployment as a collection of programmatically accessible web services. We report here a collection of WHAT IF-based protein structure bioinformatics web services: these relate to structure quality, the use of symmetry in crystal structures, structure correction and optimization, adding hydrogens and optimizing hydrogen bonds and a series of geometric calculations. The freely accessible web services are based on the industry standard WS-I profile and the EMBRACE technical guidelines, and are available via both REST and SOAP paradigms. The web services run on a dedicated computational cluster; their function and availability is monitored daily.
Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok
2017-07-03
Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel
2016-01-01
Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins, can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This makes necessary the development of new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites at these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility of Geomfinder, which was able to discriminate between similar and different 3D patterns related to binding sites of common substrates in a range of diverse proteins. Geomfinder allows detecting similar 3D patterns between any two pair of protein structures, regardless of the divergency among their amino acids sequences. Although the software is not intended for simultaneous multiple comparisons in a large number of proteins, it can be particularly useful in cases such as the structure-based design of multitarget drugs, where a detailed analysis of 3D patterns similarities between a few selected protein targets is essential.
TOUCHSTONE II: a new approach to ab initio protein structure prediction.
Zhang, Yang; Kolinski, Andrzej; Skolnick, Jeffrey
2003-08-01
We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting C(alpha) atoms, with attached C(beta) atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 x 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36 approximately 120 residues) with predicted structures having a RMSD from native below 6.5 A in the top five cluster centroids. To fold larger-size proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR where homologous proteins were excluded from the data base. With these threading-based restraints, the program can fold 83/125 test proteins (36 approximately 174 residues) with structures having a RMSD to native below 6.5 A in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.
Seif, Salem; Planz, Viktoria; Windbergs, Maike
2017-10-01
Proteins play a vital role within the human body by regulating various functions and even serving as structural constituent of many body parts. In this context, protein-based therapeutics have attracted a lot of attention in the last few decades as potential treatment of different diseases. Due to the steadily increasing interest in protein-based therapeutics, different dosage forms were investigated for delivering such complex macromolecules to the human body. Here, electrospun fibers hold a great potential for embedding proteins without structural damage and for controlled release of the protein for therapeutic applications. This review provides a comprehensive overview of the current state of protein-based carrier systems using electrospun fibers, with special emphasis on discussing their potential and key challenges in developing such therapeutic strategies, along with a prospective view of anticipated future directions. © 2017 Deutsche Pharmazeutische Gesellschaft.
KFC Server: interactive forecasting of protein interaction hot spots.
Darnell, Steven J; LeGault, Laura; Mitchell, Julie C
2008-07-01
The KFC Server is a web-based implementation of the KFC (Knowledge-based FADE and Contacts) model-a machine learning approach for the prediction of binding hot spots, or the subset of residues that account for most of a protein interface's; binding free energy. The server facilitates the automated analysis of a user submitted protein-protein or protein-DNA interface and the visualization of its hot spot predictions. For each residue in the interface, the KFC Server characterizes its local structural environment, compares that environment to the environments of experimentally determined hot spots and predicts if the interface residue is a hot spot. After the computational analysis, the user can visualize the results using an interactive job viewer able to quickly highlight predicted hot spots and surrounding structural features within the protein structure. The KFC Server is accessible at http://kfc.mitchell-lab.org.
Engineering Proteins for Thermostability with iRDP Web Server
Ghanate, Avinash; Ramasamy, Sureshkumar; Suresh, C. G.
2015-01-01
Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising of iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements. PMID:26436543
Engineering Proteins for Thermostability with iRDP Web Server.
Panigrahi, Priyabrata; Sule, Manas; Ghanate, Avinash; Ramasamy, Sureshkumar; Suresh, C G
2015-01-01
Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising of iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements.
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro
2018-03-06
Systematic analysis of a parasite interactome is a key approach to understand different biological processes. It makes possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a worldwide distributed and neglected disease, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated to protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study allowed to demonstrate the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols, besides it made available protein structures and interactions not previously reported.
Shahbaaz, Mohd; Ahmad, Faizan; Imtaiyaz Hassan, Md
2015-06-01
Haemophilus influenzae is a small pleomorphic Gram-negative bacteria which causes several chronic diseases, including bacteremia, meningitis, cellulitis, epiglottitis, septic arthritis, pneumonia, and empyema. Here we extensively analyzed the sequenced genome of H. influenzae strain Rd KW20 using protein family databases, protein structure prediction, pathways and genome context methods to assign a precise function to proteins whose functions are unknown. These proteins are termed as hypothetical proteins (HPs), for which no experimental information is available. Function prediction of these proteins would surely be supportive to precisely understand the biochemical pathways and mechanism of pathogenesis of Haemophilus influenzae. During the extensive analysis of H. influenzae genome, we found the presence of eight HPs showing lyase activity. Subsequently, we modeled and analyzed three-dimensional structure of all these HPs to determine their functions more precisely. We found these HPs possess cystathionine-β-synthase, cyclase, carboxymuconolactone decarboxylase, pseudouridine synthase A and C, D-tagatose-1,6-bisphosphate aldolase and aminodeoxychorismate lyase-like features, indicating their corresponding functions in the H. influenzae. Lyases are actively involved in the regulation of biosynthesis of various hormones, metabolic pathways, signal transduction, and DNA repair. Lyases are also considered as a key player for various biological processes. These enzymes are critically essential for the survival and pathogenesis of H. influenzae and, therefore, these enzymes may be considered as a potential target for structure-based rational drug design. Our structure-function relationship analysis will be useful to search and design potential lead molecules based on the structure of these lyases, for drug design and discovery.
HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy.
Yan, Yumeng; Zhang, Di; Zhou, Pei; Li, Botong; Huang, Sheng-You
2017-07-03
Protein-protein and protein-DNA/RNA interactions play a fundamental role in a variety of biological processes. Determining the complex structures of these interactions is valuable, in which molecular docking has played an important role. To automatically make use of the binding information from the PDB in docking, here we have presented HDOCK, a novel web server of our hybrid docking algorithm of template-based modeling and free docking, in which cases with misleading templates can be rescued by the free docking protocol. The server supports protein-protein and protein-DNA/RNA docking and accepts both sequence and structure inputs for proteins. The docking process is fast and consumes about 10-20 min for a docking run. Tested on the cases with weakly homologous complexes of <30% sequence identity from five docking benchmarks, the HDOCK pipeline tied with template-based modeling on the protein-protein and protein-DNA benchmarks and performed better than template-based modeling on the three protein-RNA benchmarks when the top 10 predictions were considered. The performance of HDOCK became better when more predictions were considered. Combining the results of HDOCK and template-based modeling by ranking first of the template-based model further improved the predictive power of the server. The HDOCK web server is available at http://hdock.phys.hust.edu.cn/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Combining functional and structural genomics to sample the essential Burkholderia structome.
Baugh, Loren; Gallagher, Larry A; Patrapuvich, Rapatbhorn; Clifton, Matthew C; Gardberg, Anna S; Edwards, Thomas E; Armour, Brianna; Begley, Darren W; Dieterich, Shellie H; Dranow, David M; Abendroth, Jan; Fairman, James W; Fox, David; Staker, Bart L; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W; Stacy, Robin; Myler, Peter J; Stewart, Lance J; Manoil, Colin; Van Voorhis, Wesley C
2013-01-01
The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases caused by Burkholderia. All expression clones and proteins created in this study are freely available by request.
Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S
2015-01-01
The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648
Shin, Jae-Min; Cho, Doo-Ho
2005-01-01
PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications.
(PS)2: protein structure prediction server version 3.0.
Huang, Tsun-Tsao; Hwang, Jenn-Kang; Chen, Chu-Huang; Chu, Chih-Sheng; Lee, Chi-Wen; Chen, Chih-Chieh
2015-07-01
Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)(2) web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)(2) server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)(2) is freely available at http://ps2v3.life.nctu.edu.tw/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Güssregen, Stefan; Matter, Hans; Hessler, Gerhard; Lionta, Evanthia; Heil, Jochen; Kast, Stefan M
2017-07-24
Water molecules play an essential role for mediating interactions between ligands and protein binding sites. Displacement of specific water molecules can favorably modulate the free energy of binding of protein-ligand complexes. Here, the nature of water interactions in protein binding sites is investigated by 3D RISM (three-dimensional reference interaction site model) integral equation theory to understand and exploit local thermodynamic features of water molecules by ranking their possible displacement in structure-based design. Unlike molecular dynamics-based approaches, 3D RISM theory allows for fast and noise-free calculations using the same detailed level of solute-solvent interaction description. Here we correlate molecular water entities instead of mere site density maxima with local contributions to the solvation free energy using novel algorithms. Distinct water molecules and hydration sites are investigated in multiple protein-ligand X-ray structures, namely streptavidin, factor Xa, and factor VIIa, based on 3D RISM-derived free energy density fields. Our approach allows the semiquantitative assessment of whether a given structural water molecule can potentially be targeted for replacement in structure-based design. Finally, PLS-based regression models from free energy density fields used within a 3D-QSAR approach (CARMa - comparative analysis of 3D RISM Maps) are shown to be able to extract relevant information for the interpretation of structure-activity relationship (SAR) trends, as demonstrated for a series of serine protease inhibitors.
Improved in-cell structure determination of proteins at near-physiological concentration
Ikeya, Teppei; Hanashima, Tomomi; Hosoya, Saori; Shimazaki, Manato; Ikeda, Shiro; Mishima, Masaki; Güntert, Peter; Ito, Yutaka
2016-01-01
Investigating three-dimensional (3D) structures of proteins in living cells by in-cell nuclear magnetic resonance (NMR) spectroscopy opens an avenue towards understanding the structural basis of their functions and physical properties under physiological conditions inside cells. In-cell NMR provides data at atomic resolution non-invasively, and has been used to detect protein-protein interactions, thermodynamics of protein stability, the behavior of intrinsically disordered proteins, etc. in cells. However, so far only a single de novo 3D protein structure could be determined based on data derived only from in-cell NMR. Here we introduce methods that enable in-cell NMR protein structure determination for a larger number of proteins at concentrations that approach physiological ones. The new methods comprise (1) advances in the processing of non-uniformly sampled NMR data, which reduces the measurement time for the intrinsically short-lived in-cell NMR samples, (2) automatic chemical shift assignment for obtaining an optimal resonance assignment, and (3) structure refinement with Bayesian inference, which makes it possible to calculate accurate 3D protein structures from sparse data sets of conformational restraints. As an example application we determined the structure of the B1 domain of protein G at about 250 μM concentration in living E. coli cells. PMID:27910948
Small Scaffolds, Big Potential: Developing Miniature Proteins as Therapeutic Agents.
Holub, Justin M
2017-09-01
Preclinical Research Miniature proteins are a class of oligopeptide characterized by their short sequence lengths and ability to adopt well-folded, three-dimensional structures. Because of their biomimetic nature and synthetic tractability, miniature proteins have been used to study a range of biochemical processes including fast protein folding, signal transduction, catalysis and molecular transport. Recently, miniature proteins have been gaining traction as potential therapeutic agents because their small size and ability to fold into defined tertiary structures facilitates their development as protein-based drugs. This research overview discusses emerging developments involving the use of miniature proteins as scaffolds to design novel therapeutics for the treatment and study of human disease. Specifically, this review will explore strategies to: (i) stabilize miniature protein tertiary structure; (ii) optimize biomolecular recognition by grafting functional epitopes onto miniature protein scaffolds; and (iii) enhance cytosolic delivery of miniature proteins through the use of cationic motifs that facilitate endosomal escape. These objectives are discussed not only to address challenges in developing effective miniature protein-based drugs, but also to highlight the tremendous potential miniature proteins hold for combating and understanding human disease. Drug Dev Res 78 : 268-282, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Challenges in NMR-based structural genomics
NASA Astrophysics Data System (ADS)
Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang
2005-05-01
Understanding the functions of the vast number of proteins encoded in many genomes that have been completely sequenced recently is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures in atomic details and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up the process of protein structure determination is essential for the success of the structural genomics effort. In this article we will describe NMR methods currently being employed for protein structure determination. We will also describe methods under development which may drastically increase the throughput, as well as point out areas where opportunities exist for biophysicists to make significant contribution in this important field.
Ensemble-based evaluation for protein structure models.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2016-06-15
Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information of practical usefulness of models. https://bitbucket.org/mjamroz/flexscore dkihara@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Ensemble-based evaluation for protein structure models
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2016-01-01
Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts’ intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633
Lee, Sang-Chul; Hong, Seungpyo; Park, Keunwan; Jeon, Young Ho; Kim, Dongsup; Cheong, Hae-Kap; Kim, Hak-Sung
2012-01-01
Repeat proteins are increasingly attracting much attention as alternative scaffolds to immunoglobulin antibodies due to their unique structural features. Nonetheless, engineering interaction interface and understanding molecular basis for affinity maturation of repeat proteins still remain a challenge. Here, we present a structure-based rational design of a repeat protein with high binding affinity for a target protein. As a model repeat protein, a Toll-like receptor4 (TLR4) decoy receptor composed of leucine-rich repeat (LRR) modules was used, and its interaction interface was rationally engineered to increase the binding affinity for myeloid differentiation protein 2 (MD2). Based on the complex crystal structure of the decoy receptor with MD2, we first designed single amino acid substitutions in the decoy receptor, and obtained three variants showing a binding affinity (KD) one-order of magnitude higher than the wild-type decoy receptor. The interacting modes and contributions of individual residues were elucidated by analyzing the crystal structures of the single variants. To further increase the binding affinity, single positive mutations were combined, and two double mutants were shown to have about 3000- and 565-fold higher binding affinities than the wild-type decoy receptor. Molecular dynamics simulations and energetic analysis indicate that an additive effect by two mutations occurring at nearby modules was the major contributor to the remarkable increase in the binding affinities. PMID:22363519
Parallel seed-based approach to multiple protein structure similarities detection
Chapuis, Guillaume; Le Boudic-Jamin, Mathilde; Andonov, Rumen; ...
2015-01-01
Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makesmore » our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as to detect structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level parallelism making use of available CPU cores for computation and a fine-grain level parallelism exploiting bit-level concurrency as well as vector instructions.« less
Uehara, Shota; Tanaka, Shigenori
2017-04-24
Protein flexibility is a major hurdle in current structure-based virtual screening (VS). In spite of the recent advances in high-performance computing, protein-ligand docking methods still demand tremendous computational cost to take into account the full degree of protein flexibility. In this context, ensemble docking has proven its utility and efficiency for VS studies, but it still needs a rational and efficient method to select and/or generate multiple protein conformations. Molecular dynamics (MD) simulations are useful to produce distinct protein conformations without abundant experimental structures. In this study, we present a novel strategy that makes use of cosolvent-based molecular dynamics (CMD) simulations for ensemble docking. By mixing small organic molecules into a solvent, CMD can stimulate dynamic protein motions and induce partial conformational changes of binding pocket residues appropriate for the binding of diverse ligands. The present method has been applied to six diverse target proteins and assessed by VS experiments using many actives and decoys of DEKOIS 2.0. The simulation results have revealed that the CMD is beneficial for ensemble docking. Utilizing cosolvent simulation allows the generation of druggable protein conformations, improving the VS performance compared with the use of a single experimental structure or ensemble docking by standard MD with pure water as the solvent.
NMR-based automated protein structure determination.
Würz, Julia M; Kazemi, Sina; Schmidt, Elena; Bagaria, Anurag; Güntert, Peter
2017-08-15
NMR spectra analysis for protein structure determination can now in many cases be performed by automated computational methods. This overview of the computational methods for NMR protein structure analysis presents recent automated methods for signal identification in multidimensional NMR spectra, sequence-specific resonance assignment, collection of conformational restraints, and structure calculation, as implemented in the CYANA software package. These algorithms are sufficiently reliable and integrated into one software package to enable the fully automated structure determination of proteins starting from NMR spectra without manual interventions or corrections at intermediate steps, with an accuracy of 1-2 Å backbone RMSD in comparison with manually solved reference structures. Copyright © 2017 Elsevier Inc. All rights reserved.
Huber, Roland G.; Bond, Peter J.
2017-01-01
An improved knowledge of protein-protein interactions is essential for better understanding of metabolic and signaling networks, and cellular function. Progress tends to be based on structure determination and predictions using known structures, along with computational methods based on evolutionary information or detailed atomistic descriptions. We hypothesized that for the case of interactions across a common interface, between proteins from a pair of paralogue families or within a family of paralogues, a relatively simple interface description could distinguish between binding and non-binding pairs. Using binding data for several systems, and large-scale comparative modeling based on known template complex structures, it is found that charge-charge interactions (for groups bearing net charge) are generally a better discriminant than buried non-polar surface. This is particularly the case for paralogue families that are less divergent, with more reliable comparative modeling. We suggest that electrostatic interactions are major determinants of specificity in such systems, an observation that could be used to predict binding partners. PMID:29016650
Ivanov, Stefan M; Cawley, Andrew; Huber, Roland G; Bond, Peter J; Warwicker, Jim
2017-01-01
An improved knowledge of protein-protein interactions is essential for better understanding of metabolic and signaling networks, and cellular function. Progress tends to be based on structure determination and predictions using known structures, along with computational methods based on evolutionary information or detailed atomistic descriptions. We hypothesized that for the case of interactions across a common interface, between proteins from a pair of paralogue families or within a family of paralogues, a relatively simple interface description could distinguish between binding and non-binding pairs. Using binding data for several systems, and large-scale comparative modeling based on known template complex structures, it is found that charge-charge interactions (for groups bearing net charge) are generally a better discriminant than buried non-polar surface. This is particularly the case for paralogue families that are less divergent, with more reliable comparative modeling. We suggest that electrostatic interactions are major determinants of specificity in such systems, an observation that could be used to predict binding partners.
The Structure and Function of Non-Collagenous Bone Proteins
NASA Technical Reports Server (NTRS)
Hook, Magnus; McQuillan, David J.
1997-01-01
The research done under the cooperative research agreement for the project titled 'The structure and function of non-collagenous bone proteins' represented the first phase of an ongoing program to define the structural and functional relationships of the principal noncollagenous proteins in bone. An ultimate goal of this research is to enable design and execution of useful pharmacological compounds that will have a beneficial effect in treatment of osteoporosis, both land-based and induced by long-duration space travel. The goals of the now complete first phase were as follows: 1. Establish and/or develop powerful recombinant protein expression systems; 2. Develop and refine isolation and purification of recombinant proteins; 3. Express wild-type non-collagenous bone proteins; 4. Express site-specific mutant proteins and domains of wild-type proteins to enhance likelihood of crystal formation for subsequent solution of structure.
In silico analysis of fragile histidine triad involved in regression of carcinoma.
Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia
2017-04-01
Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa including insulin growth factor (IGF) II , signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD 4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT) etc. The present study is based on the bioinformatics analysis of FHIT protein in order to understand the proteomics aspect and improvement of the diagnosis of the disease based on the protein. Different information related to protein were gathered from different databases, including National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, Uniprot database, String database and Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein and evaluation of the quality of the structure were included from Easy modeler programme. Hence, this analysis not only helped to gather information related to the protein at one place, but also analysed the structure and quality of the protein to conclude that the protein has a role in carcinoma.
Amporndanai, Kangsa; O’Neill, Paul M.
2018-01-01
Cytochrome bc 1, a dimeric multi-subunit electron-transport protein embedded in the inner mitochondrial membrane, is a major drug target for the treatment and prevention of malaria and toxoplasmosis. Structural studies of cytochrome bc 1 from mammalian homologues co-crystallized with lead compounds have underpinned structure-based drug design to develop compounds with higher potency and selectivity. However, owing to the limited amount of cytochrome bc 1 that may be available from parasites, all efforts have been focused on homologous cytochrome bc 1 complexes from mammalian species, which has resulted in the failure of some drug candidates owing to toxicity in the host. Crystallographic studies of the native parasite proteins are not feasible owing to limited availability of the proteins. Here, it is demonstrated that cytochrome bc 1 is highly amenable to single-particle cryo-EM (which uses significantly less protein) by solving the apo and two inhibitor-bound structures to ∼4.1 Å resolution, revealing clear inhibitor density at the binding site. Therefore, cryo-EM is proposed as a viable alternative method for structure-based drug discovery using both host and parasite enzymes. PMID:29765610
Ferrada, Evandro; Vergara, Ismael A; Melo, Francisco
2007-01-01
The correct discrimination between native and near-native protein conformations is essential for achieving accurate computer-based protein structure prediction. However, this has proven to be a difficult task, since currently available physical energy functions, empirical potentials and statistical scoring functions are still limited in achieving this goal consistently. In this work, we assess and compare the ability of different full atom knowledge-based potentials to discriminate between native protein structures and near-native protein conformations generated by comparative modeling. Using a benchmark of 152 near-native protein models and their corresponding native structures that encompass several different folds, we demonstrate that the incorporation of close non-bonded pairwise atom terms improves the discriminating power of the empirical potentials. Since the direct and unbiased derivation of close non-bonded terms from current experimental data is not possible, we obtained and used those terms from the corresponding pseudo-energy functions of a non-local knowledge-based potential. It is shown that this methodology significantly improves the discrimination between native and near-native protein conformations, suggesting that a proper description of close non-bonded terms is important to achieve a more complete and accurate description of native protein conformations. Some external knowledge-based energy functions that are widely used in model assessment performed poorly, indicating that the benchmark of models and the specific discrimination task tested in this work constitutes a difficult challenge.
F2Dock: Fast Fourier Protein-Protein Docking
Bajaj, Chandrajit; Chowdhury, Rezaul; Siddavanahalli, Vinay
2009-01-01
The functions of proteins is often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination and understanding function and structure relationships. In this paper we extend our non-uniform fast Fourier transform docking algorithm to include an adaptive search phase (both translational and rotational) and thereby speed up its execution. We have also implemented a multithreaded version of the adaptive docking algorithm for even faster execution on multicore machines. We call this protein-protein docking code F2Dock (F2 = Fast Fourier). We have calibrated F2Dock based on an extensive experimental study on a list of benchmark complexes and conclude that F2Dock works very well in practice. Though all docking results reported in this paper use shape complementarity and Coulombic potential based scores only, F2Dock is structured to incorporate Lennard-Jones potential and re-ranking docking solutions based on desolvation energy. PMID:21071796
G Protein-Coupled Receptor Rhodopsin: A Prospectus
Filipek, Sławomir; Stenkamp, Ronald E.; Teller, David C.; Palczewski, Krzysztof
2006-01-01
Rhodopsin is a retinal photoreceptor protein of bipartite structure consisting of the transmembrane protein opsin and a light-sensitive chromophore 11-cis-retinal, linked to opsin via a protonated Schiff base. Studies on rhodopsin have unveiled many structural and functional features that are common to a large and pharmacologically important group of proteins from the G protein-coupled receptor (GPCR) superfamily, of which rhodopsin is the best-studied member. In this work, we focus on structural features of rhodopsin as revealed by many biochemical and structural investigations. In particular, the high-resolution structure of bovine rhodopsin provides a template for understanding how GPCRs work. We describe the sensitivity and complexity of rhodopsin that lead to its important role in vision. PMID:12471166
Brender, Jeffrey R.; Zhang, Yang
2015-01-01
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the differences between the interface profile score of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through the random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. PMID:26506533
Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin
2017-03-01
Protein-protein interactions are either through direct contacts between two binding partners or mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions is introduced into our protein-protein docking method, MDockPP. Briefly, a FFT-based docking algorithm is employed in generating putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged, referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered as a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Raval, Alpan; Piana, Stefano; Eastwood, Michael P; Shaw, David E
2016-01-01
Molecular dynamics (MD) simulation is a well-established tool for the computational study of protein structure and dynamics, but its application to the important problem of protein structure prediction remains challenging, in part because extremely long timescales can be required to reach the native structure. Here, we examine the extent to which the use of low-resolution information in the form of residue-residue contacts, which can often be inferred from bioinformatics or experimental studies, can accelerate the determination of protein structure in simulation. We incorporated sets of 62, 31, or 15 contact-based restraints in MD simulations of ubiquitin, a benchmark system known to fold to the native state on the millisecond timescale in unrestrained simulations. One-third of the restrained simulations folded to the native state within a few tens of microseconds-a speedup of over an order of magnitude compared with unrestrained simulations and a demonstration of the potential for limited amounts of structural information to accelerate structure determination. Almost all of the remaining ubiquitin simulations reached near-native conformations within a few tens of microseconds, but remained trapped there, apparently due to the restraints. We discuss potential methodological improvements that would facilitate escape from these near-native traps and allow more simulations to quickly reach the native state. Finally, using a target from the Critical Assessment of protein Structure Prediction (CASP) experiment, we show that distance restraints can improve simulation accuracy: In our simulations, restraints stabilized the native state of the protein, enabling a reasonable structural model to be inferred. © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Dias, David M.; Ciulli, Alessio
2014-01-01
Nuclear magnetic resonance (NMR) spectroscopy is a pivotal method for structure-based and fragment-based lead discovery because it is one of the most robust techniques to provide information on protein structure, dynamics and interaction at an atomic level in solution. Nowadays, in most ligand screening cascades, NMR-based methods are applied to identify and structurally validate small molecule binding. These can be high-throughput and are often used synergistically with other biophysical assays. Here, we describe current state-of-the-art in the portfolio of available NMR-based experiments that are used to aid early-stage lead discovery. We then focus on multi-protein complexes as targets and how NMR spectroscopy allows studying of interactions within the high molecular weight assemblies that make up a vast fraction of the yet untargeted proteome. Finally, we give our perspective on how currently available methods could build an improved strategy for drug discovery against such challenging targets. PMID:25175337
Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E
2017-02-03
SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.
Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka
Protein-protein interactions play core roles in living cells, especially in the regulatory systems. As information on proteins has rapidly accumulated on publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized by several projects, docking-based approaches having the advantages that they can suggest binding poses of predicted binding partners which would help in understanding the interaction mechanisms and that comparing docking results of both non-binders and binders can lead to understanding the specificity of protein-protein interactions from structural viewpoints. In this review we focus on explaining current computational prediction methods to predict pairwise direct protein-protein interactions that form protein complexes.
Designing protein-based biomaterials for medical applications.
Gagner, Jennifer E; Kim, Wookhyun; Chaikof, Elliot L
2014-04-01
Biomaterials produced by nature have been honed through billions of years, evolving exquisitely precise structure-function relationships that scientists strive to emulate. Advances in genetic engineering have facilitated extensive investigations to determine how changes in even a single peptide within a protein sequence can produce biomaterials with unique thermal, mechanical and biological properties. Elastin, a naturally occurring protein polymer, serves as a model protein to determine the relationship between specific structural elements and desirable material characteristics. The modular, repetitive nature of the protein facilitates the formation of well-defined secondary structures with the ability to self-assemble into complex three-dimensional architectures on a variety of length scales. Furthermore, many opportunities exist to incorporate other protein-based motifs and inorganic materials into recombinant protein-based materials, extending the range and usefulness of these materials in potential biomedical applications. Elastin-like polypeptides (ELPs) can be assembled into 3-D architectures with precise control over payload encapsulation, mechanical and thermal properties, as well as unique functionalization opportunities through both genetic and enzymatic means. An overview of current protein-based materials, their properties and uses in biomedicine will be provided, with a focus on the advantages of ELPs. Applications of these biomaterials as imaging and therapeutic delivery agents will be discussed. Finally, broader implications and future directions of these materials as diagnostic and therapeutic systems will be explored. Copyright © 2013 Elsevier Ltd. All rights reserved.
Designing Protein-Based Biomaterials for Medical Applications
Gagner, Jennifer E.; Kim, Wookhyun; Chaikof, Elliot L.
2013-01-01
Biomaterials produced by nature have been honed through billions of years, evolving exquisitely precise structure-function relationships that scientists strive to emulate. Advances in genetic engineering have facilitated extensive investigations to determine how changes in even a single peptide within a protein sequence can produce biomaterials with unique thermal, mechanical and biological properties. Elastin, a naturally occurring protein polymer, serves as a model protein to determine the relationship between specific structural elements and desirable material characteristics. The modular, repetitive nature of the protein facilitates the formation of well-defined secondary structures with the ability to self-assemble into complex three-dimensional architectures on a variety of length scales. Furthermore, many opportunities exist to incorporate other protein-based motifs and inorganic materials into recombinant protein-based materials, extending the range and usefulness of these materials in potential biomedical applications. Elastin-like polypeptides can be assembled into 3D architectures with precise control over payload encapsulation, mechanical and thermal properties, as well as unique functionalization opportunities through both genetic and enzymatic means. An overview of current protein-based materials, their properties and uses in biomedicine will be provided, with a focus on the advantages of elastin-like polypeptides. Applications of these biomaterials as imaging and therapeutic delivery agents will be discussed. Finally, broader implications and future directions of these materials as diagnostic and therapeutic systems will be explored. PMID:24121196
Understand protein functions by comparing the similarity of local structural environments.
Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao
2017-02-01
The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that function similarity is originated from the local structural information of proteins instead of their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. In order to testify this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbors, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggested that the local environment of residues in a protein is a good indicator to recognize specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
2012-01-01
Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.
Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl
2012-07-13
Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
Structural analysis of herpes simplex virus by optical super-resolution imaging
NASA Astrophysics Data System (ADS)
Laine, Romain F.; Albecka, Anna; van de Linde, Sebastian; Rees, Eric J.; Crump, Colin M.; Kaminski, Clemens F.
2015-01-01
Herpes simplex virus type-1 (HSV-1) is one of the most widespread pathogens among humans. Although the structure of HSV-1 has been extensively investigated, the precise organization of tegument and envelope proteins remains elusive. Here we use super-resolution imaging by direct stochastic optical reconstruction microscopy (dSTORM) in combination with a model-based analysis of single-molecule localization data, to determine the position of protein layers within virus particles. We resolve different protein layers within individual HSV-1 particles using multi-colour dSTORM imaging and discriminate envelope-anchored glycoproteins from tegument proteins, both in purified virions and in virions present in infected cells. Precise characterization of HSV-1 structure was achieved by particle averaging of purified viruses and model-based analysis of the radial distribution of the tegument proteins VP16, VP1/2 and pUL37, and envelope protein gD. From this data, we propose a model of the protein organization inside the tegument.
Mixing and Matching Detergents for Membrane Protein NMR Structure Determination
DOE Office of Scientific and Technical Information (OSTI.GOV)
Columbus, Linda; Lipfert, Jan; Jambunathan, Kalyani
2009-10-21
One major obstacle to membrane protein structure determination is the selection of a detergent micelle that mimics the native lipid bilayer. Currently, detergents are selected by exhaustive screening because the effects of protein-detergent interactions on protein structure are poorly understood. In this study, the structure and dynamics of an integral membrane protein in different detergents is investigated by nuclear magnetic resonance (NMR) and electron paramagnetic resonance (EPR) spectroscopy and small-angle X-ray scattering (SAXS). The results suggest that matching of the micelle dimensions to the protein's hydrophobic surface avoids exchange processes that reduce the completeness of the NMR observations. Based onmore » these dimensions, several mixed micelles were designed that improved the completeness of NMR observations. These findings provide a basis for the rational design of mixed micelles that may advance membrane protein structure determination by NMR.« less
Zhu, Tong; Zhang, John Z H; He, Xiao
2014-09-14
In this work, protein side chain (1)H chemical shifts are used as probes to detect and correct side-chain packing errors in protein's NMR structures through structural refinement. By applying the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method for ab initio calculation of chemical shifts, incorrect side chain packing was detected in the NMR structures of the Pin1 WW domain. The NMR structure is then refined by using molecular dynamics simulation and the polarized protein-specific charge (PPC) model. The computationally refined structure of the Pin1 WW domain is in excellent agreement with the corresponding X-ray structure. In particular, the use of the PPC model yields a more accurate structure than that using the standard (nonpolarizable) force field. For comparison, some of the widely used empirical models for chemical shift calculations are unable to correctly describe the relationship between the particular proton chemical shift and protein structures. The AF-QM/MM method can be used as a powerful tool for protein NMR structure validation and structural flaw detection.
i3Drefine software for protein 3D structure refinement and its assessment in CASP10.
Bhattacharya, Debswapna; Cheng, Jianlin
2013-01-01
Protein structure refinement refers to the process of improving the qualities of protein structures during structure modeling processes to bring them closer to their native states. Structure refinement has been drawing increasing attention in the community-wide Critical Assessment of techniques for Protein Structure prediction (CASP) experiments since its addition in 8(th) CASP experiment. During the 9(th) and recently concluded 10(th) CASP experiments, a consistent growth in number of refinement targets and participating groups has been witnessed. Yet, protein structure refinement still remains a largely unsolved problem with majority of participating groups in CASP refinement category failed to consistently improve the quality of structures issued for refinement. In order to alleviate this need, we developed a completely automated and computationally efficient protein 3D structure refinement method, i3Drefine, based on an iterative and highly convergent energy minimization algorithm with a powerful all-atom composite physics and knowledge-based force fields and hydrogen bonding (HB) network optimization technique. In the recent community-wide blind experiment, CASP10, i3Drefine (as 'MULTICOM-CONSTRUCT') was ranked as the best method in the server section as per the official assessment of CASP10 experiment. Here we provide the community with free access to i3Drefine software and systematically analyse the performance of i3Drefine in strict blind mode on the refinement targets issued in CASP10 refinement category and compare with other state-of-the-art refinement methods participating in CASP10. Our analysis demonstrates that i3Drefine is only fully-automated server participating in CASP10 exhibiting consistent improvement over the initial structures in both global and local structural quality metrics. Executable version of i3Drefine is freely available at http://protein.rnet.missouri.edu/i3drefine/.
Identifying DNA-binding proteins using structural motifs and the electrostatic potential
Shanahan, Hugh P.; Garcia, Mario A.; Jones, Susan; Thornton, Janet M.
2004-01-01
Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that (i) have a solvent accessible structural motif necessary for DNA-binding and (ii) a positive electrostatic potential in the region of the binding region. We focus on three structural motifs: helix–turn-helix (HTH), helix–hairpin–helix (HhH) and helix–loop–helix (HLH). We find that the combination of these variables detect 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif. PMID:15356290
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doiron, K.; Yu, P; McKinnon, J
2009-01-01
The objectives of this study were to reveal protein structures of feed tissues affected by heat processing at a cellular level, using the synchrotron-based Fourier transform infrared microspectroscopy as a novel approach, and quantify protein structure in relation to protein digestive kinetics and nutritive value in the rumen and intestine in dairy cattle. The parameters assessed included (1) protein structure a-helix to e-sheet ratio; (2) protein subfractions profiles; (3) protein degradation kinetics and effective degradability; (4) predicted nutrient supply using the intestinally absorbed protein supply (DVE)/degraded protein balance (OEB) system for dairy cattle. In this study, Vimy flaxseed protein wasmore » used as a model feed protein and was autoclave-heated at 120C for 20, 40, and 60 min in treatments T1, T2, and T3, respectively. The results showed that using the synchrotron-based Fourier transform infrared microspectroscopy revealed and identified the heat-induced protein structure changes. Heating at 120C for 40 and 60 min increased the protein structure a-helix to e-sheet ratio. There were linear effects of heating time on the ratio. The heating also changed chemical profiles, which showed soluble CP decreased upon heating with concomitant increases in nonprotein nitrogen, neutral, and acid detergent insoluble nitrogen. The protein subfractions with the greatest changes were PB1, which showed a dramatic reduction, and PB2, which showed a dramatic increase, demonstrating a decrease in overall protein degradability. In situ results showed a reduction in rumen-degradable protein and in rumen-degradable dry matter without differences between the treatments. Intestinal digestibility, determined using a 3-step in vitro procedure, showed no changes to rumen undegradable protein. Modeling results showed that heating increased total intestinally absorbable protein (feed DVE value) and decreased degraded protein balance (feed OEB value), but there were no differences between the treatments. There was a linear effect of heating time on the DVE and a cubic effect on the OEB value. Our results showed that heating changed chemical profiles, protein structure a-helix to e-sheet ratio, and protein subfractions; decreased rumen-degradable protein and rumen-degradable dry matter; and increased potential nutrient supply to dairy cattle. The protein structure a-helix to e-sheet ratio had a significant positive correlation with total intestinally absorbed protein supply and negative correlation with degraded protein balance.« less
Tan, Kemin; Chang, Changsoo; Cuff, Marianne; Osipiuk, Jerzy; Landorf, Elizabeth; Mack, Jamey C; Zerbs, Sarah; Joachimiak, Andrzej; Collart, Frank R
2013-10-01
Lignin comprises 15-25% of plant biomass and represents a major environmental carbon source for utilization by soil microorganisms. Access to this energy resource requires the action of fungal and bacterial enzymes to break down the lignin polymer into a complex assortment of aromatic compounds that can be transported into the cells. To improve our understanding of the utilization of lignin by microorganisms, we characterized the molecular properties of solute binding proteins of ATP-binding cassette transporter proteins that interact with these compounds. A combination of functional screens and structural studies characterized the binding specificity of the solute binding proteins for aromatic compounds derived from lignin such as p-coumarate, 3-phenylpropionic acid and compounds with more complex ring substitutions. A ligand screen based on thermal stabilization identified several binding protein clusters that exhibit preferences based on the size or number of aromatic ring substituents. Multiple X-ray crystal structures of protein-ligand complexes for these clusters identified the molecular basis of the binding specificity for the lignin-derived aromatic compounds. The screens and structural data provide new functional assignments for these solute-binding proteins which can be used to infer their transport specificity. This knowledge of the functional roles and molecular binding specificity of these proteins will support the identification of the specific enzymes and regulatory proteins of peripheral pathways that funnel these compounds to central metabolic pathways and will improve the predictive power of sequence-based functional annotation methods for this family of proteins. Copyright © 2013 Wiley Periodicals, Inc.
Tan, Kemin; Chang, Changsoo; Cuff, Marianne; Osipiuk, Jerzy; Landorf, Elizabeth; Mack, Jamey C.; Zerbs, Sarah; Joachimiak, Andrzej; Collart, Frank R.
2013-01-01
Lignin comprises 15.25% of plant biomass and represents a major environmental carbon source for utilization by soil microorganisms. Access to this energy resource requires the action of fungal and bacterial enzymes to break down the lignin polymer into a complex assortment of aromatic compounds that can be transported into the cells. To improve our understanding of the utilization of lignin by microorganisms, we characterized the molecular properties of solute binding proteins of ATP.binding cassette transporter proteins that interact with these compounds. A combination of functional screens and structural studies characterized the binding specificity of the solute binding proteins for aromatic compounds derived from lignin such as p-coumarate, 3-phenylpropionic acid and compounds with more complex ring substitutions. A ligand screen based on thermal stabilization identified several binding protein clusters that exhibit preferences based on the size or number of aromatic ring substituents. Multiple X-ray crystal structures of protein-ligand complexes for these clusters identified the molecular basis of the binding specificity for the lignin-derived aromatic compounds. The screens and structural data provide new functional assignments for these solute.binding proteins which can be used to infer their transport specificity. This knowledge of the functional roles and molecular binding specificity of these proteins will support the identification of the specific enzymes and regulatory proteins of peripheral pathways that funnel these compounds to central metabolic pathways and will improve the predictive power of sequence-based functional annotation methods for this family of proteins. PMID:23606130
Defining and predicting structurally conserved regions in protein superfamilies
Huang, Ivan K.; Grishin, Nick V.
2013-01-01
Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics Online PMID:23193223
HomPPI: a class of sequence homology based protein-protein interface prediction methods
2011-01-01
Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
Novel nonlinear knowledge-based mean force potentials based on machine learning.
Dong, Qiwen; Zhou, Shuigeng
2011-01-01
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
NASA Astrophysics Data System (ADS)
Yu, Peiqiang; Jonker, Arjan; Gruber, Margaret
2009-09-01
To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of β-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this study was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included α-helices, β-sheets and other structures such as β-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower ( P < 0.05) percentage of the model-fitted α-helices (29 vs. 34) and model-fitted β-sheets (22 vs. 27) and a higher ( P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference ( P > 0.05) in the ratio of α-helices to β-sheets (average: 1.4) and higher ( P < 0.05) ratios of α-helices to others (0.7 vs. 0.9) and β-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference ( P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, P.; Jonker, A; Gruber, M
2009-01-01
To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of e-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this studymore » was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included a-helices, e-sheets and other structures such as e-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower (P < 0.05) percentage of the model-fitted a-helices (29 vs. 34) and model-fitted e-sheets (22 vs. 27) and a higher (P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference (P > 0.05) in the ratio of a-helices to e-sheets (average: 1.4) and higher (P < 0.05) ratios of a-helices to others (0.7 vs. 0.9) and e-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference (P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.« less
Structure-based conformational preferences of amino acids
Koehl, Patrice; Levitt, Michael
1999-01-01
Proteins can be very tolerant to amino acid substitution, even within their core. Understanding the factors responsible for this behavior is of critical importance for protein engineering and design. Mutations in proteins have been quantified in terms of the changes in stability they induce. For example, guest residues in specific secondary structures have been used as probes of conformational preferences of amino acids, yielding propensity scales. Predicting these amino acid propensities would be a good test of any new potential energy functions used to mimic protein stability. We have recently developed a protein design procedure that optimizes whole sequences for a given target conformation based on the knowledge of the template backbone and on a semiempirical potential energy function. This energy function is purely physical, including steric interactions based on a Lennard-Jones potential, electrostatics based on a Coulomb potential, and hydrophobicity in the form of an environment free energy based on accessible surface area and interatomic contact areas. Sequences designed by this procedure for 10 different proteins were analyzed to extract conformational preferences for amino acids. The resulting structure-based propensity scales show significant agreements with experimental propensity scale values, both for α-helices and β-sheets. These results indicate that amino acid conformational preferences are a natural consequence of the potential energy we use. This confirms the accuracy of our potential and indicates that such preferences should not be added as a design criterion. PMID:10535955
DNA Nanotubes for NMR Structure Determination of Membrane Proteins
Bellot, Gaëtan; McClintock, Mark A.; Chou, James J; Shih, William M.
2013-01-01
Structure determination of integral membrane proteins by solution NMR represents one of the most important challenges of structural biology. A Residual-Dipolar-Coupling-based refinement approach can be used to solve the structure of membrane proteins up to 40 kDa in size, however, a weak-alignment medium that is detergent-resistant is required. Previously, availability of media suitable for weak alignment of membrane proteins was severely limited. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400nm-long six-helix bundles each self-assembled from a M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, towards collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes via counter ions and small DNA binding molecules. This detergent-resistant liquid-crystal media offers a number of properties conducive for membrane protein alignment, including high-yield production, thermal stability, buffer compatibility, and structural programmability. Production of sufficient nanotubes for 4–5 NMR experiments can be completed in one week by a single individual. PMID:23518667
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is a software, available to the scientific community as a free web service designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in either large but not homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions result improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate what sequence regions can assume different secondary structures depending on the structural class assignment, in the perspective of identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
2017-01-01
Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-‘R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment. PMID:29119109
Monte Carlo replica-exchange based ensemble docking of protein conformations.
Zhang, Zhe; Ehmann, Uwe; Zacharias, Martin
2017-05-01
A replica-exchange Monte Carlo (REMC) ensemble docking approach has been developed that allows efficient exploration of protein-protein docking geometries. In addition to Monte Carlo steps in translation and orientation of binding partners, possible conformational changes upon binding are included based on Monte Carlo selection of protein conformations stored as ordered pregenerated conformational ensembles. The conformational ensembles of each binding partner protein were generated by three different approaches starting from the unbound partner protein structure with a range spanning a root mean square deviation of 1-2.5 Å with respect to the unbound structure. Because MC sampling is performed to select appropriate partner conformations on the fly the approach is not limited by the number of conformations in the ensemble compared to ensemble docking of each conformer pair in ensemble cross docking. Although only a fraction of generated conformers was in closer agreement with the bound structure the REMC ensemble docking approach achieved improved docking results compared to REMC docking with only the unbound partner structures or using docking energy minimization methods. The approach has significant potential for further improvement in combination with more realistic structural ensembles and better docking scoring functions. Proteins 2017; 85:924-937. © 2016 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Robic, Srebrenka
2010-01-01
To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative phenomena in undergraduate classes. In the process of learning about these topics, students often form incorrect ideas. For example, by learning about protein folding in the context of protein synthesis, students may come to an incorrect conclusion that once synthesized on the ribosome, a protein spends its entire cellular life time in its fully folded native confirmation. This is clearly not true; proteins are dynamic structures that undergo both local fluctuations and global unfolding events. To prevent and address such misconceptions, basic concepts of protein science can be introduced in the context of simple mathematical models and hands-on explorations of publicly available data sets. Ten common misconceptions about proteins are presented, along with suggestions for using equations, models, sequence, structure, and thermodynamic data to help students gain a deeper understanding of basic concepts relating to protein structure, folding, and stability.
Nuccitelli, Annalisa; Cozzi, Roberta; Gourlay, Louise J; Donnarumma, Danilo; Necchi, Francesca; Norais, Nathalie; Telford, John L; Rappuoli, Rino; Bolognesi, Martino; Maione, Domenico; Grandi, Guido; Rinaudo, C Daniela
2011-06-21
Structural vaccinology is an emerging strategy for the rational design of vaccine candidates. We successfully applied structural vaccinology to design a fully synthetic protein with multivalent protection activity. In Group B Streptococcus, cell-surface pili have aroused great interest because of their direct roles in virulence and importance as protective antigens. The backbone subunit of type 2a pilus (BP-2a) is present in six immunogenically different but structurally similar variants. We determined the 3D structure of one of the variants, and experimentally demonstrated that protective antibodies specifically recognize one of the four domains that comprise the protein. We therefore constructed a synthetic protein constituted by the protective domain of each one of the six variants and showed that the chimeric protein protects mice against the challenge with all of the type 2a pilus-carrying strains. This work demonstrates the power of structural vaccinology and will facilitate the development of an optimized, broadly protective pilus-based vaccine against Group B Streptococcus by combining the uniquely generated chimeric protein with protective pilin subunits from two other previously identified pilus types. In addition, this work describes a template procedure that can be followed to develop vaccines against other bacterial pathogens.
Structure-based design of combinatorial mutagenesis libraries
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-01-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called “Structure-based Optimization of Combinatorial Mutagenesis” (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. PMID:25611189
Tsai, Keng-Chang; Jian, Jhih-Wei; Yang, Ei-Wen; Hsu, Po-Chiang; Peng, Hung-Pin; Chen, Ching-Tai; Chen, Jun-Bo; Chang, Jeng-Yih; Hsu, Wen-Lian; Yang, An-Suei
2012-01-01
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date. PMID:22848404
CABS-fold: Server for the de novo and consensus-based prediction of protein structure.
Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej
2013-07-01
The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold.
CABS-fold: server for the de novo and consensus-based prediction of protein structure
Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej
2013-01-01
The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold. PMID:23748950
STRUM: structure-based prediction of protein stability changes upon single-point mutation.
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-10-01
Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
STRUM: structure-based prediction of protein stability changes upon single-point mutation
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-01-01
Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206
Jeong, Chan-Seok; Kim, Dongsup
2016-02-24
Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.
Localization-based super-resolution imaging of cellular structures.
Kanchanawong, Pakorn; Waterman, Clare M
2013-01-01
Fluorescence microscopy allows direct visualization of fluorescently tagged proteins within cells. However, the spatial resolution of conventional fluorescence microscopes is limited by diffraction to ~250 nm, prompting the development of super-resolution microscopy which offers resolution approaching the scale of single proteins, i.e., ~20 nm. Here, we describe protocols for single molecule localization-based super-resolution imaging, using focal adhesion proteins as an example and employing either photoswitchable fluorophores or photoactivatable fluorescent proteins. These protocols should also be easily adaptable to imaging a broad array of macromolecular assemblies in cells whose components can be fluorescently tagged and assemble into high density structures.
SA-Search: a web tool for protein structure mining based on a Structural Alphabet
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
2004-01-01
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search. PMID:15215446
SA-Search: a web tool for protein structure mining based on a Structural Alphabet.
Guyon, Frédéric; Camproux, Anne-Claude; Hochez, Joëlle; Tufféry, Pierre
2004-07-01
SA-Search is a web tool that can be used to mine for protein structures and extract structural similarities. It is based on a hidden Markov model derived Structural Alphabet (SA) that allows the compression of three-dimensional (3D) protein conformations into a one-dimensional (1D) representation using a limited number of prototype conformations. Using such a representation, classical methods developed for amino acid sequences can be employed. Currently, SA-Search permits the performance of fast 3D similarity searches such as the extraction of exact words using a suffix tree approach, and the search for fuzzy words viewed as a simple 1D sequence alignment problem. SA-Search is available at http://bioserv.rpbs.jussieu.fr/cgi-bin/SA-Search.
Monleón, Daniel; Colson, Kimberly; Moseley, Hunter N B; Anklin, Clemens; Oswald, Robert; Szyperski, Thomas; Montelione, Gaetano T
2002-01-01
Rapid data collection, spectral referencing, processing by time domain deconvolution, peak picking and editing, and assignment of NMR spectra are necessary components of any efficient integrated system for protein NMR structure analysis. We have developed a set of software tools designated AutoProc, AutoPeak, and AutoAssign, which function together with the data processing and peak-picking programs NMRPipe and Sparky, to provide an integrated software system for rapid analysis of protein backbone resonance assignments. In this paper we demonstrate that these tools, together with high-sensitivity triple resonance NMR cryoprobes for data collection and a Linux-based computer cluster architecture, can be combined to provide nearly complete backbone resonance assignments and secondary structures (based on chemical shift data) for a 59-residue protein in less than 30 hours of data collection and processing time. In this optimum case of a small protein providing excellent spectra, extensive backbone resonance assignments could also be obtained using less than 6 hours of data collection and processing time. These results demonstrate the feasibility of high throughput triple resonance NMR for determining resonance assignments and secondary structures of small proteins, and the potential for applying NMR in large scale structural proteomics projects.
Rubinson, Emily H.; Metz, Audrey H.; O'Quin, Jami; Eichman, Brandt F.
2013-01-01
Summary DNA glycosylases safeguard the genome by locating and excising chemically modified bases from DNA. AlkD is a recently discovered bacterial DNA glycosylase that removes positively charged methylpurines from DNA, and was predicted to adopt a protein fold distinct from other DNA repair proteins. The crystal structure of Bacillus cereus AlkD presented here shows that the protein is composed exclusively of helical HEAT-like repeats, which form a solenoid perfectly shaped to accommodate a DNA duplex on the concave surface. Structural analysis of the variant HEAT repeats in AlkD provides a rationale for how this protein scaffolding motif has been modified to bind DNA. We report 7mG excision and DNA binding activities of AlkD mutants, along with a comparison of alkylpurine DNA glycosylase structures. Together, these data provide important insight into the requirements for alkylation repair within DNA and suggest that AlkD utilizes a novel strategy to manipulate DNA in its search for alkylpurine bases. PMID:18585735
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Protein space: a natural method for realizing the nature of protein universe.
Yu, Chenglong; Deng, Mo; Cheng, Shiu-Yuen; Yau, Shek-Chung; He, Rong L; Yau, Stephen S-T
2013-02-07
Current methods cannot tell us what the nature of the protein universe is concretely. They are based on different models of amino acid substitution and multiple sequence alignment which is an NP-hard problem and requires manual intervention. Protein structural analysis also gives a direction for mapping the protein universe. Unfortunately, now only a minuscule fraction of proteins' 3-dimensional structures are known. Furthermore, the phylogenetic tree representations are not unique for any existing tree construction methods. Here we develop a novel method to realize the nature of protein universe. We show the protein universe can be realized as a protein space in 60-dimensional Euclidean space using a distance based on a normalized distribution of amino acids. Every protein is in one-to-one correspondence with a point in protein space, where proteins with similar properties stay close together. Thus the distance between two points in protein space represents the biological distance of the corresponding two proteins. We also propose a natural graphical representation for inferring phylogenies. The representation is natural and unique based on the biological distances of proteins in protein space. This will solve the fundamental question of how proteins are distributed in the protein universe. Copyright © 2012 Elsevier Ltd. All rights reserved.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data
Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka
2014-01-01
The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package “MEGADOCK” that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
Wang, Hsin-Wei; Hsu, Yen-Chu; Hwang, Jenn-Kang; Lyu, Ping-Chiang; Pai, Tun-Wen; Tang, Chuan Yi
2010-01-01
This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had “opened” their “closed” structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid 1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and the formation of depositions in Alzheimer's and prion diseases. Moreover, it is promising for designing auto-assembling biomaterials. Despite the increasing interest in DS, related bioinformatics methods are rarely available. Owing to a dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS. Hence, there is also a lack of comprehensive datasets for studying DS. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and demonstrated to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square distance (RMSD) computed by the proposed method were 90% and 1.8Å for a set of 1,211 DS-related pairs of proteins. The performances of structural alignments remain high and stable for DS-related homologs with less than 10% sequence identities. In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool, which requires two protein structures as the input and then the type and/or existence of DS relationships between the input structures are determined according to the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications. PMID:20976204
Stengel, Florian; Aebersold, Ruedi; Robinson, Carol V.
2012-01-01
Protein assemblies are critical for cellular function and understanding their physical organization is the key aim of structural biology. However, applying conventional structural biology approaches is challenging for transient, dynamic, or polydisperse assemblies. There is therefore a growing demand for hybrid technologies that are able to complement classical structural biology methods and thereby broaden our arsenal for the study of these important complexes. Exciting new developments in the field of mass spectrometry and proteomics have added a new dimension to the study of protein-protein interactions and protein complex architecture. In this review, we focus on how complementary mass spectrometry-based techniques can greatly facilitate structural understanding of protein assemblies. PMID:22180098
Mass spectrometry-based carboxyl footprinting of proteins: Method evaluation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Hao; Wen, Jianzhong; Huang, Richard Y-C.
2012-02-01
Protein structure determines function in biology, and a variety of approaches have been employed to obtain structural information about proteins. Mass spectrometry-based protein footprinting is one fast-growing approach. One labeling-based footprinting approach is the use of a water-soluble carbodiimide, 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDC) and glycine ethyl ester (GEE) to modify solvent-accessible carboxyl groups on glutamate (E) and aspartate (D). This paper describes method development of carboxyl-group modification in protein footprinting. The modification protocol was evaluated by using the protein calmodulin as a model. Because carboxyl-group modification is a slow reaction relative to protein folding and unfolding, there is an issue that modificationsmore » at certain sites may induce protein unfolding and lead to additional modification at sites that are not solvent-accessible in the wild-type protein. We investigated this possibility by using hydrogen deuterium amide exchange (H/DX). The study demonstrated that application of carboxyl group modification in probing conformational changes in calmodulin induced by Ca{sup 2+} binding provides useful information that is not compromised by modification-induced protein unfolding.« less
On the relationship between residue structural environment and sequence conservation in proteins.
Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao
2017-09-01
Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that the sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for residue's structural environment, calculated only based on the C α positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To know whether the C α atomic positions are adequate to show the relationship between residue environment and sequence conservation or not, here we compared C α atoms with other substructures in their contributions to the sequence conservation. Our results show that C α positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between C α atoms and the other substructures are high, yielding similar structure-conservation relationship. Take the WCN as an example, the average overlapping contribution to sequence conservation is 87% between C α and all-atom substructures. These results indicate that only C α atoms of a protein structure could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
Comparative Protein Structure Modeling Using MODELLER.
Webb, Benjamin; Sali, Andrej
2014-09-08
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.
Tertiary alphabet for the observable protein structural universe.
Mackenzie, Craig O; Zhou, Jianfu; Grigoryan, Gevorg
2016-11-22
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence-a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
Tertiary alphabet for the observable protein structural universe
Mackenzie, Craig O.; Zhou, Jianfu; Grigoryan, Gevorg
2016-01-01
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence—a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure. PMID:27810958
Structural alphabets derived from attractors in conformational space
2010-01-01
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics. PMID:20170534
How large B-factors can be in protein crystal structures.
Carugo, Oliviero
2018-02-23
Protein crystal structures are potentially over-interpreted since they are routinely refined without any restraint on the upper limit of atomic B-factors. Consequently, some of their atoms, undetected in the electron density maps, are allowed to reach extremely large B-factors, even above 100 square Angstroms, and their final positions are purely speculative and not based on any experimental evidence. A strategy to define B-factors upper limits is described here, based on the analysis of protein crystal structures deposited in the Protein Data Bank prior 2008, when the tendency to allow B-factor to arbitrary inflate was limited. This B-factor upper limit (B_max) is determined by extrapolating the relationship between crystal structure average B-factor and percentage of crystal volume occupied by solvent (pcVol) to pcVol =100%, when, ab absurdo, the crystal contains only liquid solvent, the structure of which is, by definition, undetectable in electron density maps. It is thus possible to highlight structures with average B-factors larger than B_max, which should be considered with caution by the users of the information deposited in the Protein Data Bank, in order to avoid scientifically deleterious over-interpretations.
CCBuilder 2.0: Powerful and accessible coiled‐coil modeling
Wood, Christopher W.
2017-01-01
Abstract The increased availability of user‐friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α‐helical coiled coil provides one such example, which represents ≈ 3–5% of all known protein‐encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy‐to‐use web application, called CCBuilder 2.0, for modeling and optimizing both α‐helical coiled coils and polyproline‐based collagen triple helices. This has many applications from providing models to aid molecular replacement for X‐ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo “dark matter” protein structures. CCBuilder 2.0 is available as a web‐based application, the code for which is open‐source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. Lay Summary We have created CCBuilder 2.0, an easy to use web‐based application that can model structures for a whole class of proteins, the α‐helical coiled coil, which is estimated to account for 3–5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more‐applied research including designing and engineering novel proteins that have potential applications in biotechnology. PMID:28836317
Munteanu, Cristian R; Gonzalez-Diaz, Humberto; Garcia, Rafael; Loza, Mabel; Pazos, Alejandro
2015-01-01
The molecular information encoding into molecular descriptors is the first step into in silico Chemoinformatics methods in Drug Design. The Machine Learning methods are a complex solution to find prediction models for specific biological properties of molecules. These models connect the molecular structure information such as atom connectivity (molecular graphs) or physical-chemical properties of an atom/group of atoms to the molecular activity (Quantitative Structure - Activity Relationship, QSAR). Due to the complexity of the proteins, the prediction of their activity is a complicated task and the interpretation of the models is more difficult. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug - protein target interactions and the other three calculate protein - protein interactions. The input information is based on the protein 3D structure for nine models, 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based Machine Learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments.
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Use of designed sequences in protein structure recognition.
Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
2018-05-09
Knowledge of the protein structure is a pre-requisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use the computationally designed sequences that are intermediately related to two protein domain families, which are already known to share the same fold. These strategically designed sequences enable detection of distant relationships and here, we have employed them for the purpose of structure recognition of protein families of yet unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of yet unknown structure, we made structural assignments for part/full length of the proteins. Fold association for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as 'linkers', where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Homology modeling a fast tool for drug discovery: current perspectives.
Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C
2012-01-01
Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery.
Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives
Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.
2012-01-01
Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery. PMID:23204616
Crysalis: an integrated server for computational analysis and design of protein crystallization.
Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I; Lin, Donghai; Song, Jiangning
2016-02-24
The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/.
Crysalis: an integrated server for computational analysis and design of protein crystallization
Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I.; Lin, Donghai; Song, Jiangning
2016-01-01
The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/. PMID:26906024
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.
Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F
2017-08-18
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
Chira, Camelia; Horvath, Dragos; Dumitrescu, D
2011-07-30
Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.
A non-heme iron-mediated chemical demethylation in DNA and RNA.
Yi, Chengqi; Yang, Cai-Guang; He, Chuan
2009-04-21
DNA methylation is arguably one of the most important chemical signals in biology. However, aberrant DNA methylation can lead to cytotoxic or mutagenic consequences. A DNA repair protein in Escherichia coli, AlkB, corrects some of the unwanted methylations of DNA bases by a unique oxidative demethylation in which the methyl carbon is liberated as formaldehyde. The enzyme also repairs exocyclic DNA lesions--that is, derivatives in which the base is augmented with an additional heterocyclic subunit--by a similar mechanism. Two proteins in humans that are homologous to AlkB, ABH2 and ABH3, repair the same spectrum of lesions; another human homologue of AlkB, FTO, is linked to obesity. In this Account, we describe our studies of AlkB, ABH2, and ABH3, including our development of a general strategy to trap homogeneous protein-DNA complexes through active-site disulfide cross-linking. AlkB uses a non-heme mononuclear iron(II) and the cofactors 2-ketoglutarate (2KG) and dioxygen to effect oxidative demethylation of the DNA base lesions 1-methyladenine (1-meA), 3-methylcytosine (3-meC), 1-methylguanine (1-meG), and 3-methylthymine (3-meT). ABH3, like AlkB, works better on single-stranded DNA (ssDNA) and is capable of repairing damaged bases in RNA. Conversely, ABH2 primarily repairs lesions in double-stranded DNA (dsDNA); it is the main housekeeping enzyme that protects the mammalian genome from 1-meA base damage. The AlkB-family proteins have moderate affinities for their substrates and bind DNA in a non-sequence-specific manner. Knowing that these proteins flip the damaged base out from the duplex DNA and insert it into the active site for further processing, we first engineered a disulfide cross-link in the active site to stabilize the Michaelis complex. Based on the detailed structural information afforded by the active-site cross-linked structures, we can readily install a cross-link away from the active site to obtain the native-like structures of these complexes. The crystal structures show a distinct base-flipping feature in AlkB and establish ABH2 as a dsDNA repair protein. They also provide a molecular framework for understanding the demethylation reaction catalyzed by these proteins and help to explain their substrate preferences. The chemical cross-linking method demonstrated here can be applied to trap other labile protein-DNA interactions and can serve as a general strategy for exploring the structural and functional aspects of base-flipping proteins.
Nielsen, Jens E.; Gunner, M. R.; Bertrand García-Moreno, E.
2012-01-01
The pKa Cooperative http://www.pkacoop.org was organized to advance development of accurate and useful computational methods for structure-based calculation of pKa values and electrostatic energy in proteins. The Cooperative brings together laboratories with expertise and interest in theoretical, computational and experimental studies of protein electrostatics. To improve structure-based energy calculations it is necessary to better understand the physical character and molecular determinants of electrostatic effects. The Cooperative thus intends to foment experimental research into fundamental aspects of proteins that depend on electrostatic interactions. It will maintain a depository for experimental data useful for critical assessment of methods for structure-based electrostatics calculations. To help guide the development of computational methods the Cooperative will organize blind prediction exercises. As a first step, computational laboratories were invited to reproduce an unpublished set of experimental pKa values of acidic and basic residues introduced in the interior of staphylococcal nuclease by site-directed mutagenesis. The pKa values of these groups are unique and challenging to simulate owing to the large magnitude of their shifts relative to normal pKa values in water. Many computational methods were tested in this 1st Blind Prediction Challenge and critical assessment exercise. A workshop was organized in the Telluride Science Research Center to assess objectively the performance of many computational methods tested on this one extensive dataset. This volume of PROTEINS: Structure, Function, and Bioinformatics introduces the pKa Cooperative, presents reports submitted by participants in the blind prediction challenge, and highlights some of the problems in structure-based calculations identified during this exercise. PMID:22002877
Mapping of Ligand-Binding Cavities in Proteins
Andersson, C. David; Chen, Brian Y.; Linusson, Anna
2010-01-01
The complex interactions between proteins and small organic molecules (ligands) are intensively studied because they play key roles in biological processes and drug activities. Here, we present a novel approach to characterise and map the ligand-binding cavities of proteins without direct geometric comparison of structures, based on Principal Component Analysis of cavity properties (related mainly to size, polarity and charge). This approach can provide valuable information on the similarities, and dissimilarities, of binding cavities due to mutations, between-species differences and flexibility upon ligand-binding. The presented results show that information on ligand-binding cavity variations can complement information on protein similarity obtained from sequence comparisons. The predictive aspect of the method is exemplified by successful predictions of serine proteases that were not included in the model construction. The presented strategy to compare ligand-binding cavities of related and unrelated proteins has many potential applications within protein and medicinal chemistry, for example in the characterisation and mapping of “orphan structures”, selection of protein structures for docking studies in structure-based design and identification of proteins for selectivity screens in drug design programs. PMID:20034113
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.
Qin, Nan; Zhang, Shaoqing; Jiang, Jianjuan; Corder, Stephanie Gilbert; Qian, Zhigang; Zhou, Zhitao; Lee, Woonsoo; Liu, Keyin; Wang, Xiaohan; Li, Xinxin; Shi, Zhifeng; Mao, Ying; Bechtel, Hans A.; Martin, Michael C.; Xia, Xiaoxia; Marelli, Benedetto; Kaplan, David L.; Omenetto, Fiorenzo G.; Liu, Mengkun; Tao, Tiger H.
2016-01-01
Silk protein fibres produced by silkworms and spiders are renowned for their unparalleled mechanical strength and extensibility arising from their high-β-sheet crystal contents as natural materials. Investigation of β-sheet-oriented conformational transitions in silk proteins at the nanoscale remains a challenge using conventional imaging techniques given their limitations in chemical sensitivity or limited spatial resolution. Here, we report on electron-regulated nanoscale polymorphic transitions in silk proteins revealed by near-field infrared imaging and nano-spectroscopy at resolutions approaching the molecular level. The ability to locally probe nanoscale protein structural transitions combined with nanometre-precision electron-beam lithography offers us the capability to finely control the structure of silk proteins in two and three dimensions. Our work paves the way for unlocking essential nanoscopic protein structures and critical conditions for electron-induced conformational transitions, offering new rules to design protein-based nanoarchitectures. PMID:27713412
G2S: a web-service for annotating genomic variants on 3D protein structures.
Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong
2018-06-01
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
Compton, L A; Johnson, W C
1986-05-15
Inverse circular dichroism (CD) spectra are presented for each of the five major secondary structures of proteins: alpha-helix, antiparallel and parallel beta-sheet, beta-turn, and other (random) structures. The fraction of the each secondary structure in a protein is predicted by forming the dot product of the corresponding inverse CD spectrum, expressed as a vector, with the CD spectrum of the protein digitized in the same way. We show how this method is based on the construction of the generalized inverse from the singular value decomposition of a set of CD spectra corresponding to proteins whose secondary structures are known from X-ray crystallography. These inverse spectra compute secondary structure directly from protein CD spectra without resorting to least-squares fitting and standard matrix inversion techniques. In addition, spectra corresponding to the individual secondary structures, analogous to the CD spectra of synthetic polypeptides, are generated from the five most significant CD eigenvectors.
Muthu Krishnan, S
2018-05-14
The receptor-associated protein (RAP) is an inhibitor of endocytic receptors that belong to the lipoprotein receptor gene family. In this study, a computational approach was tried to find the evolutionarily related fold of the RAP proteins. Through the structural and sequence-based analysis, found various protein folds that are very close to the RAP folds. Remote homolog datasets were used potentially to develop a different support vector machine (SVM) methods to recognize the homologous RAP fold. This study helps in understanding the relationship of RAP homologs folds based on the structure, function and evolutionary history. Copyright © 2018 Elsevier Ltd. All rights reserved.
Molecular chaperones: functional mechanisms and nanotechnological applications
NASA Astrophysics Data System (ADS)
Rosario Fernández-Fernández, M.; Sot, Begoña; María Valpuesta, José
2016-08-01
Molecular chaperones are a group of proteins that assist in protein homeostasis. They not only prevent protein misfolding and aggregation, but also target misfolded proteins for degradation. Despite differences in structure, all types of chaperones share a common general feature, a surface that recognizes and interacts with the misfolded protein. This and other, more specialized properties can be adapted for various nanotechnological purposes, by modification of the original biomolecules or by de novo design based on artificial structures.
Protein crystallography and infectious diseases.
Verlinde, C. L.; Merritt, E. A.; Van den Akker, F.; Kim, H.; Feil, I.; Delboni, L. F.; Mande, S. C.; Sarfaty, S.; Petra, P. H.; Hol, W. G.
1994-01-01
The current rapid growth in the number of known 3-dimensional protein structures is producing a database of structures that is increasingly useful as a starting point for the development of new medically relevant molecules such as drugs, therapeutic proteins, and vaccines. This development is beautifully illustrated in the recent book, Protein structure: New approaches to disease and therapy (Perutz, 1992). There is a great and growing promise for the design of molecules for the treatment or prevention of a wide variety of diseases, an endeavor made possible by the insights derived from the structure and function of crucial proteins from pathogenic organisms and from man. We present here 2 illustrations of structure-based drug design. The first is the prospect of developing antitrypanosomal drugs based on crystallographic, ligand-binding, and molecular modeling studies of glycolytic glycosomal enzymes from Trypanosomatidae. These unicellular organisms are responsible for several tropical diseases, including African and American trypanosomiases, as well as various forms of leishmaniasis. Because the target enzymes are also present in the human host, this project is a pioneering study in selective design. The second illustrative case is the prospect of designing anti-cholera drugs based on detailed analysis of the structure of cholera toxin and the closely related Escherichia coli heat-labile enterotoxin. Such potential drugs can be targeted either at inhibiting the toxin's receptor binding site or at blocking the toxin's intracellular catalytic activity. Study of the Vibrio cholerae and E. coli toxins serves at the same time as an example of a general approach to structure-based vaccine design. These toxins exhibit a remarkable ability to stimulate the mucosal immune system, and early results have suggested that this property can be maintained by engineered fusion proteins based on the native toxin structure. The challenge is thus to incorporate selected epitopes from foreign pathogens into the native framework of the toxin such that crucial features of both the epitope and the toxin are maintained. That is, the modified toxin must continue to evoke a strong mucosal immune response, and this response must be directed against an epitope conformation characteristic of the original pathogen. PMID:7849584
Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias
2017-11-10
Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Uversky, Vladimir N
2015-03-01
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. As much as structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in the biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.
Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam
2015-06-22
A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.
Wang, Jichao; Zhang, Tongchuan; Liu, Ruicun; Song, Meilin; Wang, Juncheng; Hong, Jiong; Chen, Quan; Liu, Haiyan
2017-02-01
An interesting way of generating novel artificial proteins is to combine sequence motifs from natural proteins, mimicking the evolutionary path suggested by natural proteins comprising recurring motifs. We analyzed the βα and αβ modules of TIM barrel proteins by structure alignment-based sequence clustering. A number of preferred motifs were identified. A chimeric TIM was designed by using recurring elements as mutually compatible interfaces. The foldability of the designed TIM protein was then significantly improved by six rounds of directed evolution. The melting temperature has been improved by more than 20°C. A variety of characteristics suggested that the resulting protein is well-folded. Our analysis provided a library of peptide motifs that is potentially useful for different protein engineering studies. The protein engineering strategy of using recurring motifs as interfaces to connect partial natural proteins may be applied to other protein folds. Copyright © 2016 Elsevier B.V. All rights reserved.
Structural genomics: keeping up with expanding knowledge of the protein universe.
Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek
2007-06-01
Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.
Comparison of intrinsic dynamics of cytochrome p450 proteins using normal mode analysis
Dorner, Mariah E; McMunn, Ryan D; Bartholow, Thomas G; Calhoon, Brecken E; Conlon, Michelle R; Dulli, Jessica M; Fehling, Samuel C; Fisher, Cody R; Hodgson, Shane W; Keenan, Shawn W; Kruger, Alyssa N; Mabin, Justin W; Mazula, Daniel L; Monte, Christopher A; Olthafer, Augustus; Sexton, Ashley E; Soderholm, Beatrice R; Strom, Alexander M; Hati, Sanchita
2015-01-01
Cytochrome P450 enzymes are hemeproteins that catalyze the monooxygenation of a wide-range of structurally diverse substrates of endogenous and exogenous origin. These heme monooxygenases receive electrons from NADH/NADPH via electron transfer proteins. The cytochrome P450 enzymes, which constitute a diverse superfamily of more than 8,700 proteins, share a common tertiary fold but < 25% sequence identity. Based on their electron transfer protein partner, cytochrome P450 proteins are classified into six broad classes. Traditional methods of pro are based on the canonical paradigm that attributes proteins' function to their three-dimensional structure, which is determined by their primary structure that is the amino acid sequence. It is increasingly recognized that protein dynamics play an important role in molecular recognition and catalytic activity. As the mobility of a protein is an intrinsic property that is encrypted in its primary structure, we examined if different classes of cytochrome P450 enzymes display any unique patterns of intrinsic mobility. Normal mode analysis was performed to characterize the intrinsic dynamics of five classes of cytochrome P450 proteins. The present study revealed that cytochrome P450 enzymes share a strong dynamic similarity (root mean squared inner product > 55% and Bhattacharyya coefficient > 80%), despite the low sequence identity (< 25%) and sequence similarity (< 50%) across the cytochrome P450 superfamily. Noticeable differences in Cα atom fluctuations of structural elements responsible for substrate binding were noticed. These differences in residue fluctuations might be crucial for substrate selectivity in these enzymes. PMID:26130403
Self-Assembled Materials Made from Functional Recombinant Proteins.
Jang, Yeongseon; Champion, Julie A
2016-10-18
Proteins are potent molecules that can be used as therapeutics, sensors, and biocatalysts with many advantages over small-molecule counterparts due to the specificity of their activity based on their amino acid sequence and folded three-dimensional structure. However, they also have significant limitations in their stability, localization, and recovery when used in soluble form. These opportunities and challenges have motivated the creation of materials from such functional proteins in order to protect and present them in a way that enhances their function. We have designed functional recombinant fusion proteins capable of self-assembling into materials with unique structures that maintain or improve the functionality of the protein. Fusion of either a functional protein or an assembly domain to a leucine zipper domain makes the materials design strategy modular, based on the high affinity between leucine zippers. The self-assembly domains, including elastin-like polypeptides (ELPs) and defined-sequence random coil polypeptides, can be fused with a leucine zipper motif in order to promote assembly of the fusion proteins into larger structures upon specific stimuli such as temperature and ionic strength. Fusion of other functional domains with the counterpart leucine zipper motif endows the self-assembled materials with protein-specific functions such as fluorescence or catalytic activity. In this Account, we describe several examples of materials assembled from functional fusion proteins as well as the structural characterization, functionality, and understanding of the assembly mechanism. The first example is zipper fusion proteins containing ELPs that assemble into particles when introduced to a model extracellular matrix and subsequently disassemble over time to release the functional protein for drug delivery applications. Under different conditions, the same fusion proteins can self-assemble into hollow vesicles. The vesicles display a functional protein on the surface and can also carry protein, small-molecule, or nanoparticle cargo in the vesicle lumen. To create a material with a more complex hierarchical structure, we combined calcium phosphate with zipper fusion proteins containing random coil polypeptides to produce hybrid protein-inorganic supraparticles with high surface area and porous structure. The use of a functional enzyme created supraparticles with the ability to degrade inflammatory cytokines. Our characterization of these protein materials revealed that the molecular interactions are complex because of the large size of the protein building blocks, their folded structures, and the number of potential interactions including hydrophobic interactions, electrostatic interactions, van der Waals forces, and specific affinity-based interactions. It is difficult or even impossible to predict the structures a priori. However, once the basic assembly principles are understood, there is opportunity to tune the material properties, such as size, through control of the self-assembly conditions. Our future efforts on the fundamental side will focus on identifying the phase space of self-assembly of these fusion proteins and additional experimental levers with which to control and tune the resulting materials. On the application side, we are investigating an array of different functional proteins to expand the use of these structures in both therapeutic protein delivery and biocatalysis.
Cao, Han; Ng, Marcus C K; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W I
2017-09-01
[Formula: see text]-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM . Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers
NASA Astrophysics Data System (ADS)
Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.
2017-09-01
α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
Predicting nucleic acid binding interfaces from structural models of proteins
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2011-01-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767
Prediction of phenotypes of missense mutations in human proteins from biological assemblies.
Wei, Qiong; Xu, Qifang; Dunbrack, Roland L
2013-02-01
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.
Christensen, Signe; Horowitz, Scott; Bardwell, James C.A.; Olsen, Johan G.; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.
2017-01-01
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. PMID:27659562
Johansson, Kristoffer E; Tidemand Johansen, Nicolai; Christensen, Signe; Horowitz, Scott; Bardwell, James C A; Olsen, Johan G; Willemoës, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R
2016-10-23
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations. Copyright © 2016 Elsevier Ltd. All rights reserved.
PoLi: A Virtual Screening Pipeline Based On Template Pocket And Ligand Similarity
Roy, Ambrish; Srinivasan, Bharath; Skolnick, Jeffrey
2015-01-01
Often in pharmaceutical research, the goal is to identify small molecules that can interact with and appropriately modify the biological behavior of a new protein target. Unfortunately, most proteins lack both known structures and small molecule binders, prerequisites of many virtual screening, VS, approaches. For such proteins, ligand homology modeling, LHM, that copies ligands from homologous and perhaps evolutionarily distant template proteins, has been shown to be a powerful VS approach to identify possible binding ligands. However, if we want to target a specific pocket for which there is no homologous holo template protein structure, then LHM will not work. To address this issue, in a new pocket based approach, PoLi, we generalize LHM by exploiting the fact that the number of distinct small molecule ligand binding pockets in proteins is small. PoLi identifies similar ligand binding pockets in a holo-template protein library, selectively copies relevant parts of template ligands and uses them for VS. In practice, PoLi is a hybrid structure and ligand based VS algorithm that integrates 2D fingerprint-based and 3D shape-based similarity metrics for improved virtual screening performance. On standard DUD and DUD-E benchmark databases, using modeled receptor structures, PoLi achieves an average enrichment factor of 13.4 and 9.6 respectively, in the top 1% of the screened library. In contrast, traditional docking based VS using AutoDock Vina and homology-based VS using FINDSITEfilt have an average enrichment of 1.6 (3.0) and 9.0 (7.9) on the DUD (DUD-E) sets respectively. Experimental validation of PoLi predictions on dihydrofolate reductase, DHFR, using differential scanning fluorimetry, DSF, identifies multiple ligands with diverse molecular scaffolds, thus demonstrating the advantage of PoLi over current state-of-the-art VS methods. PMID:26225536
Classification of protein quaternary structure by functional domain composition
Yu, Xiaojing; Wang, Chuan; Li, Yixue
2006-01-01
Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11% Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
Huang, Wei; Ravikumar, Krishnakumar M; Parisien, Marc; Yang, Sichun
2016-12-01
Structural determination of protein-protein complexes such as multidomain nuclear receptors has been challenging for high-resolution structural techniques. Here, we present a combined use of multiple biophysical methods, termed iSPOT, an integration of shape information from small-angle X-ray scattering (SAXS), protection factors probed by hydroxyl radical footprinting, and a large series of computationally docked conformations from rigid-body or molecular dynamics (MD) simulations. Specifically tested on two model systems, the power of iSPOT is demonstrated to accurately predict the structures of a large protein-protein complex (TGFβ-FKBP12) and a multidomain nuclear receptor homodimer (HNF-4α), based on the structures of individual components of the complexes. Although neither SAXS nor footprinting alone can yield an unambiguous picture for each complex, the combination of both, seamlessly integrated in iSPOT, narrows down the best-fit structures that are about 3.2Å and 4.2Å in RMSD from their corresponding crystal structures, respectively. Furthermore, this proof-of-principle study based on the data synthetically derived from available crystal structures shows that the iSPOT-using either rigid-body or MD-based flexible docking-is capable of overcoming the shortcomings of standalone computational methods, especially for HNF-4α. By taking advantage of the integration of SAXS-based shape information and footprinting-based protection/accessibility as well as computational docking, this iSPOT platform is set to be a powerful approach towards accurate integrated modeling of many challenging multiprotein complexes. Copyright © 2016 Elsevier Inc. All rights reserved.
The protein structure prediction problem could be solved using the current PDB library
Zhang, Yang; Skolnick, Jeffrey
2005-01-01
For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The tasser algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774
Sanchez-Martinez, M; Crehuet, R
2014-12-21
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs). The RDCs of intrinsically disordered proteins (IDPs) provide information on the secondary structure elements present in an ensemble; however even two sets of RDCs are not enough to fully determine the distribution of conformations, and the force field used to generate the structures has a pervasive influence on the refined ensemble. Two physics-based coarse-grained force fields, Profasi and Campari, are able to predict the secondary structure elements present in an IDP, but even after including the RDC data, the re-weighted ensembles differ between both force fields. Thus the spread of IDP ensembles highlights the need for better force fields. We distribute our algorithm in an open-source Python code.
A Maltose-Binding Protein Fusion Construct Yields a Robust Crystallography Platform for MCL1
Clifton, Matthew C.; Dranow, David M.; Leed, Alison; Fulroth, Ben; Fairman, James W.; Abendroth, Jan; Atkins, Kateri A.; Wallace, Ellen; Fan, Dazhong; Xu, Guoping; Ni, Z. J.; Daniels, Doug; Van Drie, John; Wei, Guo; Burgin, Alex B.; Golub, Todd R.; Hubbard, Brian K.; Serrano-Wu, Michael H.
2015-01-01
Crystallization of a maltose-binding protein MCL1 fusion has yielded a robust crystallography platform that generated the first apo MCL1 crystal structure, as well as five ligand-bound structures. The ability to obtain fragment-bound structures advances structure-based drug design efforts that, despite considerable effort, had previously been intractable by crystallography. In the ligand-independent crystal form we identify inhibitor binding modes not observed in earlier crystallographic systems. This MBP-MCL1 construct dramatically improves the structural understanding of well-validated MCL1 ligands, and will likely catalyze the structure-based optimization of high affinity MCL1 inhibitors. PMID:25909780
3DNALandscapes: a database for exploring the conformational features of DNA.
Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K
2010-01-01
3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.
CABS-flex: server for fast simulation of protein structure fluctuations
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-01-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements CABS-model–based protocol for the fast simulations of near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics—a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions. PMID:23658222
CABS-flex: Server for fast simulation of protein structure fluctuations.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-07-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements CABS-model-based protocol for the fast simulations of near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics--a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions.
Usha, S; Selvaraj, S
2014-01-01
The molecular recognition and discrimination of very similar ligand moieties by proteins are important subjects in protein-ligand interaction studies. Specificity in the recognition of molecules is determined by the arrangement of protein and ligand atoms in space. The three pyrimidine bases, viz. cytosine, thymine, and uracil, are structurally similar, but the proteins that bind to them are able to discriminate them and form interactions. Since nonbonded interactions are responsible for molecular recognition processes in biological systems, our work attempts to understand some of the underlying principles of such recognition of pyrimidine molecular structures by proteins. The preferences of the amino acid residues to contact the pyrimidine bases in terms of nonbonded interactions; amino acid residue-ligand atom preferences; main chain and side chain atom contributions of amino acid residues; and solvent-accessible surface area of ligand atoms when forming complexes are analyzed. Our analysis shows that the amino acid residues, tyrosine and phenyl alanine, are highly involved in the pyrimidine interactions. Arginine prefers contacts with the cytosine base. The similarities and differences that exist between the interactions of the amino acid residues with each of the three pyrimidine base atoms in our analysis provide insights that can be exploited in designing specific inhibitors competitive to the ligands.
Zheng, Wenjun
2010-01-01
Abstract Protein conformational dynamics, despite its significant anharmonicity, has been widely explored by normal mode analysis (NMA) based on atomic or coarse-grained potential functions. To account for the anharmonic aspects of protein dynamics, this study proposes, and has performed, an anharmonic NMA (ANMA) based on the Cα-only elastic network models, which assume elastic interactions between pairs of residues whose Cα atoms or heavy atoms are within a cutoff distance. The key step of ANMA is to sample an anharmonic potential function along the directions of eigenvectors of the lowest normal modes to determine the mean-squared fluctuations along these directions. ANMA was evaluated based on the modeling of anisotropic displacement parameters (ADPs) from a list of 83 high-resolution protein crystal structures. Significant improvement was found in the modeling of ADPs by ANMA compared with standard NMA. Further improvement in the modeling of ADPs is attained if the interactions between a protein and its crystalline environment are taken into account. In addition, this study has determined the optimal cutoff distances for ADP modeling based on elastic network models, and these agree well with the peaks of the statistical distributions of distances between Cα atoms or heavy atoms derived from a large set of protein crystal structures. PMID:20550915
Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins
de Beauchene, Isaure Chauvot; de Vries, Sjoerd J.; Zacharias, Martin
2016-01-01
Abstract Protein-RNA complexes are important for many biological processes. However, structural modeling of such complexes is hampered by the high flexibility of RNA. Particularly challenging is the docking of single-stranded RNA (ssRNA). We have developed a fragment-based approach to model the structure of ssRNA bound to a protein, based on only the protein structure, the RNA sequence and conserved contacts. The conformational diversity of each RNA fragment is sampled by an exhaustive library of trinucleotides extracted from all known experimental protein–RNA complexes. The method was applied to ssRNA with up to 12 nucleotides which bind to dimers of the RNA recognition motifs (RRMs), a highly abundant eukaryotic RNA-binding domain. The fragment based docking allows a precise de novo atomic modeling of protein-bound ssRNA chains. On a benchmark of seven experimental ssRNA–RRM complexes, near-native models (with a mean heavy-atom deviation of <3 Å from experiment) were generated for six out of seven bound RNA chains, and even more precise models (deviation < 2 Å) were obtained for five out of seven cases, a significant improvement compared to the state of the art. The method is not restricted to RRMs but was also successfully applied to Pumilio RNA binding proteins. PMID:27131381
An introduction to NMR-based approaches for measuring protein dynamics
Kleckner, Ian R; Foster, Mark P
2010-01-01
Proteins are inherently flexible at ambient temperature. At equilibrium, they are characterized by a set of conformations that undergo continuous exchange within a hierarchy of spatial and temporal scales ranging from nanometers to micrometers and femtoseconds to hours. Dynamic properties of proteins are essential for describing the structural bases of their biological functions including catalysis, binding, regulation and cellular structure. Nuclear magnetic resonance (NMR) spectroscopy represents a powerful technique for measuring these essential features of proteins. Here we provide an introduction to NMR-based approaches for studying protein dynamics, highlighting eight distinct methods with recent examples, contextualized within a common experimental and analytical framework. The selected methods are (1) Real-time NMR, (2) Exchange spectroscopy, (3) Lineshape analysis, (4) CPMG relaxation dispersion, (5) Rotating frame relaxation dispersion, (6) Nuclear spin relaxation, (7) Residual dipolar coupling, (8) Paramagnetic relaxation enhancement. PMID:21059410
Kryshtafovych, Andriy; Moult, John; Bales, Patrick; Bazan, J. Fernando; Biasini, Marco; Burgin, Alex; Chen, Chen; Cochran, Frank V.; Craig, Timothy K.; Das, Rhiju; Fass, Deborah; Garcia-Doval, Carmela; Herzberg, Osnat; Lorimer, Donald; Luecke, Hartmut; Ma, Xiaolei; Nelson, Daniel C.; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca; Seguritan, Victor; Zeth, Kornelius; Schwede, Torsten
2014-01-01
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, over 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this paper, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict trans-membrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin IL-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fibre protein gp17 from bacteriophage T7; the Bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins. PMID:24318984
Mass spectrometry for the biophysical characterization of therapeutic monoclonal antibodies.
Zhang, Hao; Cui, Weidong; Gross, Michael L
2014-01-21
Monoclonal antibodies (mAbs) are powerful therapeutics, and their characterization has drawn considerable attention and urgency. Unlike small-molecule drugs (150-600 Da) that have rigid structures, mAbs (∼150 kDa) are engineered proteins that undergo complicated folding and can exist in a number of low-energy structures, posing a challenge for traditional methods in structural biology. Mass spectrometry (MS)-based biophysical characterization approaches can provide structural information, bringing high sensitivity, fast turnaround, and small sample consumption. This review outlines various MS-based strategies for protein biophysical characterization and then reviews how these strategies provide structural information of mAbs at the protein level (intact or top-down approaches), peptide, and residue level (bottom-up approaches), affording information on higher order structure, aggregation, and the nature of antibody complexes. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Gold, Matthew G.; Fowler, Douglas M.; Means, Christopher K.; Pawson, Catherine T.; Stephany, Jason J.; Langeberg, Lorene K.; Fields, Stanley; Scott, John D.
2013-01-01
PKA is retained within distinct subcellular environments by the association of its regulatory type II (RII) subunits with A-kinase anchoring proteins (AKAPs). Conventional reagents that universally disrupt PKA anchoring are patterned after a conserved AKAP motif. We introduce a phage selection procedure that exploits high-resolution structural information to engineer RII mutants that are selective for a particular AKAP. Selective RII (RSelect) sequences were obtained for eight AKAPs following competitive selection screening. Biochemical and cell-based experiments validated the efficacy of RSelect proteins for AKAP2 and AKAP18. These engineered proteins represent a new class of reagents that can be used to dissect the contributions of different AKAP-targeted pools of PKA. Molecular modeling and high-throughput sequencing analyses revealed the molecular basis of AKAP-selective interactions and shed new light on native RII-AKAP interactions. We propose that this structure-directed evolution strategy might be generally applicable for the investigation of other protein interaction surfaces. PMID:23625929
Schmidt, Thomas H; Kandt, Christian
2012-10-22
At the beginning of each molecular dynamics membrane simulation stands the generation of a suitable starting structure which includes the working steps of aligning membrane and protein and seamlessly accommodating the protein in the membrane. Here we introduce two efficient and complementary methods based on pre-equilibrated membrane patches, automating these steps. Using a voxel-based cast of the coarse-grained protein, LAMBADA computes a hydrophilicity profile-derived scoring function based on which the optimal rotation and translation operations are determined to align protein and membrane. Employing an entirely geometrical approach, LAMBADA is independent from any precalculated data and aligns even large membrane proteins within minutes on a regular workstation. LAMBADA is the first tool performing the entire alignment process automatically while providing the user with the explicit 3D coordinates of the aligned protein and membrane. The second tool is an extension of the InflateGRO method addressing the shortcomings of its predecessor in a fully automated workflow. Determining the exact number of overlapping lipids based on the area occupied by the protein and restricting expansion, compression and energy minimization steps to a subset of relevant lipids through automatically calculated and system-optimized operation parameters, InflateGRO2 yields optimal lipid packing and reduces lipid vacuum exposure to a minimum preserving as much of the equilibrated membrane structure as possible. Applicable to atomistic and coarse grain structures in MARTINI format, InflateGRO2 offers high accuracy, fast performance, and increased application flexibility permitting the easy preparation of systems exhibiting heterogeneous lipid composition as well as embedding proteins into multiple membranes. Both tools can be used separately, in combination with other methods, or in tandem permitting a fully automated workflow while retaining a maximum level of usage control and flexibility. To assess the performance of both methods, we carried out test runs using 22 membrane proteins of different size and transmembrane structure.
Ligand-biased ensemble receptor docking (LigBEnD): a hybrid ligand/receptor structure-based approach
NASA Astrophysics Data System (ADS)
Lam, Polo C.-H.; Abagyan, Ruben; Totrov, Maxim
2018-01-01
Ligand docking to flexible protein molecules can be efficiently carried out through ensemble docking to multiple protein conformations, either from experimental X-ray structures or from in silico simulations. The success of ensemble docking often requires the careful selection of complementary protein conformations, through docking and scoring of known co-crystallized ligands. False positives, in which a ligand in a wrong pose achieves a better docking score than that of native pose, arise as additional protein conformations are added. In the current study, we developed a new ligand-biased ensemble receptor docking method and composite scoring function which combine the use of ligand-based atomic property field (APF) method with receptor structure-based docking. This method helps us to correctly dock 30 out of 36 ligands presented by the D3R docking challenge. For the six mis-docked ligands, the cognate receptor structures prove to be too different from the 40 available experimental Pocketome conformations used for docking and could be identified only by receptor sampling beyond experimentally explored conformational subspace.
Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins*
Abrusán, György; Zhang, Yang; Szilágyi, András
2013-01-01
Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
Automated prediction of protein function and detection of functional sites from structure.
Pazos, Florencio; Sternberg, Michael J E
2004-10-12
Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.
Binding ligand prediction for proteins using partial matching of local surface patches.
Sael, Lee; Kihara, Daisuke
2010-01-01
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.
Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches
Sael, Lee; Kihara, Daisuke
2010-01-01
Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188
A General, Adaptive, Roadmap-Based Algorithm for Protein Motion Computation.
Molloy, Kevin; Shehu, Amarda
2016-03-01
Precious information on protein function can be extracted from a detailed characterization of protein equilibrium dynamics. This remains elusive in wet and dry laboratories, as function-modulating transitions of a protein between functionally-relevant, thermodynamically-stable and meta-stable structural states often span disparate time scales. In this paper we propose a novel, robotics-inspired algorithm that circumvents time-scale challenges by drawing analogies between protein motion and robot motion. The algorithm adapts the popular roadmap-based framework in robot motion computation to handle the more complex protein conformation space and its underlying rugged energy surface. Given known structures representing stable and meta-stable states of a protein, the algorithm yields a time- and energy-prioritized list of transition paths between the structures, with each path represented as a series of conformations. The algorithm balances computational resources between a global search aimed at obtaining a global view of the network of protein conformations and their connectivity and a detailed local search focused on realizing such connections with physically-realistic models. Promising results are presented on a variety of proteins that demonstrate the general utility of the algorithm and its capability to improve the state of the art without employing system-specific insight.
ELM: the status of the 2010 eukaryotic linear motif resource
Gould, Cathryn M.; Diella, Francesca; Via, Allegra; Puntervoll, Pål; Gemünd, Christine; Chabanis-Davidson, Sophie; Michael, Sushama; Sayadi, Ahmed; Bryne, Jan Christian; Chica, Claudia; Seiler, Markus; Davey, Norman E.; Haslam, Niall; Weatheritt, Robert J.; Budd, Aidan; Hughes, Tim; Paś, Jakub; Rychlewski, Leszek; Travé, Gilles; Aasland, Rein; Helmer-Citterich, Manuela; Linding, Rune; Gibson, Toby J.
2010-01-01
Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a ‘Bar Code’ format, which also displays known instances from homologous proteins through a novel ‘Instance Mapper’ protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation. PMID:19920119
Distance matrix-based approach to protein structure prediction.
Kloczkowski, Andrzej; Jernigan, Robert L; Wu, Zhijun; Song, Guang; Yang, Lei; Kolinski, Andrzej; Pokarowski, Piotr
2009-03-01
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix D = [r(ij)(2)] containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix C, that has elements of either 0 or 1 depending on whether the distance r (ij) is greater or less than a cutoff value r (cutoff). We have performed spectral decomposition of the distance matrices D = sigma lambda(k)V(k)V(kT), in terms of eigenvalues lambda kappa and the corresponding eigenvectors v kappa and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to r (2)--the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting r (2) from the sequence we can approximate a distance matrix of a protein with an expected RMSD value of about 7.3 A, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 A. We can also explain the role of hydrophobic interactions for the protein structure, because r is highly correlated with the hydrophobic profile of the sequence. Moreover, r is highly correlated with several sequence profiles which are useful in protein structure prediction, such as contact number, the residue-wise contact order (RWCO) or mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to spatial directionality of the secondary structure elements, and they may be also predicted from the sequence, improving overall structure prediction. We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. This approach has been successfully used by us in 2006 in the CASPR structure refinement (http://predictioncenter.org/caspR).
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
Wang, Liwen; Qin, Yali; Ilchenko, Serguei; Bohon, Jen; Shi, Wuxian; Cho, Michael W.; Takamoto, Keiji; Chance, Mark R.
2010-01-01
Structural characterization of the HIV envelope protein gp120 is very important to provide an understanding of the protein's immunogenicity and it's binding to cell receptors. So far, crystallographic structure determination of gp120 with an intact V3 loop (in the absence of CD4 co-receptor or antibody) has not been achieved. The third variable region (V3) of the gp120 is immunodominant and contains glycosylation signatures that are essential for co-receptor binding and viral entry to T-cells. In this study, we characterized the structure of the outer domain of gp120 with an intact V3 loop (gp120-OD8) purified from Drosophila S2 cells utilizing mass spectrometry-based approaches. We mapped the glycosylation sites and calculated glycosylation occupancy of gp120-OD8; eleven sites from fifteen glycosylation motifs were determined as having high mannose or hybrid glycosylation structures. The specific glycan moieties of nine glycosylation sites from eight unique glycopeptides were determined by a combination of ECD and CID MS approaches. Hydroxyl radical-mediated protein footprinting coupled with mass spectrometry analysis was employed to provide detailed information on protein structure of gp120-OD8 by directly identifying accessible and hydroxyl radical-reactive side chain residues. Comparison of gp120-OD8 experimental footprinting data with a homology model derived from the ligated CD4/ gp120-OD8 crystal structure revealed a flexible V3 loop structure where the V3 tip may provide contacts with the rest of the protein while residues in the V3 base remain solvent accessible. In addition, the data illustrate interactions between specific sugar moieties and amino acid side chains potentially important to the gp120-OD8 structure. PMID:20825246
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310
Phylogeny-Based Systematization of Arabidopsis Proteins with Histone H1 Globular Domain1[OPEN
Knizewski, Lukasz; Schmidt, Anja; Ginalski, Krzysztof
2017-01-01
H1 (or linker) histones are basic nuclear proteins that possess an evolutionarily conserved nucleosome-binding globular domain, GH1. They perform critical functions in determining the accessibility of chromatin DNA to trans-acting factors. In most metazoan species studied so far, linker histones are highly heterogenous, with numerous nonallelic variants cooccurring in the same cells. The phylogenetic relationships among these variants as well as their structural and functional properties have been relatively well established. This contrasts markedly with the rather limited knowledge concerning the phylogeny and structural and functional roles of an unusually diverse group of GH1-containing proteins in plants. The dearth of information and the lack of a coherent phylogeny-based nomenclature of these proteins can lead to misunderstandings regarding their identity and possible relationships, thereby hampering plant chromatin research. Based on published data and our in silico and high-throughput analyses, we propose a systematization and coherent nomenclature of GH1-containing proteins of Arabidopsis (Arabidopsis thaliana [L.] Heynh) that will be useful for both the identification and structural and functional characterization of homologous proteins from other plant species. PMID:28298478
Phylogeny-Based Systematization of Arabidopsis Proteins with Histone H1 Globular Domain.
Kotliński, Maciej; Knizewski, Lukasz; Muszewska, Anna; Rutowicz, Kinga; Lirski, Maciej; Schmidt, Anja; Baroux, Célia; Ginalski, Krzysztof; Jerzmanowski, Andrzej
2017-05-01
H1 (or linker) histones are basic nuclear proteins that possess an evolutionarily conserved nucleosome-binding globular domain, GH1. They perform critical functions in determining the accessibility of chromatin DNA to trans-acting factors. In most metazoan species studied so far, linker histones are highly heterogenous, with numerous nonallelic variants cooccurring in the same cells. The phylogenetic relationships among these variants as well as their structural and functional properties have been relatively well established. This contrasts markedly with the rather limited knowledge concerning the phylogeny and structural and functional roles of an unusually diverse group of GH1-containing proteins in plants. The dearth of information and the lack of a coherent phylogeny-based nomenclature of these proteins can lead to misunderstandings regarding their identity and possible relationships, thereby hampering plant chromatin research. Based on published data and our in silico and high-throughput analyses, we propose a systematization and coherent nomenclature of GH1-containing proteins of Arabidopsis ( Arabidopsis thaliana [L.] Heynh) that will be useful for both the identification and structural and functional characterization of homologous proteins from other plant species. © 2017 American Society of Plant Biologists. All Rights Reserved.
Self-assembled bionanostructures: proteins following the lead of DNA nanostructures
2014-01-01
Natural polymers are able to self-assemble into versatile nanostructures based on the information encoded into their primary structure. The structural richness of biopolymer-based nanostructures depends on the information content of building blocks and the available biological machinery to assemble and decode polymers with a defined sequence. Natural polypeptides comprise 20 amino acids with very different properties in comparison to only 4 structurally similar nucleotides, building elements of nucleic acids. Nevertheless the ease of synthesizing polynucleotides with selected sequence and the ability to encode the nanostructural assembly based on the two specific nucleotide pairs underlay the development of techniques to self-assemble almost any selected three-dimensional nanostructure from polynucleotides. Despite more complex design rules, peptides were successfully used to assemble symmetric nanostructures, such as fibrils and spheres. While earlier designed protein-based nanostructures used linked natural oligomerizing domains, recent design of new oligomerizing interaction surfaces and introduction of the platform for topologically designed protein fold may enable polypeptide-based design to follow the track of DNA nanostructures. The advantages of protein-based nanostructures, such as the functional versatility and cost effective and sustainable production methods provide strong incentive for further development in this direction. PMID:24491139
Chen, Fu; Sun, Huiyong; Wang, Junmei; Zhu, Feng; Liu, Hui; Wang, Zhe; Lei, Tailong; Li, Youyong; Hou, Tingjun
2018-06-21
Molecular docking provides a computationally efficient way to predict the atomic structural details of protein-RNA interactions (PRI), but accurate prediction of the three-dimensional structures and binding affinities for PRI is still notoriously difficult, partly due to the unreliability of the existing scoring functions for PRI. MM/PBSA and MM/GBSA are more theoretically rigorous than most scoring functions for protein-RNA docking, but their prediction performance for protein-RNA systems remains unclear. Here, we systemically evaluated the capability of MM/PBSA and MM/GBSA to predict the binding affinities and recognize the near-native binding structures for protein-RNA systems with different solvent models and interior dielectric constants (ϵ in ). For predicting the binding affinities, the predictions given by MM/GBSA based on the minimized structures in explicit solvent and the GBGBn1 model with ϵ in = 2 yielded the highest correlation with the experimental data. Moreover, the MM/GBSA calculations based on the minimized structures in implicit solvent and the GBGBn1 model distinguished the near-native binding structures within the top 10 decoys for 118 out of the 149 protein-RNA systems (79.2%). This performance is better than all docking scoring functions studied here. Therefore, the MM/GBSA rescoring is an efficient way to improve the prediction capability of scoring functions for protein-RNA systems. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Effect of fullerenol surface chemistry on nanoparticle binding-induced protein misfolding
NASA Astrophysics Data System (ADS)
Radic, Slaven; Nedumpully-Govindan, Praveen; Chen, Ran; Salonen, Emppu; Brown, Jared M.; Ke, Pu Chun; Ding, Feng
2014-06-01
Fullerene and its derivatives with different surface chemistry have great potential in biomedical applications. Accordingly, it is important to delineate the impact of these carbon-based nanoparticles on protein structure, dynamics, and subsequently function. Here, we focused on the effect of hydroxylation -- a common strategy for solubilizing and functionalizing fullerene -- on protein-nanoparticle interactions using a model protein, ubiquitin. We applied a set of complementary computational modeling methods, including docking and molecular dynamics simulations with both explicit and implicit solvent, to illustrate the impact of hydroxylated fullerenes on the structure and dynamics of ubiquitin. We found that all derivatives bound to the model protein. Specifically, the more hydrophilic nanoparticles with a higher number of hydroxyl groups bound to the surface of the protein via hydrogen bonds, which stabilized the protein without inducing large conformational changes in the protein structure. In contrast, fullerene derivatives with a smaller number of hydroxyl groups buried their hydrophobic surface inside the protein, thereby causing protein denaturation. Overall, our results revealed a distinct role of surface chemistry on nanoparticle-protein binding and binding-induced protein misfolding.Fullerene and its derivatives with different surface chemistry have great potential in biomedical applications. Accordingly, it is important to delineate the impact of these carbon-based nanoparticles on protein structure, dynamics, and subsequently function. Here, we focused on the effect of hydroxylation -- a common strategy for solubilizing and functionalizing fullerene -- on protein-nanoparticle interactions using a model protein, ubiquitin. We applied a set of complementary computational modeling methods, including docking and molecular dynamics simulations with both explicit and implicit solvent, to illustrate the impact of hydroxylated fullerenes on the structure and dynamics of ubiquitin. We found that all derivatives bound to the model protein. Specifically, the more hydrophilic nanoparticles with a higher number of hydroxyl groups bound to the surface of the protein via hydrogen bonds, which stabilized the protein without inducing large conformational changes in the protein structure. In contrast, fullerene derivatives with a smaller number of hydroxyl groups buried their hydrophobic surface inside the protein, thereby causing protein denaturation. Overall, our results revealed a distinct role of surface chemistry on nanoparticle-protein binding and binding-induced protein misfolding. Electronic supplementary information (ESI) is available: Fluorescence spectra, ITC, CD spectra and other data as described in the text. See DOI: 10.1039/c4nr01544d
Protein Crystallography in Vaccine Research and Development.
Malito, Enrico; Carfi, Andrea; Bottomley, Matthew J
2015-06-09
The use of protein X-ray crystallography for structure-based design of small-molecule drugs is well-documented and includes several notable success stories. However, it is less well-known that structural biology has emerged as a major tool for the design of novel vaccine antigens. Here, we review the important contributions that protein crystallography has made so far to vaccine research and development. We discuss several examples of the crystallographic characterization of vaccine antigen structures, alone or in complexes with ligands or receptors. We cover the critical role of high-resolution epitope mapping by reviewing structures of complexes between antigens and their cognate neutralizing, or protective, antibody fragments. Most importantly, we provide recent examples where structural insights obtained via protein crystallography have been used to design novel optimized vaccine antigens. This review aims to illustrate the value of protein crystallography in the emerging discipline of structural vaccinology and its impact on the rational design of vaccines.
Protein Crystallography in Vaccine Research and Development
Malito, Enrico; Carfi, Andrea; Bottomley, Matthew J.
2015-01-01
The use of protein X-ray crystallography for structure-based design of small-molecule drugs is well-documented and includes several notable success stories. However, it is less well-known that structural biology has emerged as a major tool for the design of novel vaccine antigens. Here, we review the important contributions that protein crystallography has made so far to vaccine research and development. We discuss several examples of the crystallographic characterization of vaccine antigen structures, alone or in complexes with ligands or receptors. We cover the critical role of high-resolution epitope mapping by reviewing structures of complexes between antigens and their cognate neutralizing, or protective, antibody fragments. Most importantly, we provide recent examples where structural insights obtained via protein crystallography have been used to design novel optimized vaccine antigens. This review aims to illustrate the value of protein crystallography in the emerging discipline of structural vaccinology and its impact on the rational design of vaccines. PMID:26068237
Bayesian comparison of protein structures using partial Procrustes distance.
Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi
2017-09-26
An important topic in bioinformatics is the protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated to the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
B Wallace; R Janes
CD (circular dichroism) spectroscopy is a well-established technique in structural biology. SRCD (synchrotron radiation circular dichroism) spectroscopy extends the utility and applications of conventional CD spectroscopy (using laboratory-based instruments) because the high flux of a synchrotron enables collection of data at lower wavelengths (resulting in higher information content), detection of spectra with higher signal-to-noise levels and measurements in the presence of absorbing components (buffers, salts, lipids and detergents). SRCD spectroscopy can provide important static and dynamic structural information on proteins in solution, including secondary structures of intact proteins and their domains, protein stability, the differences between wild-type and mutant proteins,more » the identification of natively disordered regions in proteins, and the dynamic processes of protein folding and membrane insertion and the kinetics of enzyme reactions. It has also been used to effectively study protein interactions, including protein-protein complex formation involving either induced-fit or rigid-body mechanisms, and protein-lipid complexes. A new web-based bioinformatics resource, the Protein Circular Dichroism Data Bank (PCDDB), has been created which enables archiving, access and analyses of CD and SRCD spectra and supporting metadata, now making this information publicly available. To summarize, the developing method of SRCD spectroscopy has the potential for playing an important role in new types of studies of protein conformations and their complexes.« less
Integrating structure-based and ligand-based approaches for computational drug design.
Wilson, Gregory L; Lill, Markus A
2011-04-01
Methods utilized in computer-aided drug design can be classified into two major categories: structure based and ligand based, using information on the structure of the protein or on the biological and physicochemical properties of bound ligands, respectively. In recent years there has been a trend towards integrating these two methods in order to enhance the reliability and efficiency of computer-aided drug-design approaches by combining information from both the ligand and the protein. This trend resulted in a variety of methods that include: pseudoreceptor methods, pharmacophore methods, fingerprint methods and approaches integrating docking with similarity-based methods. In this article, we will describe the concepts behind each method and selected applications.
eSBMTools 1.0: enhanced native structure-based modeling tools.
Lutz, Benjamin; Sinner, Claude; Heuermann, Geertje; Verma, Abhinav; Schug, Alexander
2013-11-01
Molecular dynamics simulations provide detailed insights into the structure and function of biomolecular systems. Thus, they complement experimental measurements by giving access to experimentally inaccessible regimes. Among the different molecular dynamics techniques, native structure-based models (SBMs) are based on energy landscape theory and the principle of minimal frustration. Typically used in protein and RNA folding simulations, they coarse-grain the biomolecular system and/or simplify the Hamiltonian resulting in modest computational requirements while achieving high agreement with experimental data. eSBMTools streamlines running and evaluating SBM in a comprehensive package and offers high flexibility in adding experimental- or bioinformatics-derived restraints. We present a software package that allows setting up, modifying and evaluating SBM for both RNA and proteins. The implemented workflows include predicting protein complexes based on bioinformatics-derived inter-protein contact information, a standardized setup of protein folding simulations based on the common PDB format, calculating reaction coordinates and evaluating the simulation by free-energy calculations with weighted histogram analysis method or by phi-values. The modules interface with the molecular dynamics simulation program GROMACS. The package is open source and written in architecture-independent Python2. http://sourceforge.net/projects/esbmtools/. alexander.schug@kit.edu. Supplementary data are available at Bioinformatics online.
Lubin, Johnathan W; Rao, Timsi; Mandell, Edward K; Wuttke, Deborah S; Lundblad, Victoria
2013-03-01
Mutations that confer the loss of a single biochemical property (separation-of-function mutations) can often uncover a previously unknown role for a protein in a particular biological process. However, most mutations are identified based on loss-of-function phenotypes, which cannot differentiate between separation-of-function alleles vs. mutations that encode unstable/unfolded proteins. An alternative approach is to use overexpression dominant-negative (ODN) phenotypes to identify mutant proteins that disrupt function in an otherwise wild-type strain when overexpressed. This is based on the assumption that such mutant proteins retain an overall structure that is comparable to that of the wild-type protein and are able to compete with the endogenous protein (Herskowitz 1987). To test this, the in vivo phenotypes of mutations in the Est3 telomerase subunit from Saccharomyces cerevisiae were compared with the in vitro secondary structure of these mutant proteins as analyzed by circular-dichroism spectroscopy, which demonstrates that ODN is a more sensitive assessment of protein stability than the commonly used method of monitoring protein levels from extracts. Reverse mutagenesis of EST3, which targeted different categories of amino acids, also showed that mutating highly conserved charged residues to the oppositely charged amino acid had an increased likelihood of generating a severely defective est3(-) mutation, which nevertheless encoded a structurally stable protein. These results suggest that charge-swap mutagenesis directed at a limited subset of highly conserved charged residues, combined with ODN screening to eliminate partially unfolded proteins, may provide a widely applicable and efficient strategy for generating separation-of-function mutations.
FPGA accelerator for protein secondary structure prediction based on the GOR algorithm
2011-01-01
Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs. PMID:21342582
Rajasekaran, Rajalakshmi; Chen, Yi-Ping Phoebe
2012-09-01
Leishmaniasis, a multi-faceted ethereal disease is considered to be one of the World's major communicable diseases that demands exhaustive research and control measures. The substantial data on these protozoan parasites has not been utilized completely to develop potential therapeutic strategies against Leishmaniasis. Dihydrofolate reductase thymidylate synthase (DHFR-TS) plays a major role in the infective state of the parasite and hence the DHFR-TS based drugs remains of much interest to researchers working on Leishmaniasis. Although, crystal structures of DHFR-TS from different species including Plasmodium falciparum and Trypanosoma cruzi are available, the experimentally determined structure of the Leishmania major DHFR-TS has not yet been reported in the Protein Data Bank. A high quality three dimensional structure of L.major DHFR-TS has been modeled through the homology modeling approach. Carefully refined and the energy minimized structure of the modeled protein was validated using a number of structure validation programs to confirm its structure quality. The modeled protein structure was used in the process of structure based virtual screening to figure out a potential lead structure against DHFR TS. The lead molecule identified has a binding affinity of 0.51 nM and clearly follows drug like properties.
Structural basis for specific recognition of multiple mRNA targets by a PUF regulatory protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yeming; Opperman, Laura; Wickens, Marvin
2011-11-02
Caenorhabditis elegans fem-3 binding factor (FBF) is a founding member of the PUMILIO/FBF (PUF) family of mRNA regulatory proteins. It regulates multiple mRNAs critical for stem cell maintenance and germline development. Here, we report crystal structures of FBF in complex with 6 different 9-nt RNA sequences, including elements from 4 natural mRNAs. These structures reveal that FBF binds to conserved bases at positions 1-3 and 7-8. The key specificity determinant of FBF vs. other PUF proteins lies in positions 4-6. In FBF/RNA complexes, these bases stack directly with one another and turn away from the RNA-binding surface. A short regionmore » of FBF is sufficient to impart its unique specificity and lies directly opposite the flipped bases. We suggest that this region imposes a flattened curvature on the protein; hence, the requirement for the additional nucleotide. The principles of FBF/RNA recognition suggest a general mechanism by which PUF proteins recognize distinct families of RNAs yet exploit very nearly identical atomic contacts in doing so.« less
Structural basis for specific recognition of multiple mRNA targets by a PUF regulatory protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yeming; Opperman, Laura; Wickens, Marvin
2010-08-19
Caenorhabditis elegans fem-3 binding factor (FBF) is a founding member of the PUMILIO/FBF (PUF) family of mRNA regulatory proteins. It regulates multiple mRNAs critical for stem cell maintenance and germline development. Here, we report crystal structures of FBF in complex with 6 different 9-nt RNA sequences, including elements from 4 natural mRNAs. These structures reveal that FBF binds to conserved bases at positions 1-3 and 7-8. The key specificity determinant of FBF vs. other PUF proteins lies in positions 4-6. In FBF/RNA complexes, these bases stack directly with one another and turn away from the RNA-binding surface. A short regionmore » of FBF is sufficient to impart its unique specificity and lies directly opposite the flipped bases. We suggest that this region imposes a flattened curvature on the protein; hence, the requirement for the additional nucleotide. The principles of FBF/RNA recognition suggest a general mechanism by which PUF proteins recognize distinct families of RNAs yet exploit very nearly identical atomic contacts in doing so.« less
Fractal symmetry of protein interior: what have we learned?
Banerji, Anirban; Ghosh, Indira
2011-08-01
The application of fractal dimension-based constructs to probe the protein interior dates back to the development of the concept of fractal dimension itself. Numerous approaches have been tried and tested over a course of (almost) 30 years with the aim of elucidating the various facets of symmetry of self-similarity prevalent in the protein interior. In the last 5 years especially, there has been a startling upsurge of research that innovatively stretches the limits of fractal-based studies to present an array of unexpected results on the biophysical properties of protein interior. In this article, we introduce readers to the fundamentals of fractals, reviewing the commonality (and the lack of it) between these approaches before exploring the patterns in the results that they produced. Clustering the approaches in major schools of protein self-similarity studies, we describe the evolution of fractal dimension-based methodologies. The genealogy of approaches (and results) presented here portrays a clear picture of the contemporary state of fractal-based studies in the context of the protein interior. To underline the utility of fractal dimension-based measures further, we have performed a correlation dimension analysis on all of the available non-redundant protein structures, both at the level of an individual protein and at the level of structural domains. In this investigation, we were able to separately quantify the self-similar symmetries in spatial correlation patterns amongst peptide-dipole units, charged amino acids, residues with the π-electron cloud and hydrophobic amino acids. The results revealed that electrostatic environments in the interiors of proteins belonging to 'α/α toroid' (all-α class) and 'PLP-dependent transferase-like' domains (α/β class) are highly conducive. In contrast, the interiors of 'zinc finger design' ('designed proteins') and 'knottins' ('small proteins') were identified as folds with the least conducive electrostatic environments. The fold 'conotoxins' (peptides) could be unambiguously identified as one type with the least stability. The same analyses revealed that peptide-dipoles in the α/β class of proteins, in general, are more correlated to each other than are the peptide-dipoles in proteins belonging to the all-α class. Highly favorable electrostatic milieu in the interiors of TIM-barrel, α/β-hydrolase structures could explain their remarkably conserved (evolutionary) stability from a new light. Finally, we point out certain inherent limitations of fractal constructs before attempting to identify the areas and problems where the implementation of fractal dimension-based constructs can be of paramount help to unearth latent information on protein structural properties.
Nealon, John Oliver; Philomina, Limcy Seby
2017-01-01
The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods, this can be somewhat alleviated by use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development. PMID:29206185
Bioinformatics and variability in drug response: a protein structural perspective
Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.
2012-01-01
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919
2016-01-01
Many excellent methods exist that incorporate cryo-electron microscopy (cryoEM) data to constrain computational protein structure prediction and refinement. Previously, it was shown that iteration of two such orthogonal sampling and scoring methods – Rosetta and molecular dynamics (MD) simulations – facilitated exploration of conformational space in principle. Here, we go beyond a proof-of-concept study and address significant remaining limitations of the iterative MD–Rosetta protein structure refinement protocol. Specifically, all parts of the iterative refinement protocol are now guided by medium-resolution cryoEM density maps, and previous knowledge about the native structure of the protein is no longer necessary. Models are identified solely based on score or simulation time. All four benchmark proteins showed substantial improvement through three rounds of the iterative refinement protocol. The best-scoring final models of two proteins had sub-Ångstrom RMSD to the native structure over residues in secondary structure elements. Molecular dynamics was most efficient in refining secondary structure elements and was thus highly complementary to the Rosetta refinement which is most powerful in refining side chains and loop regions. PMID:25883538
Structure elucidation of dimeric transmembrane domains of bitopic proteins.
Bocharov, Eduard V; Volynsky, Pavel E; Pavlov, Konstantin V; Efremov, Roman G; Arseniev, Alexander S
2010-01-01
The interaction between transmembrane helices is of great interest because it directly determines biological activity of a membrane protein. Either destroying or enhancing such interactions can result in many diseases related to dysfunction of different tissues in human body. One much studied form of membrane proteins known as bitopic protein is a dimer containing two membrane-spanning helices associating laterally. Establishing structure-function relationship as well as rational design of new types of drugs targeting membrane proteins requires precise structural information about this class of objects. At present time, to investigate spatial structure and internal dynamics of such transmembrane helical dimers, several strategies were developed based mainly on a combination of NMR spectroscopy, optical spectroscopy, protein engineering and molecular modeling. These approaches were successfully applied to homo- and heterodimeric transmembrane fragments of several bitopic proteins, which play important roles in normal and in pathological conditions of human organism.
Tandem Repeat Proteins Inspired By Squid Ring Teeth
NASA Astrophysics Data System (ADS)
Pena-Francesch, Abdon
Proteins are large biomolecules consisting of long chains of amino acids that hierarchically assemble into complex structures, and provide a variety of building blocks for biological materials. The repetition of structural building blocks is a natural evolutionary strategy for increasing the complexity and stability of protein structures. However, the relationship between amino acid sequence, structure, and material properties of protein systems remains unclear due to the lack of control over the protein sequence and the intricacies of the assembly process. In order to investigate the repetition of protein building blocks, a recently discovered protein from squids is examined as an ideal protein system. Squid ring teeth are predatory appendages located inside the suction cups that provide a strong grasp of prey, and are solely composed of a group of proteins with tandem repetition of building blocks. The objective of this thesis is the understanding of sequence, structure and property relationship in repetitive protein materials inspired in squid ring teeth for the first time. Specifically, this work focuses on squid-inspired structural proteins with tandem repeat units in their sequence (i.e., repetition of alternating building blocks) that are physically cross-linked via beta-sheet structures. The research work presented here tests the hypothesis that, in these systems, increasing the number of building blocks in the polypeptide chain decreases the protein network defects and improves the material properties. Hence, the sequence, nanostructure, and properties (thermal, mechanical, and conducting) of tandem repeat squid-inspired protein materials are examined. Spectroscopic structural analysis, advanced materials characterization, and entropic elasticity theory are combined to elucidate the structure and material properties of these repetitive proteins. This approach is applied not only to native squid proteins but also to squid-inspired synthetic polypeptides that allow for a fine control of the sequence and network morphology. The results provided in this work establish a clear dependence between the repetitive building blocks, the network morphology, and the properties of squid-inspired repetitive protein materials. Increasing the number of tandem repeat units in SRT-inspired proteins led to more effective protein networks with superior properties. Through increasing tandem repetition and optimization of network morphology, highly efficient protein materials capable of withstanding deformations up to 400% of their original length, with MPa-GPa modulus, high energy absorption (50 MJ m-3), peak proton conductivity of 3.7 mS cm-1 (at pH 7, highest reported to date for biological materials), and peak thermal conductivity of 1.4 W m-1 K -1 (which exceeds that of most polymer materials) were developed. These findings introduce new design rules in the engineering of proteins based on tandem repetition and morphology control, and provide a novel framework for tailoring and optimizing the properties of protein-based materials.
Fast large-scale clustering of protein structures using Gauss integrals.
Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas
2012-02-15
Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
DNA nanotubes for NMR structure determination of membrane proteins.
Bellot, Gaëtan; McClintock, Mark A; Chou, James J; Shih, William M
2013-04-01
Finding a way to determine the structures of integral membrane proteins using solution nuclear magnetic resonance (NMR) spectroscopy has proved to be challenging. A residual-dipolar-coupling-based refinement approach can be used to resolve the structure of membrane proteins up to 40 kDa in size, but to do this you need a weak-alignment medium that is detergent-resistant and it has thus far been difficult to obtain such a medium suitable for weak alignment of membrane proteins. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400-nm-long six-helix bundles, each self-assembled from a M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, toward collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes using counter ions and small DNA-binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive for membrane protein alignment, including high-yield production, thermal stability, buffer compatibility and structural programmability. Production of sufficient nanotubes for four or five NMR experiments can be completed in 1 week by a single individual.
NASA Astrophysics Data System (ADS)
Ward, Meaghan E.; Brown, Leonid S.; Ladizhansky, Vladimir
2015-04-01
Studies of the structure, dynamics, and function of membrane proteins (MPs) have long been considered one of the main applications of solid-state NMR (SSNMR). Advances in instrumentation, and the plethora of new SSNMR methodologies developed over the past decade have resulted in a number of high-resolution structures and structural models of both bitopic and polytopic α-helical MPs. The necessity to retain lipids in the sample, the high proportion of one type of secondary structure, differential dynamics, and the possibility of local disorder in the loop regions all create challenges for structure determination. In this Perspective article we describe our recent efforts directed at determining the structure and functional dynamics of Anabaena Sensory Rhodopsin, a heptahelical transmembrane (7TM) protein. We review some of the established and emerging methods which can be utilized for SSNMR-based structure determination, with a particular focus on those used for ASR, a bacterial protein which shares its 7TM architecture with G-protein coupled receptors.
Marti, Alessandra; Bock, Jayne E; Pagani, Maria Ambrogina; Ismail, Baraem; Seetharaman, Koushik
2016-03-01
The high protein and fiber content of intermediate wheatgrass (IWG) - together with its interesting agronomic traits and environment-related benefits - make this perennial crop attractive also for human consumption. Structural characteristics of the proteins in IWG/hard wheat flour (HWF) doughs (at IWG:HWF ratios of 0:100, 50:50, 75:25 and 100:0) - including aggregate formation, thiols availability, and secondary structure changes during dough mixing - were investigated. Proteins in IWG-doughs had higher solubility and thiol content - as function of IWG content - suggesting that protein network was mostly based on non-covalent interactions. While 50% IWG-enrichment gave an increase in random structures, enrichment at ⩾75% resulted in a decrease in β-sheets with an increase in random structures, indicating a decrease in structural order. The observed differences in protein molecular configuration and interactions in HWF compared to IWG doughs necessitate further investigation to establish their impact on the quality of IWG-enriched bread. Copyright © 2015 Elsevier Ltd. All rights reserved.
Kurczynska, Monika; Kotulska, Malgorzata
2018-01-01
Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters-the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
Kurczynska, Monika
2018-01-01
Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters–the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models. PMID:29787567
Shamim, Mohammad Tabrez Anwar; Anwaruddin, Mohammad; Nagarajaram, H A
2007-12-15
Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy by any method reported so far in the literature. Combination of secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one and Crammer and Singer method, yield similar predictions. Dataset and stand-alone program are available upon request.
Prediction of Ras-effector interactions using position energy matrices.
Kiel, Christina; Serrano, Luis
2007-09-01
One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structural similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to determine in a fast way large scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
Dias, David M; Ciulli, Alessio
2014-01-01
Nuclear magnetic resonance (NMR) spectroscopy is a pivotal method for structure-based and fragment-based lead discovery because it is one of the most robust techniques to provide information on protein structure, dynamics and interaction at an atomic level in solution. Nowadays, in most ligand screening cascades, NMR-based methods are applied to identify and structurally validate small molecule binding. These can be high-throughput and are often used synergistically with other biophysical assays. Here, we describe current state-of-the-art in the portfolio of available NMR-based experiments that are used to aid early-stage lead discovery. We then focus on multi-protein complexes as targets and how NMR spectroscopy allows studying of interactions within the high molecular weight assemblies that make up a vast fraction of the yet untargeted proteome. Finally, we give our perspective on how currently available methods could build an improved strategy for drug discovery against such challenging targets. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
Combining Functional and Structural Genomics to Sample the Essential Burkholderia Structome
Baugh, Loren; Gallagher, Larry A.; Patrapuvich, Rapatbhorn; Clifton, Matthew C.; Gardberg, Anna S.; Edwards, Thomas E.; Armour, Brianna; Begley, Darren W.; Dieterich, Shellie H.; Dranow, David M.; Abendroth, Jan; Fairman, James W.; Fox, David; Staker, Bart L.; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W.; Stacy, Robin; Myler, Peter J.; Stewart, Lance J.; Manoil, Colin; Van Voorhis, Wesley C.
2013-01-01
Background The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. Methodology/Principal Findings We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an “ortholog rescue” strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. Conclusions/Significance This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases caused by Burkholderia. All expression clones and proteins created in this study are freely available by request. PMID:23382856
Peterson, Lenna X; Shin, Woong-Hee; Kim, Hyungrae; Kihara, Daisuke
2018-03-01
We report our group's performance for protein-protein complex structure prediction and scoring in Round 37 of the Critical Assessment of PRediction of Interactions (CAPRI), an objective assessment of protein-protein complex modeling. We demonstrated noticeable improvement in both prediction and scoring compared to previous rounds of CAPRI, with our human predictor group near the top of the rankings and our server scorer group at the top. This is the first time in CAPRI that a server has been the top scorer group. To predict protein-protein complex structures, we used both multi-chain template-based modeling (TBM) and our protein-protein docking program, LZerD. LZerD represents protein surfaces using 3D Zernike descriptors (3DZD), which are based on a mathematical series expansion of a 3D function. Because 3DZD are a soft representation of the protein surface, LZerD is tolerant to small conformational changes, making it well suited to docking unbound and TBM structures. The key to our improved performance in CAPRI Round 37 was to combine multi-chain TBM and docking. As opposed to our previous strategy of performing docking for all target complexes, we used TBM when multi-chain templates were available and docking otherwise. We also describe the combination of multiple scoring functions used by our server scorer group, which achieved the top rank for the scorer phase. © 2017 Wiley Periodicals, Inc.
Characterizing monoclonal antibody structure by carbodiimide/GEE footprinting
Kaur, Parminder; Tomechko, Sara; Kiselar, Janna; Shi, Wuxian; Deperalta, Galahad; Wecksler, Aaron T; Gokulrangan, Giridharan; Ling, Victor; Chance, Mark R
2014-01-01
Amino acid-specific covalent labeling is well suited to probe protein structure and macromolecular interactions, especially for macromolecules and their complexes that are difficult to examine by alternative means, due to size, complexity, or instability. Here we present a detailed account of carbodiimide-based covalent labeling (with GEE tagging) applied to a glycosylated monoclonal antibody therapeutic, which represents an important class of biologic drugs. Characterization of such proteins and their antigen complexes is essential to development of new biologic-based medicines. In this study, the experiments were optimized to preserve the structural integrity of the protein, and experimental conditions were varied and replicated to establish the reproducibility and precision of the technique. Homology-based models were generated and used to compare the solvent accessibility of the labeled residues, which include D, E, and the C-terminus, against the experimental surface accessibility data in order to understand the accuracy of the approach in providing an unbiased assessment of structure. Data from the protein were also compared to reactivity measures of several model peptides to explain sequence or structure-based variations in reactivity. The results highlight several advantages of this approach. These include: the ease of use at the bench top, the linearity of the dose response plots at high levels of labeling (indicating that the label does not significantly perturb the structure of the protein), the high reproducibility of replicate experiments (<2 % variation in modification extent), the similar reactivity of the 3 target probe residues (as suggested by analysis of model peptides), and the overall positive and significant correlation of reactivity and solvent accessible surface area (the latter values predicted by the homology modeling). Attenuation of reactivity, in otherwise solvent accessible probes, is documented as arising from the effects of positive charge or bond formation between adjacent amine and carboxyl groups, the latter accompanied by observed water loss. The results are also compared with data from hydroxyl radical-mediated oxidative footprinting on the same protein, showing that complementary information is gained from the 2 approaches, although the number of target residues in carbodiimide/GEE labeling is fewer. Overall, this approach is an accurate and precise method for assessing protein structure of biologic drugs. PMID:25484052
Baker, Max O D G; Shanmugam, Nirukshan; Pham, Chi L L; Strange, Merryn; Steain, Megan; Sunde, Margaret
2018-05-05
The Receptor-interacting protein kinase Homotypic Interaction Motif (RHIM) is an amino acid sequence that mediates multiple protein:protein interactions in the mammalian programmed cell death pathway known as necroptosis. At least one key RHIM-based complex has been shown to have a functional amyloid fibril structure, which provides a stable hetero-oligomeric platform for downstream signaling. RHIMs and related motifs are present in immunity-related proteins across nature, from viruses to fungi to metazoans. Necroptosis is a hallmark feature of cellular clearance of infection. For this reason, numerous pathogens, including viruses and bacteria, have developed varied methods to modulate necroptosis, focusing on inhibiting RHIM:RHIM interactions, and thus their downstream cell death effects. This review will discuss current understanding of RHIM:RHIM interactions in normal cellular activation of necroptosis, from a structural and cell biology perspective. It will compare the mechanisms by which pathogens subvert these interactions in order to maintain their replicative and infective cycles and consider the similarities between RHIMs and other functional amyloid-forming proteins associated with cell death and innate immunity. It will discuss the implications of the heteromeric nature and structure of RHIM-based amyloid complexes in the context of other functional amyloids. Copyright © 2018. Published by Elsevier Ltd.
Doiron, Kevin J; Yu, Peiqiang
2017-01-02
Advanced synchrotron radiation-based infrared microspectroscopy is able to reveal feed and food structure feature at cellular and molecular levels and simultaneously provides composition, structure, environment, and chemistry within intact tissue. However, to date, this advanced synchrotron-based technique is still seldom known to food and feed scientists. This article aims to provide detailed background for flaxseed (oil seed) protein research and then review recent progress and development in flaxseed research in ruminant nutrition in the areas of (1) dietary inclusion of flaxseed in rations; (2) heat processing effect; (3) assessing dietary protein; (4) synchrotron-based Fourier transform infrared microspectroscopy as a tool of nutritive evaluation within cellular and subcellular dimensions; (5) recent synchrotron applications in flaxseed research on a molecular basis. The information described in this paper gives better insight in flaxseed research progress and update.
Wang, Bing; Westerhoff, Lance M.; Merz, Kenneth M.
2008-01-01
We have generated docking poses for the FKBP-GPI complex using eight docking programs, and compared their scoring functions with scoring based on NMR chemical shift perturbations (NMRScore). Because the chemical shift perturbation (CSP) is exquisitely sensitive on the orientation of ligand inside the binding pocket, NMRScore offers an accurate and straightforward approach to score different poses. All scoring functions were inspected by their abilities to highly rank the native-like structures and separate them from decoy poses generated for a protein-ligand complex. The overall performance of NMRScore is much better than that of energy-based scoring functions associated with docking programs in both aspects. In summary, we find that the combination of docking programs with NMRScore results in an approach that can robustly determine the binding site structure for a protein-ligand complex, thereby, providing a new tool facilitating the structure-based drug discovery process. PMID:17867664
Xue, Yi; Skrynnikov, Nikolai R
2014-01-01
Currently, the best existing molecular dynamics (MD) force fields cannot accurately reproduce the global free-energy minimum which realizes the experimental protein structure. As a result, long MD trajectories tend to drift away from the starting coordinates (e.g., crystallographic structures). To address this problem, we have devised a new simulation strategy aimed at protein crystals. An MD simulation of protein crystal is essentially an ensemble simulation involving multiple protein molecules in a crystal unit cell (or a block of unit cells). To ensure that average protein coordinates remain correct during the simulation, we introduced crystallography-based restraints into the MD protocol. Because these restraints are aimed at the ensemble-average structure, they have only minimal impact on conformational dynamics of the individual protein molecules. So long as the average structure remains reasonable, the proteins move in a native-like fashion as dictated by the original force field. To validate this approach, we have used the data from solid-state NMR spectroscopy, which is the orthogonal experimental technique uniquely sensitive to protein local dynamics. The new method has been tested on the well-established model protein, ubiquitin. The ensemble-restrained MD simulations produced lower crystallographic R factors than conventional simulations; they also led to more accurate predictions for crystallographic temperature factors, solid-state chemical shifts, and backbone order parameters. The predictions for 15N R1 relaxation rates are at least as accurate as those obtained from conventional simulations. Taken together, these results suggest that the presented trajectories may be among the most realistic protein MD simulations ever reported. In this context, the ensemble restraints based on high-resolution crystallographic data can be viewed as protein-specific empirical corrections to the standard force fields. PMID:24452989
Text Mining Improves Prediction of Protein Functional Sites
Cohn, Judith D.; Ravikumar, Komandur E.
2012-01-01
We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388
Marzaro, Giovanni; Ferrarese, Alessandro; Chilin, Adriana
2014-08-01
The selection of the most appropriate protein conformation is a crucial aspect in molecular docking experiments. In order to reduce the errors arising from the use of a single protein conformation, several authors suggest the use of several tridimensional structures for the target. However, the selection of the most appropriate protein conformations still remains a challenging goal. The protein 3D-structures selection is mainly performed based on pairwise root-mean-square-deviation (RMSD) values computation, followed by hierarchical clustering. Herein we report an alternative strategy, based on the computation of only two atom affinity map for each protein conformation, followed by multivariate analysis and hierarchical clustering. This methodology was applied on seven different kinases of pharmaceutical interest. The comparison with the classical RMSD-based strategy was based on cross-docking of co-crystallized ligands. In the case of epidermal growth factor receptor kinase, also the docking performance on 220 known ligands were evaluated, followed by 3D-QSAR studies. In all the cases, the herein proposed methodology outperformed the RMSD-based one.
Zhang, Xuewei; Yu, Peiqiang
2014-07-02
Non-invasive techniques are a key to study nutrition and structure interaction. Fourier transform infrared microspectroscopy coupled with a synchrotron radiation source (SR-IMS) is a rapid, non-invasive, and non-destructive bioanalytical technique. To understand internal structure changes in relation to nutrient availability in oil seed processing is vital to find optimal processing conditions. The objective of this study was to use a synchrotron-based bioanalytical technique SR-IMS as a non-invasive and non-destructive tool to study the effects of heat-processing methods and oil seed canola type on modeled protein structure based on spectral data within intact tissue that were randomly selected and quantify the relationship between the modeled protein structure and protein nutrient supply to ruminants. The results showed that the moisture heat-related processing significantly changed (p<0.05) modeled protein structures compared to the raw canola (control) and those processing by dry heating. The moisture heating increased (p<0.05) spectral intensities of amide I, amide II, α-helices, and β-sheets but decreased (p<0.05) the ratio of modeled α-helices to β-sheet spectral intensity. There was no difference (p>0.05) in the protein spectral profile between the raw and dry-heated canola tissue and between yellow- and brown-type canola tissue. The results indicated that different heat processing methods have different impacts on the protein inherent structure. The protein intrinsic structure in canola seed tissue was more sensitive and more response to the moisture heating in comparison to the dry heating. These changes are expected to be related to the nutritive value. However, the current study is based on limited samples, and more large-scale studies are needed to confirm our findings.
NASA Astrophysics Data System (ADS)
Tavenor, Nathan Albert
Protein-based supramolecular polymers (SMPs) are a class of biomaterials which draw inspiration from and expand upon the many examples of complex protein quaternary structures observed in nature: collagen, microtubules, viral capsids, etc. Designing synthetic supramolecular protein scaffolds both increases our understanding of natural superstructures and allows for the creation of novel materials. Similar to small-molecule SMPs, protein-based SMPs form due to self-assembly driven by intermolecular interactions between monomers, and monomer structure determines the properties of the overall material. Using protein-based monomers takes advantage of the self-assembly and highly specific molecular recognition properties encodable in polypeptide sequences to rationally design SMP architectures. The central hypothesis underlying our work is that alpha-helical coiled coils, a well-studied protein quaternary folding motif, are well-suited to SMP design through the addition of synthetic linkers at solvent-exposed sites. Through small changes in the structures of the cross-links and/or peptide sequence, we have been able to control both the nanoscale organization and the macroscopic properties of the SMPs. Changes to the linker and hydrophobic core of the peptide can be used to control polymer rigidity, stability, and dimensionality. The gaps in knowledge that this thesis sought to fill on this project were 1) the relationship between the molecular structure of the cross-linked polypeptides and the macroscopic properties of the SMPs and 2) a means of creating materials exhibiting multi-dimensional net or framework topologies. Separate from the above efforts on supramolecular architectures was work on improving backbone modification strategies for an alpha-helix in the context of a complex protein tertiary fold. Earlier work in our lab had successfully incorporated unnatural building blocks into every major secondary structure (beta-sheet, alpha-helix, loops and beta-turns) of a small protein with a tertiary fold. Although the tertiary fold of the native sequence was mimicked by the resulting artificial protein, the thermodynamic stability was greatly compromised. Most of this energetic penalty derived from the modifications present in the alpha-helix. The contribution within this thesis was direct comparison of several alpha-helical design strategies and establishment of the thermodynamic consequences of each.
Liu, Lizhen; Sun, Xiaowu; Song, Wei; Du, Chao
2018-06-01
Predicting protein complexes from protein-protein interaction (PPI) network is of great significance to recognize the structure and function of cells. A protein may interact with different proteins under different time or conditions. Existing approaches only utilize static PPI network data that may lose much temporal biological information. First, this article proposed a novel method that combines gene expression data at different time points with traditional static PPI network to construct different dynamic subnetworks. Second, to further filter out the data noise, the semantic similarity based on gene ontology is regarded as the network weight together with the principal component analysis, which is introduced to deal with the weight computing by three traditional methods. Third, after building a dynamic PPI network, a predicting protein complexes algorithm based on "core-attachment" structural feature is applied to detect complexes from each dynamic subnetworks. Finally, it is revealed from the experimental results that our method proposed in this article performs well on detecting protein complexes from dynamic weighted PPI networks.
Structural genomics: keeping up with expanding knowledge of the protein universe
Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek
2010-01-01
Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space — a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006. PMID:17587562
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Fast iodide-SAD phasing for high-throughput membrane protein structure determination
Melnikov, Igor; Polovinkin, Vitaly; Kovalev, Kirill; Gushchin, Ivan; Shevtsov, Mikhail; Shevchenko, Vitaly; Mishin, Alexey; Alekseev, Alexey; Rodriguez-Valera, Francisco; Borshchevskiy, Valentin; Cherezov, Vadim; Leonard, Gordon A.; Gordeliy, Valentin; Popov, Alexander
2017-01-01
We describe a fast, easy, and potentially universal method for the de novo solution of the crystal structures of membrane proteins via iodide–single-wavelength anomalous diffraction (I-SAD). The potential universality of the method is based on a common feature of membrane proteins—the availability at the hydrophobic-hydrophilic interface of positively charged amino acid residues with which iodide strongly interacts. We demonstrate the solution using I-SAD of four crystal structures representing different classes of membrane proteins, including a human G protein–coupled receptor (GPCR), and we show that I-SAD can be applied using data collection strategies based on either standard or serial x-ray crystallography techniques. PMID:28508075
Sequence Determinants of Compaction in Intrinsically Disordered Proteins
Marsh, Joseph A.; Forman-Kay, Julie D.
2010-01-01
Abstract Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348
Banach, Mateusz; Konieczny, Leszek; Roterman, Irena
2014-10-21
In this paper we show that the fuzzy oil drop model represents a general framework for describing the generation of hydrophobic cores in proteins and thus provides insight into the influence of the water environment upon protein structure and stability. The model has been successfully applied in the study of a wide range of proteins, however this paper focuses specifically on domains representing immunoglobulin-like folds. Here we provide evidence that immunoglobulin-like domains, despite being structurally similar, differ with respect to their participation in the generation of hydrophobic core. It is shown that β-structural fragments in β-barrels participate in hydrophobic core formation in a highly differentiated manner. Quantitatively measured participation in core formation helps explain the variable stability of proteins and is shown to be related to their biological properties. This also includes the known tendency of immunoglobulin domains to form amyloids, as shown using transthyretin to reveal the clear relation between amyloidogenic properties and structural characteristics based on the fuzzy oil drop model. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
The turn of the screw: an exercise in protein secondary structure.
Pikaart, Michael
2011-01-01
An exercise using simple paper strips to illustrate protein helical and sheet secondary structures is presented. Drawing on the rich historical context of the use of physical models in protein biochemistry by early practitioners, in particular Linus Pauling, the purpose of this activity is to cultivate in students a hands-on, intuitive sense of protein secondary structure and to complement the common computer-based structural portrayals often used in teaching biochemistry. As students fold these paper strips into model secondary structures, they will better grasp how intramolecular hydrogen bonds form in the folding of a polypeptide into secondary structure, and how these hydrogen bonds direct the overall shape of helical and sheet structures, including the handedness of the α-helix and the difference between right- and the left-handed twist. Copyright © 2010 Wiley Periodicals, Inc.
Vögeli, Beat; Orts, Julien; Strotz, Dean; Chi, Celestine; Minges, Martina; Wälti, Marielle Aulikki; Güntert, Peter; Riek, Roland
2014-04-01
Confined by the Boltzmann distribution of the energies of the states, a multitude of structural states are inherent to biomolecules. For a detailed understanding of a protein's function, its entire structural landscape at atomic resolution and insight into the interconversion between all the structural states (i.e. dynamics) are required. Whereas dedicated trickery with NMR relaxation provides aspects of local dynamics, and 3D structure determination by NMR is well established, only recently have several attempts been made to formulate a more comprehensive description of the dynamics and the structural landscape of a protein. Here, a perspective is given on the use of exact NOEs (eNOEs) for the elucidation of structural ensembles of a protein describing the covered conformational space. Copyright © 2013 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Johnson, R. Jeremy
2014-01-01
HIV protease has served as a model protein for understanding protein structure, enzyme kinetics, structure-based drug design, and protein evolution. Inhibitors of HIV protease are also an essential part of effective HIV/AIDS treatment and have provided great societal benefits. The broad applications for HIV protease and its inhibitors make it a…
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-12-27
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
Mining protein loops using a structural alphabet and statistical exceptionality
2010-01-01
Background Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/. PMID:20132552
Mining protein loops using a structural alphabet and statistical exceptionality.
Regad, Leslie; Martin, Juliette; Nuel, Gregory; Camproux, Anne-Claude
2010-02-04
Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 A). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
Sasse, Alexander; de Vries, Sjoerd J; Schindler, Christina E M; de Beauchêne, Isaure Chauvot; Zacharias, Martin
2017-01-01
Protein-protein docking protocols aim to predict the structures of protein-protein complexes based on the structure of individual partners. Docking protocols usually include several steps of sampling, clustering, refinement and re-scoring. The scoring step is one of the bottlenecks in the performance of many state-of-the-art protocols. The performance of scoring functions depends on the quality of the generated structures and its coupling to the sampling algorithm. A tool kit, GRADSCOPT (GRid Accelerated Directly SCoring OPTimizing), was designed to allow rapid development and optimization of different knowledge-based scoring potentials for specific objectives in protein-protein docking. Different atomistic and coarse-grained potentials can be created by a grid-accelerated directly scoring dependent Monte-Carlo annealing or by a linear regression optimization. We demonstrate that the scoring functions generated by our approach are similar to or even outperform state-of-the-art scoring functions for predicting near-native solutions. Of additional importance, we find that potentials specifically trained to identify the native bound complex perform rather poorly on identifying acceptable or medium quality (near-native) solutions. In contrast, atomistic long-range contact potentials can increase the average fraction of near-native poses by up to a factor 2.5 in the best scored 1% decoys (compared to existing scoring), emphasizing the need of specific docking potentials for different steps in the docking protocol.
Generation of Viable Cell and Biomaterial Patterns by Laser Transfer
NASA Astrophysics Data System (ADS)
Ringeisen, Bradley
2001-03-01
In order to fabricate and interface biological systems for next generation applications such as biosensors, protein recognition microarrays, and engineered tissues, it is imperative to have a method of accurately and rapidly depositing different active biomaterials in patterns or layered structures. Ideally, the biomaterial structures would also be compatible with many different substrates including technologically relevant platforms such as electronic circuits or various detection devices. We have developed a novel laser-based technique, termed matrix assisted pulsed laser evaporation direct write (MAPLE DW), that is able to direct write patterns and three-dimensional structures of numerous biologically active species ranging from proteins and antibodies to living cells. Specifically, we have shown that MAPLE DW is capable of forming mesoscopic patterns of living prokaryotic cells (E. coli bacteria), living mammalian cells (Chinese hamster ovaries), active proteins (biotinylated bovine serum albumin, horse radish peroxidase), and antibodies specific to a variety of classes of cancer related proteins including intracellular and extracellular matrix proteins, signaling proteins, cell cycle proteins, growth factors, and growth factor receptors. In addition, patterns of viable cells and active biomolecules were deposited on different substrates including metals, semiconductors, nutrient agar, and functionalized glass slides. We will present an explanation of the laser-based transfer mechanism as well as results from our recent efforts to fabricate protein recognition microarrays and tissue-based microfluidic networks.
Su, Min-Gang; Weng, Julia Tzu-Ya; Hsu, Justin Bo-Kai; Huang, Kai-Yao; Chi, Yu-Hsiang; Lee, Tzong-Yi
2017-12-21
Protein post-translational modification (PTM) plays an essential role in various cellular processes that modulates the physical and chemical properties, folding, conformation, stability and activity of proteins, thereby modifying the functions of proteins. The improved throughput of mass spectrometry (MS) or MS/MS technology has not only brought about a surge in proteome-scale studies, but also contributed to a fruitful list of identified PTMs. However, with the increase in the number of identified PTMs, perhaps the more crucial question is what kind of biological mechanisms these PTMs are involved in. This is particularly important in light of the fact that most protein-based pharmaceuticals deliver their therapeutic effects through some form of PTM. Yet, our understanding is still limited with respect to the local effects and frequency of PTM sites near pharmaceutical binding sites and the interfaces of protein-protein interaction (PPI). Understanding PTM's function is critical to our ability to manipulate the biological mechanisms of protein. In this study, to understand the regulation of protein functions by PTMs, we mapped 25,835 PTM sites to proteins with available three-dimensional (3D) structural information in the Protein Data Bank (PDB), including 1785 modified PTM sites on the 3D structure. Based on the acquired structural PTM sites, we proposed to use five properties for the structural characterization of PTM substrate sites: the spatial composition of amino acids, residues and side-chain orientations surrounding the PTM substrate sites, as well as the secondary structure, division of acidity and alkaline residues, and solvent-accessible surface area. We further mapped the structural PTM sites to the structures of drug binding and PPI sites, identifying a total of 1917 PTM sites that may affect PPI and 3951 PTM sites associated with drug-target binding. An integrated analytical platform (CruxPTM), with a variety of methods and online molecular docking tools for exploring the structural characteristics of PTMs, is presented. In addition, all tertiary structures of PTM sites on proteins can be visualized using the JSmol program. Resolving the function of PTM sites is important for understanding the role that proteins play in biological mechanisms. Our work attempted to delineate the structural correlation between PTM sites and PPI or drug-target binding. CurxPTM could help scientists narrow the scope of their PTM research and enhance the efficiency of PTM identification in the face of big proteome data. CruxPTM is now available at http://csb.cse.yzu.edu.tw/CruxPTM/ .
Madaoui, Hocine; Guerois, Raphaël
2008-01-01
Protein surfaces are under significant selection pressure to maintain interactions with their partners throughout evolution. Capturing how selection pressure acts at the interfaces of protein–protein complexes is a fundamental issue with high interest for the structural prediction of macromolecular assemblies. We tackled this issue under the assumption that, throughout evolution, mutations should minimally disrupt the physicochemical compatibility between specific clusters of interacting residues. This constraint drove the development of the so-called Surface COmplementarity Trace in Complex History score (SCOTCH), which was found to discriminate with high efficiency the structure of biological complexes. SCOTCH performances were assessed not only with respect to other evolution-based approaches, such as conservation and coevolution analyses, but also with respect to statistically based scoring methods. Validated on a set of 129 complexes of known structure exhibiting both permanent and transient intermolecular interactions, SCOTCH appears as a robust strategy to guide the prediction of protein–protein complex structures. Of particular interest, it also provides a basic framework to efficiently track how protein surfaces could evolve while keeping their partners in contact. PMID:18511568
Protein-based materials, toward a new level of structural control.
van Hest, J C; Tirrell, D A
2001-10-07
Through billions of years of evolution nature has created and refined structural proteins for a wide variety of specific purposes. Amino acid sequences and their associated folding patterns combine to create elastic, rigid or tough materials. In many respects, nature's intricately designed products provide challenging examples for materials scientists, but translation of natural structural concepts into bio-inspired materials requires a level of control of macromolecular architecture far higher than that afforded by conventional polymerization processes. An increasingly important approach to this problem has been to use biological systems for production of materials. Through protein engineering, artificial genes can be developed that encode protein-based materials with desired features. Structural elements found in nature, such as beta-sheets and alpha-helices, can be combined with great flexibility, and can be outfitted with functional elements such as cell binding sites or enzymatic domains. The possibility of incorporating non-natural amino acids increases the versatility of protein engineering still further. It is expected that such methods will have large impact in the field of materials science, and especially in biomedical materials science, in the future.
Jaimes, Javier A; Whittaker, Gary R
2018-04-01
Feline coronavirus (FCoV) is an etiological agent that causes a benign enteric illness and the fatal systemic disease feline infectious peritonitis (FIP). The FCoV spike (S) protein is considered the viral regulator for binding and entry to the cell. This protein is also involved in FCoV tropism and virulence, as well as in the switch from enteric disease to FIP. This regulation is carried out by spike's major functions: receptor binding and virus-cell membrane fusion. In this review, we address important aspects in FCoV genetics, replication and pathogenesis, focusing on the role of S. To better understand this, FCoV S protein models were constructed, based on the human coronavirus NL63 (HCoV-NL63) S structure. We describe the specific structural characteristics of the FCoV S, in comparison with other coronavirus spikes. We also revise the biochemical events needed for FCoV S activation and its relation to the structural features of the protein. Copyright © 2018 Elsevier Inc. All rights reserved.
Predicting nucleic acid binding interfaces from structural models of proteins.
Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael
2012-02-01
The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.
TOPSAN: a dynamic web database for structural genomics.
Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John
2011-01-01
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.
Maximova, Tatiana; Plaku, Erion; Shehu, Amarda
2016-07-07
Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to address the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new venues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
3Drefine: an interactive web server for efficient protein structure refinement
Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin
2016-01-01
3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. PMID:27131371
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.
DNAproDB: an interactive tool for structural analysis of DNA–protein complexes
Sagendorf, Jared M.
2017-01-01
Abstract Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131