Sample records for modeling protein structures

  1. Template-based modeling and ab initio refinement of protein oligomer structures using GALAXY in CAPRI round 30.

    PubMed

    Lee, Hasup; Baek, Minkyung; Lee, Gyu Rie; Park, Sangwoo; Seok, Chaok

    2017-03-01

    Many proteins function as homo- or hetero-oligomers; therefore, attempts to understand and regulate protein functions require knowledge of protein oligomer structures. The number of available experimental protein structures is increasing, and oligomer structures can be predicted using the experimental structures of related proteins as templates. However, template-based models may have errors due to sequence differences between the target and template proteins, which can lead to functional differences. Such structural differences may be predicted by loop modeling of local regions or refinement of the overall structure. In CAPRI (Critical Assessment of PRotein Interactions) round 30, we used recently developed features of the GALAXY protein modeling package, including template-based structure prediction, loop modeling, model refinement, and protein-protein docking to predict protein complex structures from amino acid sequences. Out of the 25 CAPRI targets, medium and acceptable quality models were obtained for 14 and 1 target(s), respectively, for which proper oligomer or monomer templates could be detected. Symmetric interface loop modeling on oligomer model structures successfully improved model quality, while loop modeling on monomer model structures failed. Overall refinement of the predicted oligomer structures consistently improved the model quality, in particular in interface contacts. Proteins 2017; 85:399-407. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  2. Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling.

    PubMed

    Wang, Juexin; Luttrell, Joseph; Zhang, Ning; Khan, Saad; Shi, NianQing; Wang, Michael X; Kang, Jing-Qiong; Wang, Zheng; Xu, Dong

    2016-01-01

    Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.

  3. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  4. Quality assessment of protein model-structures based on structural and functional similarities.

    PubMed

    Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata

    2012-09-21

    Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

  5. RECURSIVE PROTEIN MODELING: A DIVIDE AND CONQUER STRATEGY FOR PROTEIN STRUCTURE PREDICTION AND ITS CASE STUDY IN CASP9

    PubMed Central

    CHENG, JIANLIN; EICKHOLT, JESSE; WANG, ZHENG; DENG, XIN

    2013-01-01

    After decades of research, protein structure prediction remains a very challenging problem. In order to address the different levels of complexity of structural modeling, two types of modeling techniques — template-based modeling and template-free modeling — have been developed. Template-based modeling can often generate a moderate- to high-resolution model when a similar, homologous template structure is found for a query protein but fails if no template or only incorrect templates are found. Template-free modeling, such as fragment-based assembly, may generate models of moderate resolution for small proteins of low topological complexity. Seldom have the two techniques been integrated together to improve protein modeling. Here we develop a recursive protein modeling approach to selectively and collaboratively apply template-based and template-free modeling methods to model template-covered (i.e. certain) and template-free (i.e. uncertain) regions of a protein. A preliminary implementation of the approach was tested on a number of hard modeling cases during the 9th Critical Assessment of Techniques for Protein Structure Prediction (CASP9) and successfully improved the quality of modeling in most of these cases. Recursive modeling can signicantly reduce the complexity of protein structure modeling and integrate template-based and template-free modeling to improve the quality and efficiency of protein structure prediction. PMID:22809379

  6. Modeling complexes of modeled proteins.

    PubMed

    Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A

    2017-03-01

    Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å C α RMSD. Many template-based docking predictions fall into acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and the template-based docking is much less sensitive to inaccuracies of protein models than the free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  7. Ensemble-based evaluation for protein structure models.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-06-15

    Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts' intuitive assessment of computational models and provides information of practical usefulness of models. https://bitbucket.org/mjamroz/flexscore dkihara@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  8. Ensemble-based evaluation for protein structure models

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2016-01-01

    Motivation: Comparing protein tertiary structures is a fundamental procedure in structural biology and protein bioinformatics. Structure comparison is important particularly for evaluating computational protein structure models. Most of the model structure evaluation methods perform rigid body superimposition of a structure model to its crystal structure and measure the difference of the corresponding residue or atom positions between them. However, these methods neglect intrinsic flexibility of proteins by treating the native structure as a rigid molecule. Because different parts of proteins have different levels of flexibility, for example, exposed loop regions are usually more flexible than the core region of a protein structure, disagreement of a model to the native needs to be evaluated differently depending on the flexibility of residues in a protein. Results: We propose a score named FlexScore for comparing protein structures that consider flexibility of each residue in the native state of proteins. Flexibility information may be extracted from experiments such as NMR or molecular dynamics simulation. FlexScore considers an ensemble of conformations of a protein described as a multivariate Gaussian distribution of atomic displacements and compares a query computational model with the ensemble. We compare FlexScore with other commonly used structure similarity scores over various examples. FlexScore agrees with experts’ intuitive assessment of computational models and provides information of practical usefulness of models. Availability and implementation: https://bitbucket.org/mjamroz/flexscore Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307633

  9. Knowledge-based model building of proteins: concepts and examples.

    PubMed Central

    Bajorath, J.; Stenkamp, R.; Aruffo, A.

    1993-01-01

    We describe how to build protein models from structural templates. Methods to identify structural similarities between proteins in cases of significant, moderate to low, or virtually absent sequence similarity are discussed. The detection and evaluation of structural relationships is emphasized as a central aspect of protein modeling, distinct from the more technical aspects of model building. Computational techniques to generate and complement comparative protein models are also reviewed. Two examples, P-selectin and gp39, are presented to illustrate the derivation of protein model structures and their use in experimental studies. PMID:7505680

  10. Homology modeling a fast tool for drug discovery: current perspectives.

    PubMed

    Vyas, V K; Ukawala, R D; Ghate, M; Chintha, C

    2012-01-01

    Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery.

  11. Homology Modeling a Fast Tool for Drug Discovery: Current Perspectives

    PubMed Central

    Vyas, V. K.; Ukawala, R. D.; Ghate, M.; Chintha, C.

    2012-01-01

    Major goal of structural biology involve formation of protein-ligand complexes; in which the protein molecules act energetically in the course of binding. Therefore, perceptive of protein-ligand interaction will be very important for structure based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with protein. With increasing in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain the structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. The recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, distant homologues, modeling of loops and side chains as well as detecting errors in a model contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focused on the features and a role of homology modeling in predicting protein structure and described current developments in this field with victorious applications at the different stages of the drug design and discovery. PMID:23204616

  12. In Silico Analysis for the Study of Botulinum Toxin Structure

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2010-01-01

    Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding the function. The determination of protein-protein complex structure by experimental studies remains difficult, therefore computational prediction of protein structures by structure modeling and docking studies is valuable method. In addition, MD simulation is also one of the most popular methods for protein structure modeling and characteristics. Here, we attempt to predict protein-protein complex structure and property using some of bioinformatic methods, and we focus botulinum toxin complex as target structure.

  13. Quality assessment of protein model-structures based on structural and functional similarities

    PubMed Central

    2012-01-01

    Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498

  14. Comparative Protein Structure Modeling Using MODELLER.

    PubMed

    Webb, Benjamin; Sali, Andrej

    2014-09-08

    Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. Copyright © 2014 John Wiley & Sons, Inc.

  15. Advances in Homology Protein Structure Modeling

    PubMed Central

    Xiang, Zhexin

    2007-01-01

    Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261

  16. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.

  17. Predicting nucleic acid binding interfaces from structural models of proteins

    PubMed Central

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2011-01-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767

  18. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Abstract Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724

  19. Modeling repetitive, non‐globular proteins

    PubMed Central

    Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun

    2016-01-01

    Abstract While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323

  20. Evaluation of protein-protein docking model structures using all-atom molecular dynamics simulations combined with the solution theory in the energy representation

    NASA Astrophysics Data System (ADS)

    Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio

    2012-12-01

    We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.

  1. Evaluation of protein-protein docking model structures using all-atom molecular dynamics simulations combined with the solution theory in the energy representation.

    PubMed

    Takemura, Kazuhiro; Guo, Hao; Sakuraba, Shun; Matubayasi, Nobuyuki; Kitao, Akio

    2012-12-07

    We propose a method to evaluate binding free energy differences among distinct protein-protein complex model structures through all-atom molecular dynamics simulations in explicit water using the solution theory in the energy representation. Complex model structures are generated from a pair of monomeric structures using the rigid-body docking program ZDOCK. After structure refinement by side chain optimization and all-atom molecular dynamics simulations in explicit water, complex models are evaluated based on the sum of their conformational and solvation free energies, the latter calculated from the energy distribution functions obtained from relatively short molecular dynamics simulations of the complex in water and of pure water based on the solution theory in the energy representation. We examined protein-protein complex model structures of two protein-protein complex systems, bovine trypsin/CMTI-1 squash inhibitor (PDB ID: 1PPE) and RNase SA/barstar (PDB ID: 1AY7), for which both complex and monomer structures were determined experimentally. For each system, we calculated the energies for the crystal complex structure and twelve generated model structures including the model most similar to the crystal structure and very different from it. In both systems, the sum of the conformational and solvation free energies tended to be lower for the structure similar to the crystal. We concluded that our energy calculation method is useful for selecting low energy complex models similar to the crystal structure from among a set of generated models.

  2. Gaia: automated quality assessment of protein structure models.

    PubMed

    Kota, Pradeep; Ding, Feng; Ramachandran, Srinivas; Dokholyan, Nikolay V

    2011-08-15

    Increasing use of structural modeling for understanding structure-function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures. In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement. We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures. dokh@unc.edu Supplementary data are available at Bioinformatics online.

  3. Coarse-Grained Simulations of Membrane Insertion and Folding of Small Helical Proteins Using the CABS Model.

    PubMed

    Pulawski, Wojciech; Jamroz, Michal; Kolinski, Michal; Kolinski, Andrzej; Kmiecik, Sebastian

    2016-11-28

    The CABS coarse-grained model is a well-established tool for modeling globular proteins (predicting their structure, dynamics, and interactions). Here we introduce an extension of the CABS representation and force field (CABS-membrane) to the modeling of the effect of the biological membrane environment on the structure of membrane proteins. We validate the CABS-membrane model in folding simulations of 10 short helical membrane proteins not using any knowledge about their structure. The simulations start from random protein conformations placed outside the membrane environment and allow for full flexibility of the modeled proteins during their spontaneous insertion into the membrane. In the resulting trajectories, we have found models close to the experimental membrane structures. We also attempted to select the correctly folded models using simple filtering followed by structural clustering combined with reconstruction to the all-atom representation and all-atom scoring. The CABS-membrane model is a promising approach for further development toward modeling of large protein-membrane systems.

  4. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement

    PubMed Central

    Xu, Dong; Zhang, Jian; Roy, Ambrish; Zhang, Yang

    2011-01-01

    I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline. PMID:22069036

  5. Predicting nucleic acid binding interfaces from structural models of proteins.

    PubMed

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  6. Ab Initio Structural Modeling of and Experimental Validation for Chlamydia trachomatis Protein CT296 Reveal Structural Similarity to Fe(II) 2-Oxoglutarate-Dependent Enzymes▿

    PubMed Central

    Kemege, Kyle E.; Hickey, John M.; Lovell, Scott; Battaile, Kevin P.; Zhang, Yang; Hefty, P. Scott

    2011-01-01

    Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur. PMID:21965559

  7. Ab initio structural modeling of and experimental validation for Chlamydia trachomatis protein CT296 reveal structural similarity to Fe(II) 2-oxoglutarate-dependent enzymes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kemege, Kyle E.; Hickey, John M.; Lovell, Scott

    2012-02-13

    Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF)more » CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-{angstrom} C{alpha} root mean square deviation [RMSD]) the high-resolution (1.8-{angstrom}) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur.« less

  8. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  9. Mixture models for protein structure ensembles.

    PubMed

    Hirsch, Michael; Habeck, Michael

    2008-10-01

    Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule. And most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.

  10. A hidden markov model derived structural alphabet for proteins.

    PubMed

    Camproux, A C; Gautier, R; Tufféry, P

    2004-06-04

    Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1A root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.

  11. Protein Models Docking Benchmark 2

    PubMed Central

    Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.

    2015-01-01

    Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716

  12. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10–4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  13. ProTSAV: A protein tertiary structure analysis and validation server.

    PubMed

    Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B

    2016-01-01

    Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88%and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2Å.The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Comparative Protein Structure Modeling Using MODELLER

    PubMed Central

    Webb, Benjamin; Sali, Andrej

    2016-01-01

    Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. PMID:27322406

  15. Restricted N-glycan conformational space in the PDB and its implication in glycan structure modeling.

    PubMed

    Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil

    2013-01-01

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures.

  16. Restricted N-glycan Conformational Space in the PDB and Its Implication in Glycan Structure Modeling

    PubMed Central

    Jo, Sunhwan; Lee, Hui Sun; Skolnick, Jeffrey; Im, Wonpil

    2013-01-01

    Understanding glycan structure and dynamics is central to understanding protein-carbohydrate recognition and its role in protein-protein interactions. Given the difficulties in obtaining the glycan's crystal structure in glycoconjugates due to its flexibility and heterogeneity, computational modeling could play an important role in providing glycosylated protein structure models. To address if glycan structures available in the PDB can be used as templates or fragments for glycan modeling, we present a survey of the N-glycan structures of 35 different sequences in the PDB. Our statistical analysis shows that the N-glycan structures found on homologous glycoproteins are significantly conserved compared to the random background, suggesting that N-glycan chains can be confidently modeled with template glycan structures whose parent glycoproteins share sequence similarity. On the other hand, N-glycan structures found on non-homologous glycoproteins do not show significant global structural similarity. Nonetheless, the internal substructures of these N-glycans, particularly, the substructures that are closer to the protein, show significantly similar structures, suggesting that such substructures can be used as fragments in glycan modeling. Increased interactions with protein might be responsible for the restricted conformational space of N-glycan chains. Our results suggest that structure prediction/modeling of N-glycans of glycoconjugates using structure database could be effective and different modeling approaches would be needed depending on the availability of template structures. PMID:23516343

  17. Data-assisted protein structure modeling by global optimization in CASP12.

    PubMed

    Joo, Keehyoung; Heo, Seungryong; Joung, InSuk; Hong, Seung Hwan; Lee, Sung Jong; Lee, Jooyoung

    2018-03-01

    In CASP12, 2 types of data-assisted protein structure modeling were experimented. Either SAXS experimental data or cross-linking experimental data was provided for a selected number of CASP12 targets that the CASP12 predictor could utilize for better protein structure modeling. We devised 2 separate energy terms for SAXS data and cross-linking data to drive the model structures into more native-like structures that satisfied the given experimental data as much as possible. In CASP11, we successfully performed protein structure modeling using simulated sparse and ambiguously assigned NOE data and/or correct residue-residue contact information, where the only energy term that folded the protein into its native structure was the term which was originated from the given experimental data. However, the 2 types of experimental data provided in CASP12 were far from being sufficient enough to fold the target protein into its native structure because SAXS data provides only the overall shape of the molecule and the cross-linking contact information provides only very low-resolution distance information. For this reason, we combined the SAXS or cross-linking energy term with our regular modeling energy function that includes both the template energy term and the de novo energy terms. By optimizing the newly formulated energy function, we obtained protein models that fit better with provided SAXS data than the X-ray structure of the target. However, the improvement of the model relative to the 1 modeled without the SAXS data, was not significant. Consistent structural improvement was achieved by incorporating cross-linking data into the protein structure modeling. © 2018 Wiley Periodicals, Inc.

  18. Statistical mechanics of protein structural transitions: Insights from the island model

    PubMed Central

    Kobayashi, Yukio

    2016-01-01

    The so-called island model of protein structural transition holds that hydrophobic interactions are the key to both the folding and function of proteins. Herein, the genesis and statistical mechanical basis of the island model of transitions are reviewed, by presenting the results of simulations of such transitions. Elucidating the physicochemical mechanism of protein structural formation is the foundation for understanding the hierarchical structure of life at the microscopic level. Based on the results obtained to date using the island model, remaining problems and future work in the field of protein structures are discussed, referencing Professor Saitô’s views on the hierarchic structure of science. PMID:28409078

  19. Large-scale model quality assessment for improving protein tertiary structure prediction.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-06-15

    Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well. Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It unprecedentedly applied 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/. © The Author 2015. Published by Oxford University Press.

  20. Protein docking by the interface structure similarity: how much structure is needed?

    PubMed

    Sinha, Rohita; Kundrotas, Petras J; Vakser, Ilya A

    2012-01-01

    The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.

  1. Template-based structure modeling of protein-protein interactions

    PubMed Central

    Szilagyi, Andras; Zhang, Yang

    2014-01-01

    The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the proteinprotein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449

  2. Lessons from making the Structural Classification of Proteins (SCOP) and their implications for protein structure modelling.

    PubMed

    Andreeva, Antonina

    2016-06-15

    The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.

  3. Coverage of whole proteome by structural genomics observed through protein homology modeling database

    PubMed Central

    Yamaguchi, Akihiro; Go, Mitiko

    2006-01-01

    We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617

  4. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    PubMed

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining (1)H(N), (13)C, and (15)N backbone and (13)Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventional determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  5. Functional insights from proteome-wide structural modeling of Treponema pallidum subspecies pallidum, the causative agent of syphilis.

    PubMed

    Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E

    2018-05-16

    Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium, however the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.

  6. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles

    PubMed Central

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G.; Gelly, Jean-Christophe

    2016-01-01

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation —with Protein Blocks—, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the ‘Hard’ category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/. PMID:27319297

  7. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles.

    PubMed

    Ghouzam, Yassine; Postic, Guillaume; Guerin, Pierre-Edouard; de Brevern, Alexandre G; Gelly, Jean-Christophe

    2016-06-20

    Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated webserver based on a new strategy that performs this task. The identification by ORION of suitable templates is performed using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation -with Protein Blocks-, which give an accurate description of the local protein structure. ORION has recently been improved, increasing by 5% the quality of its results. The ORION web server accepts a single protein sequence as input and searches homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local quality model estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.

  8. Protein Structure

    ERIC Educational Resources Information Center

    Asmus, Elaine Garbarino

    2007-01-01

    Individual students model specific amino acids and then, through dehydration synthesis, a class of students models a protein. The students clearly learn amino acid structure, primary, secondary, tertiary, and quaternary structure in proteins and the nature of the bonds maintaining a protein's shape. This activity is fun, concrete, inexpensive and…

  9. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.

    PubMed

    Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras

    2015-08-01

    Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly require template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.

  10. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment

    PubMed Central

    2014-01-01

    Background Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387

  11. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment.

    PubMed

    Cao, Renzhi; Wang, Zheng; Cheng, Jianlin

    2014-04-15

    Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.

  12. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

    PubMed

    Park, Hahnbeom; Lee, Gyu Rie; Heo, Lim; Seok, Chaok

    2014-01-01

    Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

  13. Computer Models of Proteins

    NASA Technical Reports Server (NTRS)

    2000-01-01

    Dr. Marc Pusey (seated) and Dr. Craig Kundrot use computers to analyze x-ray maps and generate three-dimensional models of protein structures. With this information, scientists at Marshall Space Flight Center can learn how proteins are made and how they work. The computer screen depicts a proten structure as a ball-and-stick model. Other models depict the actual volume occupied by the atoms, or the ribbon-like structures that are crucial to a protein's function.

  14. Protein folding, protein structure and the origin of life: Theoretical methods and solutions of dynamical problems

    NASA Technical Reports Server (NTRS)

    Weaver, D. L.

    1982-01-01

    Theoretical methods and solutions of the dynamics of protein folding, protein aggregation, protein structure, and the origin of life are discussed. The elements of a dynamic model representing the initial stages of protein folding are presented. The calculation and experimental determination of the model parameters are discussed. The use of computer simulation for modeling protein folding is considered.

  15. Tactile Teaching: Exploring Protein Structure/Function Using Physical Models

    ERIC Educational Resources Information Center

    Herman, Tim; Morris, Jennifer; Colton, Shannon; Batiza, Ann; Patrick, Michael; Franzen, Margaret; Goodsell, David S.

    2006-01-01

    The technology now exists to construct physical models of proteins based on atomic coordinates of solved structures. We review here our recent experiences in using physical models to teach concepts of protein structure and function at both the high school and the undergraduate levels. At the high school level, physical models are used in a…

  16. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling

    PubMed Central

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    Background: The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. Objective: The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates. PMID:24748752

  17. Three-dimensional (3D) structure prediction of the American and African oil-palms β-ketoacyl-[ACP] synthase-II protein by comparative modelling.

    PubMed

    Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan

    2014-01-01

    The fatty-acid profile of the vegetable oils determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but, palm-oil obtained from the American oilpalm [Elaeis oleifera] contains only 25% C16:0. In part, the b-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know its structures. Hence, this study was undertaken. The objective of this study was to predict three-dimensional (3D) structure of EgKASII and EoKASII proteins using molecular modelling tools. The amino-acid sequences for KASII proteins were retrieved from the protein database of National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and ab-initio technique approach of protein structure prediction. The molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar with Streptococcus pneumonia KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by using ab-initio technique approach shows 6% and 9% deviation to its structures predicted by homology modelling, respectively. The structure refinement and validation confirmed that the predicted structures are accurate. The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with its substrates.

  18. Mathematics, thermodynamics, and modeling to address ten common misconceptions about protein structure, folding, and stability.

    PubMed

    Robic, Srebrenka

    2010-01-01

    To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative phenomena in undergraduate classes. In the process of learning about these topics, students often form incorrect ideas. For example, by learning about protein folding in the context of protein synthesis, students may come to an incorrect conclusion that once synthesized on the ribosome, a protein spends its entire cellular life time in its fully folded native confirmation. This is clearly not true; proteins are dynamic structures that undergo both local fluctuations and global unfolding events. To prevent and address such misconceptions, basic concepts of protein science can be introduced in the context of simple mathematical models and hands-on explorations of publicly available data sets. Ten common misconceptions about proteins are presented, along with suggestions for using equations, models, sequence, structure, and thermodynamic data to help students gain a deeper understanding of basic concepts relating to protein structure, folding, and stability.

  19. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity.

    PubMed

    Camproux, A C; Tufféry, P

    2005-08-05

    Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.

  20. Protein single-model quality assessment by feature-based probability density functions.

    PubMed

    Cao, Renzhi; Cheng, Jianlin

    2016-04-04

    Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method-Qprob. Qprob calculates the absolute error for each protein feature value against the true quality scores (i.e. GDT-TS scores) of protein structural models, and uses them to estimate its probability density distribution for quality assessment. Qprob has been blindly tested on the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM-NOVEL server. The official CASP result shows that Qprob ranks as one of the top single-model QA methods. In addition, Qprob makes contributions to our protein tertiary structure predictor MULTICOM, which is officially ranked 3rd out of 143 predictors. The good performance shows that Qprob is good at assessing the quality of models of hard targets. These results demonstrate that this new probability density distribution based method is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is now freely available in the web server of Qprob.

  1. CONFOLD2: improved contact-driven ab initio protein structure modeling.

    PubMed

    Adhikari, Badri; Cheng, Jianlin

    2018-01-25

    Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows to quickly generate top five structural models for a protein sequence when its secondary structures and contacts predictions at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .

  2. Protein modeling and molecular dynamics simulation of SlWRKY4 protein cloned from drought tolerant tomato (Solanum habrochaites) line EC520061.

    PubMed

    Karkute, Suhas G; Easwaran, Murugesh; Gujjar, Ranjit Singh; Piramanayagam, Shanmughavel; Singh, Major

    2015-10-01

    WRKY genes are members of one of the largest families of plant transcription factors and play an important role in response to biotic and abiotic stresses, and overall growth and development. Understanding the interaction of WRKY proteins with other proteins/ligands in plant cells is of utmost importance to develop plants having tolerance to biotic and abiotic stresses. The SlWRKY4 gene was cloned from a drought tolerant wild species of tomato (Solanum habrochaites) and the secondary structure and 3D modeling of this protein were predicted using Schrödinger Suite-Prime. Predicted structures were also subjected to plot against Ramachandran's conformation, and the modeled structure was minimized using Macromodel. Finally, the minimized structure was simulated in the water environment to check the protein stability. The behavior of the modeled structure was well-simulated and analyzed through RMSD and RMSF of the protein. The present work provides the modeled 3D structure of SlWRKY4 that will help in understanding the mechanism of gene regulation by further in silico interaction studies.

  3. Bayesian comparison of protein structures using partial Procrustes distance.

    PubMed

    Ejlali, Nasim; Faghihi, Mohammad Reza; Sadeghi, Mehdi

    2017-09-26

    An important topic in bioinformatics is the protein structure alignment. Some statistical methods have been proposed for this problem, but most of them align two protein structures based on the global geometric information without considering the effect of neighbourhood in the structures. In this paper, we provide a Bayesian model to align protein structures, by considering the effect of both local and global geometric information of protein structures. Local geometric information is incorporated to the model through the partial Procrustes distance of small substructures. These substructures are composed of β-carbon atoms from the side chains. Parameters are estimated using a Markov chain Monte Carlo (MCMC) approach. We evaluate the performance of our model through some simulation studies. Furthermore, we apply our model to a real dataset and assess the accuracy and convergence rate. Results show that our model is much more efficient than previous approaches.

  4. Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview

    PubMed Central

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a “mini protein folding problem” under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increasing number of known structures available in PDB. This mini review provides an overview on the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized. PMID:24688696

  5. Conformational sampling in template-free protein loop structure modeling: an overview.

    PubMed

    Li, Yaohang

    2013-01-01

    Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a "mini protein folding problem" under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increasing number of known structures available in PDB. This mini review provides an overview on the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.

  6. VoroMQA: Assessment of protein structure quality using interatomic contact areas.

    PubMed

    Olechnovič, Kliment; Venclovas, Česlovas

    2017-06-01

    In the absence of experimentally determined protein structure many biological questions can be addressed using computational structural models. However, the utility of protein structural models depends on their quality. Therefore, the estimation of the quality of predicted structures is an important problem. One of the approaches to this problem is the use of knowledge-based statistical potentials. Such methods typically rely on the statistics of distances and angles of residue-residue or atom-atom interactions collected from experimentally determined structures. Here, we present VoroMQA (Voronoi tessellation-based Model Quality Assessment), a new method for the estimation of protein structure quality. Our method combines the idea of statistical potentials with the use of interatomic contact areas instead of distances. Contact areas, derived using Voronoi tessellation of protein structure, are used to describe and seamlessly integrate both explicit interactions between protein atoms and implicit interactions of protein atoms with solvent. VoroMQA produces scores at atomic, residue, and global levels, all in the fixed range from 0 to 1. The method was tested on the CASP data and compared to several other single-model quality assessment methods. VoroMQA showed strong performance in the recognition of the native structure and in the structural model selection tests, thus demonstrating the efficacy of interatomic contact areas in estimating protein structure quality. The software implementation of VoroMQA is freely available as a standalone application and as a web server at http://bioinformatics.lt/software/voromqa. Proteins 2017; 85:1131-1145. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  7. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  8. Modeling Structure and Dynamics of Protein Complexes with SAXS Profiles

    PubMed Central

    Schneidman-Duhovny, Dina; Hammel, Michal

    2018-01-01

    Small-angle X-ray scattering (SAXS) is an increasingly common and useful technique for structural characterization of molecules in solution. A SAXS experiment determines the scattering intensity of a molecule as a function of spatial frequency, termed SAXS profile. SAXS profiles can be utilized in a variety of molecular modeling applications, such as comparing solution and crystal structures, structural characterization of flexible proteins, assembly of multi-protein complexes, and modeling of missing regions in the high-resolution structure. Here, we describe protocols for modeling atomic structures based on SAXS profiles. The first protocol is for comparing solution and crystal structures including modeling of missing regions and determination of the oligomeric state. The second protocol performs multi-state modeling by finding a set of conformations and their weights that fit the SAXS profile starting from a single-input structure. The third protocol is for protein-protein docking based on the SAXS profile of the complex. We describe the underlying software, followed by demonstrating their application on interleukin 33 (IL33) with its primary receptor ST2 and DNA ligase IV-XRCC4 complex. PMID:29605933

  9. VITAL NMR: Using Chemical Shift Derived Secondary Structure Information for a Limited Set of Amino Acids to Assess Homology Model Accuracy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brothers, Michael C; Nesbitt, Anna E; Hallock, Michael J

    2011-01-01

    Homology modeling is a powerful tool for predicting protein structures, whose success depends on obtaining a reasonable alignment between a given structural template and the protein sequence being analyzed. In order to leverage greater predictive power for proteins with few structural templates, we have developed a method to rank homology models based upon their compliance to secondary structure derived from experimental solid-state NMR (SSNMR) data. Such data is obtainable in a rapid manner by simple SSNMR experiments (e.g., (13)C-(13)C 2D correlation spectra). To test our homology model scoring procedure for various amino acid labeling schemes, we generated a library ofmore » 7,474 homology models for 22 protein targets culled from the TALOS+/SPARTA+ training set of protein structures. Using subsets of amino acids that are plausibly assigned by SSNMR, we discovered that pairs of the residues Val, Ile, Thr, Ala and Leu (VITAL) emulate an ideal dataset where all residues are site specifically assigned. Scoring the models with a predicted VITAL site-specific dataset and calculating secondary structure with the Chemical Shift Index resulted in a Pearson correlation coefficient (-0.75) commensurate to the control (-0.77), where secondary structure was scored site specifically for all amino acids (ALL 20) using STRIDE. This method promises to accelerate structure procurement by SSNMR for proteins with unknown folds through guiding the selection of remotely homologous protein templates and assessing model quality.« less

  10. Mathematics, Thermodynamics, and Modeling to Address Ten Common Misconceptions about Protein Structure, Folding, and Stability

    ERIC Educational Resources Information Center

    Robic, Srebrenka

    2010-01-01

    To fully understand the roles proteins play in cellular processes, students need to grasp complex ideas about protein structure, folding, and stability. Our current understanding of these topics is based on mathematical models and experimental data. However, protein structure, folding, and stability are often introduced as descriptive, qualitative…

  11. Structure Refinement of Protein Low Resolution Models Using the GNEIMO Constrained Dynamics Method

    PubMed Central

    Park, In-Hee; Gangupomu, Vamshi; Wagner, Jeffrey; Jain, Abhinandan; Vaidehi, Nagara-jan

    2012-01-01

    The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement. PMID:22260550

  12. Protein structure prediction with local adjust tabu search algorithm

    PubMed Central

    2014-01-01

    Background Protein folding structure prediction is one of the most challenging problems in the bioinformatics domain. Because of the complexity of the realistic protein structure, the simplified structure model and the computational method should be adopted in the research. The AB off-lattice model is one of the simplification models, which only considers two classes of amino acids, hydrophobic (A) residues and hydrophilic (B) residues. Results The main work of this paper is to discuss how to optimize the lowest energy configurations in 2D off-lattice model and 3D off-lattice model by using Fibonacci sequences and real protein sequences. In order to avoid falling into local minimum and faster convergence to the global minimum, we introduce a novel method (SATS) to the protein structure problem, which combines simulated annealing algorithm and tabu search algorithm. Various strategies, such as the new encoding strategy, the adaptive neighborhood generation strategy and the local adjustment strategy, are adopted successfully for high-speed searching the optimal conformation corresponds to the lowest energy of the protein sequences. Experimental results show that some of the results obtained by the improved SATS are better than those reported in previous literatures, and we can sure that the lowest energy folding state for short Fibonacci sequences have been found. Conclusions Although the off-lattice models is not very realistic, they can reflect some important characteristics of the realistic protein. It can be found that 3D off-lattice model is more like native folding structure of the realistic protein than 2D off-lattice model. In addition, compared with some previous researches, the proposed hybrid algorithm can more effectively and more quickly search the spatial folding structure of a protein chain. PMID:25474708

  13. Modeling the Structure of Helical Assemblies with Experimental Constraints in Rosetta.

    PubMed

    André, Ingemar

    2018-01-01

    Determining high-resolution structures of proteins with helical symmetry can be challenging due to limitations in experimental data. In such instances, structure-based protein simulations driven by experimental data can provide a valuable approach for building models of helical assemblies. This chapter describes how the Rosetta macromolecular package can be used to model homomeric protein assemblies with helical symmetry in a range of modeling scenarios including energy refinement, symmetrical docking, comparative modeling, and de novo structure prediction. Data-guided structure modeling of helical assemblies with experimental information from electron density, X-ray fiber diffraction, solid-state NMR, and chemical cross-linking mass spectrometry is also described.

  14. CCBuilder: an interactive web-based tool for building, designing and assessing coiled-coil protein assemblies.

    PubMed

    Wood, Christopher W; Bruning, Marc; Ibarra, Amaurys Á; Bartlett, Gail J; Thomson, Andrew R; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N

    2014-11-01

    The ability to accurately model protein structures at the atomistic level underpins efforts to understand protein folding, to engineer natural proteins predictably and to design proteins de novo. Homology-based methods are well established and produce impressive results. However, these are limited to structures presented by and resolved for natural proteins. Addressing this problem more widely and deriving truly ab initio models requires mathematical descriptions for protein folds; the means to decorate these with natural, engineered or de novo sequences; and methods to score the resulting models. We present CCBuilder, a web-based application that tackles the problem for a defined but large class of protein structure, the α-helical coiled coils. CCBuilder generates coiled-coil backbones, builds side chains onto these frameworks and provides a range of metrics to measure the quality of the models. Its straightforward graphical user interface provides broad functionality that allows users to build and assess models, in which helix geometry, coiled-coil architecture and topology and protein sequence can be varied rapidly. We demonstrate the utility of CCBuilder by assembling models for 653 coiled-coil structures from the PDB, which cover >96% of the known coiled-coil types, and by generating models for rarer and de novo coiled-coil structures. CCBuilder is freely available, without registration, at http://coiledcoils.chm.bris.ac.uk/app/cc_builder/. © The Author 2014. Published by Oxford University Press.

  15. RaptorX server: a resource for template-based protein structure modeling.

    PubMed

    Källberg, Morten; Margaryan, Gohar; Wang, Sheng; Ma, Jianzhu; Xu, Jinbo

    2014-01-01

    Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server ( http://raptorx.uchicago.edu ), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling.Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.

  16. @TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.

    PubMed

    Pons, Jean-Luc; Labesse, Gilles

    2009-07-01

    @TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/

  17. Binding free energy analysis of protein-protein docking model structures by evERdock.

    PubMed

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-14

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.

  18. Binding free energy analysis of protein-protein docking model structures by evERdock

    NASA Astrophysics Data System (ADS)

    Takemura, Kazuhiro; Matubayasi, Nobuyuki; Kitao, Akio

    2018-03-01

    To aid the evaluation of protein-protein complex model structures generated by protein docking prediction (decoys), we previously developed a method to calculate the binding free energies for complexes. The method combines a short (2 ns) all-atom molecular dynamics simulation with explicit solvent and solution theory in the energy representation (ER). We showed that this method successfully selected structures similar to the native complex structure (near-native decoys) as the lowest binding free energy structures. In our current work, we applied this method (evERdock) to 100 or 300 model structures of four protein-protein complexes. The crystal structures and the near-native decoys showed the lowest binding free energy of all the examined structures, indicating that evERdock can successfully evaluate decoys. Several decoys that show low interface root-mean-square distance but relatively high binding free energy were also identified. Analysis of the fraction of native contacts, hydrogen bonds, and salt bridges at the protein-protein interface indicated that these decoys were insufficiently optimized at the interface. After optimizing the interactions around the interface by including interfacial water molecules, the binding free energies of these decoys were improved. We also investigated the effect of solute entropy on binding free energy and found that consideration of the entropy term does not necessarily improve the evaluations of decoys using the normal model analysis for entropy calculation.

  19. The interface of protein structure, protein biophysics, and molecular evolution

    PubMed Central

    Liberles, David A; Teichmann, Sarah A; Bahar, Ivet; Bastolla, Ugo; Bloom, Jesse; Bornberg-Bauer, Erich; Colwell, Lucy J; de Koning, A P Jason; Dokholyan, Nikolay V; Echave, Julian; Elofsson, Arne; Gerloff, Dietlind L; Goldstein, Richard A; Grahnen, Johan A; Holder, Mark T; Lakner, Clemens; Lartillot, Nicholas; Lovell, Simon C; Naylor, Gavin; Perica, Tina; Pollock, David D; Pupko, Tal; Regan, Lynne; Roger, Andrew; Rubinstein, Nimrod; Shakhnovich, Eugene; Sjölander, Kimmen; Sunyaev, Shamil; Teufel, Ashley I; Thorne, Jeffrey L; Thornton, Joseph W; Weinreich, Daniel M; Whelan, Simon

    2012-01-01

    Abstract The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state-of-the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction. PMID:22528593

  20. Correction of erroneously packed protein's side chains in the NMR structure based on ab initio chemical shift calculations.

    PubMed

    Zhu, Tong; Zhang, John Z H; He, Xiao

    2014-09-14

    In this work, protein side chain (1)H chemical shifts are used as probes to detect and correct side-chain packing errors in protein's NMR structures through structural refinement. By applying the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method for ab initio calculation of chemical shifts, incorrect side chain packing was detected in the NMR structures of the Pin1 WW domain. The NMR structure is then refined by using molecular dynamics simulation and the polarized protein-specific charge (PPC) model. The computationally refined structure of the Pin1 WW domain is in excellent agreement with the corresponding X-ray structure. In particular, the use of the PPC model yields a more accurate structure than that using the standard (nonpolarizable) force field. For comparison, some of the widely used empirical models for chemical shift calculations are unable to correctly describe the relationship between the particular proton chemical shift and protein structures. The AF-QM/MM method can be used as a powerful tool for protein NMR structure validation and structural flaw detection.

  1. General overview on structure prediction of twilight-zone proteins.

    PubMed

    Khor, Bee Yin; Tye, Gee Jun; Lim, Theam Soon; Choong, Yee Siew

    2015-09-04

    Protein structure prediction from amino acid sequence has been one of the most challenging aspects in computational structural biology despite significant progress in recent years showed by critical assessment of protein structure prediction (CASP) experiments. When experimentally determined structures are unavailable, the predictive structures may serve as starting points to study a protein. If the target protein consists of homologous region, high-resolution (typically <1.5 Å) model can be built via comparative modelling. However, when confronted with low sequence similarity of the target protein (also known as twilight-zone protein, sequence identity with available templates is less than 30%), the protein structure prediction has to be initiated from scratch. Traditionally, twilight-zone proteins can be predicted via threading or ab initio method. Based on the current trend, combination of different methods brings an improved success in the prediction of twilight-zone proteins. In this mini review, the methods, progresses and challenges for the prediction of twilight-zone proteins were discussed.

  2. Structural alignment of protein descriptors - a combinatorial model.

    PubMed

    Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek

    2016-09-17

    Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).

  3. Random close packing in protein cores

    NASA Astrophysics Data System (ADS)

    Gaines, Jennifer C.; Smith, W. Wendell; Regan, Lynne; O'Hern, Corey S.

    2016-03-01

    Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ≈0.75 , a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of extended atom models, rather than the more physically accurate explicit hydrogen model. The validity of the explicit hydrogen model was proved in our previous studies by its ability to predict the side chain dihedral angle distributions observed in proteins. In contrast, the extended atom model is not able to recapitulate the side chain dihedral angle distributions, and gives rise to large atomic clashes at side chain dihedral angle combinations that are highly probable in protein crystal structures. Here, we employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high-resolution protein structures. We find that these protein cores have ϕ ≈0.56 , which is similar to results obtained from simulations of random packings of individual amino acids. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations to protein cores and interfaces of known structure.

  4. Random close packing in protein cores.

    PubMed

    Gaines, Jennifer C; Smith, W Wendell; Regan, Lynne; O'Hern, Corey S

    2016-03-01

    Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ≈ 0.75, a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of extended atom models, rather than the more physically accurate explicit hydrogen model. The validity of the explicit hydrogen model was proved in our previous studies by its ability to predict the side chain dihedral angle distributions observed in proteins. In contrast, the extended atom model is not able to recapitulate the side chain dihedral angle distributions, and gives rise to large atomic clashes at side chain dihedral angle combinations that are highly probable in protein crystal structures. Here, we employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high-resolution protein structures. We find that these protein cores have ϕ ≈ 0.56, which is similar to results obtained from simulations of random packings of individual amino acids. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations to protein cores and interfaces of known structure.

  5. Constraint Logic Programming approach to protein structure prediction.

    PubMed

    Dal Palù, Alessandro; Dovier, Agostino; Fogolari, Federico

    2004-11-30

    The protein structure prediction problem is one of the most challenging problems in biological sciences. Many approaches have been proposed using database information and/or simplified protein models. The protein structure prediction problem can be cast in the form of an optimization problem. Notwithstanding its importance, the problem has very seldom been tackled by Constraint Logic Programming, a declarative programming paradigm suitable for solving combinatorial optimization problems. Constraint Logic Programming techniques have been applied to the protein structure prediction problem on the face-centered cube lattice model. Molecular dynamics techniques, endowed with the notion of constraint, have been also exploited. Even using a very simplified model, Constraint Logic Programming on the face-centered cube lattice model allowed us to obtain acceptable results for a few small proteins. As a test implementation their (known) secondary structure and the presence of disulfide bridges are used as constraints. Simplified structures obtained in this way have been converted to all atom models with plausible structure. Results have been compared with a similar approach using a well-established technique as molecular dynamics. The results obtained on small proteins show that Constraint Logic Programming techniques can be employed for studying protein simplified models, which can be converted into realistic all atom models. The advantage of Constraint Logic Programming over other, much more explored, methodologies, resides in the rapid software prototyping, in the easy way of encoding heuristics, and in exploiting all the advances made in this research area, e.g. in constraint propagation and its use for pruning the huge search space.

  6. Automated de novo phasing and model building of coiled-coil proteins.

    PubMed

    Rämisch, Sebastian; Lizatović, Robert; André, Ingemar

    2015-03-01

    Models generated by de novo structure prediction can be very useful starting points for molecular replacement for systems where suitable structural homologues cannot be readily identified. Protein-protein complexes and de novo-designed proteins are examples of systems that can be challenging to phase. In this study, the potential of de novo models of protein complexes for use as starting points for molecular replacement is investigated. The approach is demonstrated using homomeric coiled-coil proteins, which are excellent model systems for oligomeric systems. Despite the stereotypical fold of coiled coils, initial phase estimation can be difficult and many structures have to be solved with experimental phasing. A method was developed for automatic structure determination of homomeric coiled coils from X-ray diffraction data. In a benchmark set of 24 coiled coils, ranging from dimers to pentamers with resolutions down to 2.5 Å, 22 systems were automatically solved, 11 of which had previously been solved by experimental phasing. The generated models contained 71-103% of the residues present in the deposited structures, had the correct sequence and had free R values that deviated on average by 0.01 from those of the respective reference structures. The electron-density maps were of sufficient quality that only minor manual editing was necessary to produce final structures. The method, named CCsolve, combines methods for de novo structure prediction, initial phase estimation and automated model building into one pipeline. CCsolve is robust against errors in the initial models and can readily be modified to make use of alternative crystallographic software. The results demonstrate the feasibility of de novo phasing of protein-protein complexes, an approach that could also be employed for other small systems beyond coiled coils.

  7. Modeling disordered protein interactions from biophysical principles

    PubMed Central

    Christoffer, Charles; Terashi, Genki

    2017-01-01

    Disordered protein-protein interactions (PPIs), those involving a folded protein and an intrinsically disordered protein (IDP), are prevalent in the cell, including important signaling and regulatory pathways. IDPs do not adopt a single dominant structure in isolation but often become ordered upon binding. To aid understanding of the molecular mechanisms of disordered PPIs, it is crucial to obtain the tertiary structure of the PPIs. However, experimental methods have difficulty in solving disordered PPIs and existing protein-protein and protein-peptide docking methods are not able to model them. Here we present a novel computational method, IDP-LZerD, which models the conformation of a disordered PPI by considering the biophysical binding mechanism of an IDP to a structured protein, whereby a local segment of the IDP initiates the interaction and subsequently the remaining IDP regions explore and coalesce around the initial binding site. On a dataset of 22 disordered PPIs with IDPs up to 69 amino acids, successful predictions were made for 21 bound and 18 unbound receptors. The successful modeling provides additional support for biophysical principles. Moreover, the new technique significantly expands the capability of protein structure modeling and provides crucial insights into the molecular mechanisms of disordered PPIs. PMID:28394890

  8. Comprehensive 3D-modeling of allergenic proteins and amino acid composition of potential conformational IgE epitopes

    PubMed Central

    Oezguen, Numan; Zhou, Bin; Negi, Surendra S.; Ivanciuc, Ovidiu; Schein, Catherine H.; Labesse, Gilles; Braun, Werner

    2008-01-01

    Similarities in sequences and 3D structures of allergenic proteins provide vital clues to identify clinically relevant IgE cross-reactivities. However, experimental 3D structures are available in the Protein Data Bank for only 5% (45/829) of all allergens catalogued in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP). Here, an automated procedure was used to prepare 3D-models of all allergens where there was no experimentally determined 3D structure or high identity (95%) to another protein of known 3D structure. After a final selection by quality criteria, 433 reliable 3D models were retained and are available from our SDAP Website. The new 3D models extensively enhance our knowledge of allergen structures. As an example of their use, experimentally derived “continuous IgE epitopes” were mapped on 3 experimentally determined structures and 13 of our 3D-models of allergenic proteins. Large portions of these continuous sequences are not entirely on the surface and therefore cannot interact with IgE or other proteins. Only the surface exposed residues are constituents of “conformational IgE epitopes” which are not in all cases continuous in sequence. The surface exposed parts of the experimental determined continuous IgE epitopes showed a distinct statistical distribution as compared to their presence in typical protein-protein interfaces. The amino acids Ala, Ser, Asn, Gly and particularly Lys have a high propensity to occur in IgE binding sites. The 3D-models will facilitate further analysis of the common properties of IgE binding sites of allergenic proteins. PMID:18621419

  9. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  10. A Stochastic Evolutionary Model for Protein Structure Alignment and Phylogeny

    PubMed Central

    Challis, Christopher J.; Schmidler, Scott C.

    2012-01-01

    We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein–Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution. PMID:22723302

  11. DeepQA: improving the estimation of single protein model quality with deep belief networks.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin

    2016-12-05

    Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. We introduce a novel single-model quality assessment method DeepQA based on deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves the state-of-the-art performance on CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, document and training/test datasets of DeepQA for Linux is freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .

  12. A novel structural tree for wrap-proteins, a subclass of (α+β)-proteins.

    PubMed

    Boshkova, Eugenia A; Gordeev, Alexey B; Efimov, Alexander V

    2014-01-01

    In this paper, a novel structural subclass of (α+β)-proteins is presented. A characteristic feature of these proteins and domains is that they consist of strongly twisted and coiled β-sheets wrapped around one or two α-helices, so they are referred to here as wrap-proteins. It is shown that overall folds of the wrap-proteins can be obtained by stepwise addition of α-helices and/or β-strands to the strongly twisted and coiled β-hairpin taken as the starting structure in modeling. As a result of modeling, a structural tree for the wrap-proteins was constructed that includes 201 folds of which 49 occur in known nonhomologous proteins.

  13. Improvement on a simplified model for protein folding simulation.

    PubMed

    Zhang, Ming; Chen, Changjun; He, Yi; Xiao, Yi

    2005-11-01

    Improvements were made on a simplified protein model--the Ramachandran model-to achieve better computer simulation of protein folding. To check the validity of such improvements, we chose the ultrafast folding protein Engrailed Homeodomain as an example and explored several aspects of its folding. The engrailed homeodomain is a mainly alpha-helical protein of 61 residues from Drosophila melanogaster. We found that the simplified model of Engrailed Homeodomain can fold into a global minimum state with a tertiary structure in good agreement with its native structure.

  14. A sampling-based method for ranking protein structural models by integrating multiple scores and features.

    PubMed

    Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong

    2011-09-01

    One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.

  15. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-09-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  16. A benchmark testing ground for integrating homology modeling and protein docking.

    PubMed

    Bohnuud, Tanggis; Luo, Lingqi; Wodak, Shoshana J; Bonvin, Alexandre M J J; Weng, Zhiping; Vajda, Sandor; Schueler-Furman, Ora; Kozakov, Dima

    2017-01-01

    Protein docking procedures carry out the task of predicting the structure of a protein-protein complex starting from the known structures of the individual protein components. More often than not, however, the structure of one or both components is not known, but can be derived by homology modeling on the basis of known structures of related proteins deposited in the Protein Data Bank (PDB). Thus, the problem is to develop methods that optimally integrate homology modeling and docking with the goal of predicting the structure of a complex directly from the amino acid sequences of its component proteins. One possibility is to use the best available homology modeling and docking methods. However, the models built for the individual subunits often differ to a significant degree from the bound conformation in the complex, often much more so than the differences observed between free and bound structures of the same protein, and therefore additional conformational adjustments, both at the backbone and side chain levels need to be modeled to achieve an accurate docking prediction. In particular, even homology models of overall good accuracy frequently include localized errors that unfavorably impact docking results. The predicted reliability of the different regions in the model can also serve as a useful input for the docking calculations. Here we present a benchmark dataset that should help to explore and solve combined modeling and docking problems. This dataset comprises a subset of the experimentally solved 'target' complexes from the widely used Docking Benchmark from the Weng Lab (excluding antibody-antigen complexes). This subset is extended to include the structures from the PDB related to those of the individual components of each complex, and hence represent potential templates for investigating and benchmarking integrated homology modeling and docking approaches. Template sets can be dynamically customized by specifying ranges in sequence similarity and in PDB release dates, or using other filtering options, such as excluding sets of specific structures from the template list. Multiple sequence alignments, as well as structural alignments of the templates to their corresponding subunits in the target are also provided. The resource is accessible online or can be downloaded at http://cluspro.org/benchmark, and is updated on a weekly basis in synchrony with new PDB releases. Proteins 2016; 85:10-16. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  17. A resource for benchmarking the usefulness of protein structure models.

    PubMed

    Carbajo, Daniel; Tramontano, Anna

    2012-08-02

    Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific application. This paper describes a database and related software tools that allow testing of a given structure based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess which is the specific threshold of accuracy required to perform the task effectively. The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php.Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by non-academics: No.

  18. Approaches to ab initio molecular replacement of α-helical transmembrane proteins.

    PubMed

    Thomas, Jens M H; Simkovic, Felix; Keegan, Ronan; Mayans, Olga; Zhang, Chengxin; Zhang, Yang; Rigden, Daniel J

    2017-12-01

    α-Helical transmembrane proteins are a ubiquitous and important class of proteins, but present difficulties for crystallographic structure solution. Here, the effectiveness of the AMPLE molecular replacement pipeline in solving α-helical transmembrane-protein structures is assessed using a small library of eight ideal helices, as well as search models derived from ab initio models generated both with and without evolutionary contact information. The ideal helices prove to be surprisingly effective at solving higher resolution structures, but ab initio-derived search models are able to solve structures that could not be solved with the ideal helices. The addition of evolutionary contact information results in a marked improvement in the modelling and makes additional solutions possible.

  19. Protein Structure and Function Prediction Using I-TASSER

    PubMed Central

    Yang, Jianyi; Zhang, Yang

    2016-01-01

    I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of target proteins, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function prediction and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets. PMID:26678386

  20. Assessment of Protein Side-Chain Conformation Prediction Methods in Different Residue Environments

    PubMed Central

    Peterson, Lenna X.; Kang, Xuejiao; Kihara, Daisuke

    2016-01-01

    Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers. PMID:24619909

  1. Surface layer protein characterization by small angle x-ray scattering and a fractal mean force concept: from protein structure to nanodisk assemblies.

    PubMed

    Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B; Peterlik, Herwig; Jungbauer, Alois; Tscheliessnig, Rupert

    2010-11-07

    Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on the basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.

  2. Surface layer protein characterization by small angle x-ray scattering and a fractal mean force concept: From protein structure to nanodisk assemblies

    NASA Astrophysics Data System (ADS)

    Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B.; Peterlik, Herwig; Jungbauer, Alois; Tscheliessnig, Rupert

    2010-11-01

    Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on the basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.

  3. BAYESIAN PROTEIN STRUCTURE ALIGNMENT.

    PubMed

    Rodriguez, Abel; Schmidler, Scott C

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

  4. Surface layer protein characterization by small angle x-ray scattering and a fractal mean force concept: From protein structure to nanodisk assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horejs, Christine; Pum, Dietmar; Sleytr, Uwe B.

    2010-11-07

    Surface layers (S-layers) are the most commonly observed cell surface structure of prokaryotic organisms. They are made up of proteins that spontaneously self-assemble into functional crystalline lattices in solution, on various solid surfaces, and interfaces. While classical experimental techniques failed to recover a complete structural model of an unmodified S-layer protein, small angle x-ray scattering (SAXS) provides an opportunity to study the structure of S-layer monomers in solution and of self-assembled two-dimensional sheets. For the protein under investigation we recently suggested an atomistic structural model by the use of molecular dynamics simulations. This structural model is now refined on themore » basis of SAXS data together with a fractal assembly approach. Here we show that a nondiluted critical system of proteins, which crystallize into monomolecular structures, might be analyzed by SAXS if protein-protein interactions are taken into account by relating a fractal local density distribution to a fractal local mean potential, which has to fulfill the Poisson equation. The present work demonstrates an important step into the elucidation of the structure of S-layers and offers a tool to analyze the structure of self-assembling systems in solution by means of SAXS and computer simulations.« less

  5. PROGEN: An automated modelling algorithm for the generation of complete protein structures from the α-carbon atomic coordinates

    NASA Astrophysics Data System (ADS)

    Mandal, Chhabinath; Linthicum, D. Scott

    1993-04-01

    A modelling algorithm (PROGEN) for the generation of complete protein atomic coordinates from only the α-carbon coordinates is described. PROGEN utilizes an optimal geometry parameter (OGP) database for the positioning of atoms for each amino acid of the polypeptide model. The OGP database was established by examining the statistical correlations between 23 different intra-peptide and inter-peptide geometric parameters relative to the α-carbon distances for each amino acid in a library of 19 known proteins from the Brookhaven Protein Database (BPDB). The OGP files for specific amino acids and peptides were used to generate the atomic positions, with respect to α-carbons, for main-chain and side-chain atoms in the modelled structure. Refinement of the initial model was accomplished using energy minimization (EM) and molecular dynamics techniques. PROGEN was tested using 60 known proteins in the BPDB, representing a wide spectrum of primary and secondary structures. Comparison between PROGEN models and BPDB crystal reference structures gave r.m.s.d. values for peptide main-chain atoms between 0.29 and 0.76 Å, with a grand average of 0.53 Å for all 60 models. The r.m.s.d. for all non-hydrogen atoms ranged between 1.44 and 1.93 Å for the 60 polypeptide models. PROGEN was also able to make the correct assignment of cis- or trans-proline configurations in the protein structures examined. PROGEN offers a fully automatic building and refinement procedure and requires no special or specific structural considerations for the protein to be modelled.

  6. Hidden Markov model approach for identifying the modular framework of the protein backbone.

    PubMed

    Camproux, A C; Tuffery, P; Chevrolat, J P; Boisvieux, J F; Hazout, S

    1999-12-01

    The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.

  7. The network organization of protein interactions in the spliceosome is reproduced by the simple rules of food-web models

    PubMed Central

    Pires, Mathias M.; Cantor, Maurício; Guimarães, Paulo R.; de Aguiar, Marcus A. M.; dos Reis, Sérgio F.; Coltri, Patricia P.

    2015-01-01

    The network structure of biological systems provides information on the underlying processes shaping their organization and dynamics. Here we examined the structure of the network depicting protein interactions within the spliceosome, the macromolecular complex responsible for splicing in eukaryotic cells. We show the interactions of less connected spliceosome proteins are nested subsets of the connections of the highly connected proteins. At the same time, the network has a modular structure with groups of proteins sharing similar interaction patterns. We then investigated the role of affinity and specificity in shaping the spliceosome network by adapting a probabilistic model originally designed to reproduce food webs. This food-web model was as successful in reproducing the structure of protein interactions as it is in reproducing interactions among species. The good performance of the model suggests affinity and specificity, partially determined by protein size and the timing of association to the complex, may be determining network structure. Moreover, because network models allow building ensembles of realistic networks while encompassing uncertainty they can be useful to examine the dynamics and vulnerability of intracelullar processes. Unraveling the mechanisms organizing the spliceosome interactions is important to characterize the role of individual proteins on splicing catalysis and regulation. PMID:26443080

  8. Microgravity

    NASA Image and Video Library

    2000-04-19

    Dr. Marc Pusey (seated) and Dr. Craig Kundrot use computers to analyze x-ray maps and generate three-dimensional models of protein structures. With this information, scientists at Marshall Space Flight Center can learn how proteins are made and how they work. The computer screen depicts a proten structure as a ball-and-stick model. Other models depict the actual volume occupied by the atoms, or the ribbon-like structures that are crucial to a protein's function.

  9. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction.

    PubMed

    Rysavy, Steven J; Beck, David A C; Daggett, Valerie

    2014-11-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼ 25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.

  10. Structural characterisation of medically relevant protein assemblies by integrating mass spectrometry with computational modelling.

    PubMed

    Politis, Argyris; Schmidt, Carla

    2018-03-20

    Structural mass spectrometry with its various techniques is a powerful tool for the structural elucidation of medically relevant protein assemblies. It delivers information on the composition, stoichiometries, interactions and topologies of these assemblies. Most importantly it can deal with heterogeneous mixtures and assemblies which makes it universal among the conventional structural techniques. In this review we summarise recent advances and challenges in structural mass spectrometric techniques. We describe how the combination of the different mass spectrometry-based methods with computational strategies enable structural models at molecular levels of resolution. These models hold significant potential for helping us in characterizing the function of protein assemblies related to human health and disease. In this review we summarise the techniques of structural mass spectrometry often applied when studying protein-ligand complexes. We exemplify these techniques through recent examples from literature that helped in the understanding of medically relevant protein assemblies. We further provide a detailed introduction into various computational approaches that can be integrated with these mass spectrometric techniques. Last but not least we discuss case studies that integrated mass spectrometry and computational modelling approaches and yielded models of medically important protein assembly states such as fibrils and amyloids. Copyright © 2017 The Author(s). Published by Elsevier B.V. All rights reserved.

  11. Incorporating modeling and simulations in undergraduate biophysical chemistry course to promote understanding of structure-dynamics-function relationships in proteins.

    PubMed

    Hati, Sanchita; Bhattacharyya, Sudeep

    2016-01-01

    A project-based biophysical chemistry laboratory course, which is offered to the biochemistry and molecular biology majors in their senior year, is described. In this course, the classroom study of the structure-function of biomolecules is integrated with the discovery-guided laboratory study of these molecules using computer modeling and simulations. In particular, modern computational tools are employed to elucidate the relationship between structure, dynamics, and function in proteins. Computer-based laboratory protocols that we introduced in three modules allow students to visualize the secondary, super-secondary, and tertiary structures of proteins, analyze non-covalent interactions in protein-ligand complexes, develop three-dimensional structural models (homology model) for new protein sequences and evaluate their structural qualities, and study proteins' intrinsic dynamics to understand their functions. In the fourth module, students are assigned to an authentic research problem, where they apply their laboratory skills (acquired in modules 1-3) to answer conceptual biophysical questions. Through this process, students gain in-depth understanding of protein dynamics-the missing link between structure and function. Additionally, the requirement of term papers sharpens students' writing and communication skills. Finally, these projects result in new findings that are communicated in peer-reviewed journals. © 2016 The International Union of Biochemistry and Molecular Biology.

  12. Modeling Protein Expression and Protein Signaling Pathways

    PubMed Central

    Telesca, Donatello; Müller, Peter; Kornblau, Steven M.; Suchard, Marc A.; Ji, Yuan

    2015-01-01

    High-throughput functional proteomic technologies provide a way to quantify the expression of proteins of interest. Statistical inference centers on identifying the activation state of proteins and their patterns of molecular interaction formalized as dependence structure. Inference on dependence structure is particularly important when proteins are selected because they are part of a common molecular pathway. In that case, inference on dependence structure reveals properties of the underlying pathway. We propose a probability model that represents molecular interactions at the level of hidden binary latent variables that can be interpreted as indicators for active versus inactive states of the proteins. The proposed approach exploits available expert knowledge about the target pathway to define an informative prior on the hidden conditional dependence structure. An important feature of this prior is that it provides an instrument to explicitly anchor the model space to a set of interactions of interest, favoring a local search approach to model determination. We apply our model to reverse-phase protein array data from a study on acute myeloid leukemia. Our inference identifies relevant subpathways in relation to the unfolding of the biological process under study. PMID:26246646

  13. Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

    PubMed

    Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi

    2016-05-01

    Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.

  14. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    PubMed

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  15. Computational modeling of Repeat1 region of INI1/hSNF5: An evolutionary link with ubiquitin.

    PubMed

    Bhutoria, Savita; Kalpana, Ganjam V; Acharya, Seetharama A

    2016-09-01

    The structure of a protein can be very informative of its function. However, determining protein structures experimentally can often be very challenging. Computational methods have been used successfully in modeling structures with sufficient accuracy. Here we have used computational tools to predict the structure of an evolutionarily conserved and functionally significant domain of Integrase interactor (INI)1/hSNF5 protein. INI1 is a component of the chromatin remodeling SWI/SNF complex, a tumor suppressor and is involved in many protein-protein interactions. It belongs to SNF5 family of proteins that contain two conserved repeat (Rpt) domains. Rpt1 domain of INI1 binds to HIV-1 Integrase, and acts as a dominant negative mutant to inhibit viral replication. Rpt1 domain also interacts with oncogene c-MYC and modulates its transcriptional activity. We carried out an ab initio modeling of a segment of INI1 protein containing the Rpt1 domain. The structural model suggested the presence of a compact and well defined ββαα topology as core structure in the Rpt1 domain of INI1. This topology in Rpt1 was similar to PFU domain of Phospholipase A2 Activating Protein, PLAA. Interestingly, PFU domain shares similarity with Ubiquitin and has ubiquitin binding activity. Because of the structural similarity between Rpt1 domain of INI1 and PFU domain of PLAA, we propose that Rpt1 domain of INI1 may participate in ubiquitin recognition or binding with ubiquitin or ubiquitin related proteins. This modeling study may shed light on the mode of interactions of Rpt1 domain of INI1 and is likely to facilitate future functional studies of INI1. © 2016 The Protein Society.

  16. Models of protein-ligand crystal structures: trust, but verify.

    PubMed

    Deller, Marc C; Rupp, Bernhard

    2015-09-01

    X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.

  17. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11

    PubMed Central

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-01-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. PMID:26369671

  18. Effect of fullerenol surface chemistry on nanoparticle binding-induced protein misfolding

    NASA Astrophysics Data System (ADS)

    Radic, Slaven; Nedumpully-Govindan, Praveen; Chen, Ran; Salonen, Emppu; Brown, Jared M.; Ke, Pu Chun; Ding, Feng

    2014-06-01

    Fullerene and its derivatives with different surface chemistry have great potential in biomedical applications. Accordingly, it is important to delineate the impact of these carbon-based nanoparticles on protein structure, dynamics, and subsequently function. Here, we focused on the effect of hydroxylation -- a common strategy for solubilizing and functionalizing fullerene -- on protein-nanoparticle interactions using a model protein, ubiquitin. We applied a set of complementary computational modeling methods, including docking and molecular dynamics simulations with both explicit and implicit solvent, to illustrate the impact of hydroxylated fullerenes on the structure and dynamics of ubiquitin. We found that all derivatives bound to the model protein. Specifically, the more hydrophilic nanoparticles with a higher number of hydroxyl groups bound to the surface of the protein via hydrogen bonds, which stabilized the protein without inducing large conformational changes in the protein structure. In contrast, fullerene derivatives with a smaller number of hydroxyl groups buried their hydrophobic surface inside the protein, thereby causing protein denaturation. Overall, our results revealed a distinct role of surface chemistry on nanoparticle-protein binding and binding-induced protein misfolding.Fullerene and its derivatives with different surface chemistry have great potential in biomedical applications. Accordingly, it is important to delineate the impact of these carbon-based nanoparticles on protein structure, dynamics, and subsequently function. Here, we focused on the effect of hydroxylation -- a common strategy for solubilizing and functionalizing fullerene -- on protein-nanoparticle interactions using a model protein, ubiquitin. We applied a set of complementary computational modeling methods, including docking and molecular dynamics simulations with both explicit and implicit solvent, to illustrate the impact of hydroxylated fullerenes on the structure and dynamics of ubiquitin. We found that all derivatives bound to the model protein. Specifically, the more hydrophilic nanoparticles with a higher number of hydroxyl groups bound to the surface of the protein via hydrogen bonds, which stabilized the protein without inducing large conformational changes in the protein structure. In contrast, fullerene derivatives with a smaller number of hydroxyl groups buried their hydrophobic surface inside the protein, thereby causing protein denaturation. Overall, our results revealed a distinct role of surface chemistry on nanoparticle-protein binding and binding-induced protein misfolding. Electronic supplementary information (ESI) is available: Fluorescence spectra, ITC, CD spectra and other data as described in the text. See DOI: 10.1039/c4nr01544d

  19. Modelling and enhanced molecular dynamics to steer structure-based drug discovery.

    PubMed

    Kalyaanamoorthy, Subha; Chen, Yi-Ping Phoebe

    2014-05-01

    The ever-increasing gap between the availabilities of the genome sequences and the crystal structures of proteins remains one of the significant challenges to the modern drug discovery efforts. The knowledge of structure-dynamics-functionalities of proteins is important in order to understand several key aspects of structure-based drug discovery, such as drug-protein interactions, drug binding and unbinding mechanisms and protein-protein interactions. This review presents a brief overview on the different state of the art computational approaches that are applied for protein structure modelling and molecular dynamics simulations of biological systems. We give an essence of how different enhanced sampling molecular dynamics approaches, together with regular molecular dynamics methods, assist in steering the structure based drug discovery processes. Copyright © 2013 Elsevier Ltd. All rights reserved.

  20. Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations

    PubMed Central

    Seeliger, Daniel; de Groot, Bert L.

    2010-01-01

    Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034

  1. CCBuilder 2.0: Powerful and accessible coiled-coil modeling.

    PubMed

    Wood, Christopher W; Woolfson, Derek N

    2018-01-01

    The increased availability of user-friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α-helical coiled coil provides one such example, which represents ≈ 3-5% of all known protein-encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy-to-use web application, called CCBuilder 2.0, for modeling and optimizing both α-helical coiled coils and polyproline-based collagen triple helices. This has many applications from providing models to aid molecular replacement for X-ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo "dark matter" protein structures. CCBuilder 2.0 is available as a web-based application, the code for which is open-source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. We have created CCBuilder 2.0, an easy to use web-based application that can model structures for a whole class of proteins, the α-helical coiled coil, which is estimated to account for 3-5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more-applied research including designing and engineering novel proteins that have potential applications in biotechnology. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  2. Covering complete proteomes with X-ray structures: A current snapshot

    DOE PAGES

    Mizianty, Marcin J.; Fan, Xiao; Yan, Jing; ...

    2014-10-23

    Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic knowhow combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtainedmore » through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with those of viruses following the coverage values of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were determined.« less

  3. An automated method for modeling proteins on known templates using distance geometry.

    PubMed

    Srinivasan, S; March, C J; Sudarsanam, S

    1993-02-01

    We present an automated method incorporated into a software package, FOLDER, to fold a protein sequence on a given three-dimensional (3D) template. Starting with the sequence alignment of a family of homologous proteins, tertiary structures are modeled using the known 3D structure of one member of the family as a template. Homologous interatomic distances from the template are used as constraints. For nonhomologous regions in the model protein, the lower and the upper bounds for the interatomic distances are imposed by steric constraints and the globular dimensions of the template, respectively. Distance geometry is used to embed an ensemble of structures consistent with these distance bounds. Structures are selected from this ensemble based on minimal distance error criteria, after a penalty function optimization step. These structures are then refined using energy optimization methods. The method is tested by simulating the alpha-chain of horse hemoglobin using the alpha-chain of human hemoglobin as the template and by comparing the generated models with the crystal structure of the alpha-chain of horse hemoglobin. We also test the packing efficiency of this method by reconstructing the atomic positions of the interior side chains beyond C beta atoms of a protein domain from a known 3D structure. In both test cases, models retain the template constraints and any additionally imposed constraints while the packing of the interior residues is optimized with no short contacts or bond deformations. To demonstrate the use of this method in simulating structures of proteins with nonhomologous disulfides, we construct a model of murine interleukin (IL)-4 using the NMR structure of human IL-4 as the template. The resulting geometry of the nonhomologous disulfide in the model structure for murine IL-4 is consistent with standard disulfide geometry.

  4. Addressing recent docking challenges: A hybrid strategy to integrate template-based and free protein-protein docking.

    PubMed

    Yan, Yumeng; Wen, Zeyu; Wang, Xinxiang; Huang, Sheng-You

    2017-03-01

    Protein-protein docking is an important computational tool for predicting protein-protein interactions. With the rapid development of proteomics projects, more and more experimental binding information ranging from mutagenesis data to three-dimensional structures of protein complexes are becoming available. Therefore, how to appropriately incorporate the biological information into traditional ab initio docking has been an important issue and challenge in the field of protein-protein docking. To address these challenges, we have developed a Hybrid DOCKing protocol of template-based and template-free approaches, referred to as HDOCK. The basic procedure of HDOCK is to model the structures of individual components based on the template complex by a template-based method if a template is available; otherwise, the component structures will be modeled based on monomer proteins by regular homology modeling. Then, the complex structure of the component models is predicted by traditional protein-protein docking. With the HDOCK protocol, we have participated in the CPARI experiment for rounds 28-35. Out of the 25 CASP-CAPRI targets for oligomer modeling, our HDOCK protocol predicted correct models for 16 targets, ranking one of the top algorithms in this challenge. Our docking method also made correct predictions on other CAPRI challenges such as protein-peptide binding for 6 out of 8 targets and water predictions for 2 out of 2 targets. The advantage of our hybrid docking approach over pure template-based docking was further confirmed by a comparative evaluation on 20 CASP-CAPRI targets. Proteins 2017; 85:497-512. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  5. Automated determination of fibrillar structures by simultaneous model building and fiber diffraction refinement.

    PubMed

    Potrzebowski, Wojciech; André, Ingemar

    2015-07-01

    For highly oriented fibrillar molecules, three-dimensional structures can often be determined from X-ray fiber diffraction data. However, because of limited information content, structure determination and validation can be challenging. We demonstrate that automated structure determination of protein fibers can be achieved by guiding the building of macromolecular models with fiber diffraction data. We illustrate the power of our approach by determining the structures of six bacteriophage viruses de novo using fiber diffraction data alone and together with solid-state NMR data. Furthermore, we demonstrate the feasibility of molecular replacement from monomeric and fibrillar templates by solving the structure of a plant virus using homology modeling and protein-protein docking. The generated models explain the experimental data to the same degree as deposited reference structures but with improved structural quality. We also developed a cross-validation method for model selection. The results highlight the power of fiber diffraction data as structural constraints.

  6. Robustness of atomistic Gō models in predicting native-like folding intermediates

    NASA Astrophysics Data System (ADS)

    Estácio, S. G.; Fernandes, C. S.; Krobath, H.; Faísca, P. F. N.; Shakhnovich, E. I.

    2012-08-01

    Gō models are exceedingly popular tools in computer simulations of protein folding. These models are native-centric, i.e., they are directly constructed from the protein's native structure. Therefore, it is important to understand up to which extent the atomistic details of the native structure dictate the folding behavior exhibited by Gō models. Here we address this challenge by performing exhaustive discrete molecular dynamics simulations of a Gō potential combined with a full atomistic protein representation. In particular, we investigate the robustness of this particular type of Gō models in predicting the existence of intermediate states in protein folding. We focus on the N47G mutational form of the Spc-SH3 folding domain (x-ray structure) and compare its folding pathway with that of alternative native structures produced in silico. Our methodological strategy comprises equilibrium folding simulations, structural clustering, and principal component analysis.

  7. Ab Initio Protein Structure Assembly Using Continuous Structure Fragments and Optimized Knowledge-based Force Field

    PubMed Central

    Xu, Dong; Zhang, Yang

    2012-01-01

    Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field. PMID:22411565

  8. Text Mining for Protein Docking

    PubMed Central

    Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.

    2015-01-01

    The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied the text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~ 25% complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466

  9. Molecular dynamics studies of a DNA-binding protein: 2. An evaluation of implicit and explicit solvent models for the molecular dynamics simulation of the Escherichia coli trp repressor.

    PubMed Central

    Guenot, J.; Kollman, P. A.

    1992-01-01

    Although aqueous simulations with periodic boundary conditions more accurately describe protein dynamics than in vacuo simulations, these are computationally intensive for most proteins. Trp repressor dynamic simulations with a small water shell surrounding the starting model yield protein trajectories that are markedly improved over gas phase, yet computationally efficient. Explicit water in molecular dynamics simulations maintains surface exposure of protein hydrophilic atoms and burial of hydrophobic atoms by opposing the otherwise asymmetric protein-protein forces. This properly orients protein surface side chains, reduces protein fluctuations, and lowers the overall root mean square deviation from the crystal structure. For simulations with crystallographic waters only, a linear or sigmoidal distance-dependent dielectric yields a much better trajectory than does a constant dielectric model. As more water is added to the starting model, the differences between using distance-dependent and constant dielectric models becomes smaller, although the linear distance-dependent dielectric yields an average structure closer to the crystal structure than does a constant dielectric model. Multiplicative constants greater than one, for the linear distance-dependent dielectric simulations, produced trajectories that are progressively worse in describing trp repressor dynamics. Simulations of bovine pancreatic trypsin were used to ensure that the trp repressor results were not protein dependent and to explore the effect of the nonbonded cutoff on the distance-dependent and constant dielectric simulation models. The nonbonded cutoff markedly affected the constant but not distance-dependent dielectric bovine pancreatic trypsin inhibitor simulations. As with trp repressor, the distance-dependent dielectric model with a shell of water surrounding the protein produced a trajectory in better agreement with the crystal structure than a constant dielectric model, and the physical properties of the trajectory average structure, both with and without a nonbonded cutoff, were comparable. PMID:1304396

  10. Automated method to differentiate between native and mirror protein models obtained from contact maps.

    PubMed

    Kurczynska, Monika; Kotulska, Malgorzata

    2018-01-01

    Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters-the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.

  11. Automated method to differentiate between native and mirror protein models obtained from contact maps

    PubMed Central

    Kurczynska, Monika

    2018-01-01

    Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not allow to obtain appropriate clusters–the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying two most differentiating ETs for each class allowed to obtain satisfying results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class. The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models. PMID:29787567

  12. Proposed structure of putative glucose channel in GLUT1 facilitative glucose transporter.

    PubMed Central

    Zeng, H; Parthasarathy, R; Rampal, A L; Jung, C Y

    1996-01-01

    A family of structurally related intrinsic membrane proteins (facilitative glucose transporters) catalyzes the movement of glucose across the plasma membrane of animal cells. Evidence indicates that these proteins show a common structural motif where approximately 50% of the mass is embedded in lipid bilayer (transmembrane domain) in 12 alpha-helices (transmembrane helices; TMHs) and accommodates a water-filled channel for substrate passage (glucose channel) whose tertiary structure is currently unknown. Using recent advances in protein structure prediction algorithms we proposed here two three-dimensional structural models for the transmembrane glucose channel of GLUT1 glucose transporter. Our models emphasize the physical dimension and water accessibility of the channel, loop lengths between TMHs, the macrodipole orientation in four-helix bundle motif, and helix packing energy. Our models predict that five TMHs, either TMHs 3, 4, 7, 8, 11 (Model 1) or TMHs 2, 5, 11, 8, 7 (Model 2), line the channel, and the remaining TMHs surround these channel-lining TMHs. We discuss how our models are compatible with the experimental data obtained with this protein, and how they can be used in designing new biochemical and molecular biological experiments in elucidation of the structural basis of this important protein function. Images FIGURE 1 FIGURE 2 FIGURE 4 FIGURE 5 PMID:8770183

  13. MQAPRank: improved global protein model quality assessment by learning-to-rank.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen

    2017-05-25

    Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which could be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods achieve much success at different levels, accurate protein model quality assessment is still an open problem. Here, we present the MQAPRank, a global protein model quality assessment program based on learning-to-rank. The MQAPRank first sorts the decoy models by using single method based on learning-to-rank algorithm to indicate their relative qualities for the target protein. And then it takes the first five models as references to predict the qualities of other models by using average GDT_TS scores between reference models and other models. Benchmarked on CASP11 and 3DRobot datasets, the MQAPRank achieved better performances than other leading protein model quality assessment methods. Recently, the MQAPRank participated in the CASP12 under the group name FDUBio and achieved the state-of-the-art performances. The MQAPRank provides a convenient and powerful tool for protein model quality assessment with the state-of-the-art performances, it is useful for protein structure prediction and model quality assessment usages.

  14. Kinetic Aspects of Surfactant-Induced Structural Changes of Proteins - Unsolved Problems of Two-State Model for Protein Denaturation -.

    PubMed

    Takeda, Kunio; Moriyama, Yoshiko

    2015-01-01

    The kinetic mechanism of surfactant-induced protein denaturation is discussed on the basis of not only stopped-flow kinetic data but also the changes of protein helicities caused by the surfactants and the discontinuous mobility changes of surfactant-protein complexes. For example, the α-helical structures of bovine serum albumin (BSA) are partially disrupted due to the addition of sodium dodecyl sulfate (SDS). Formation of SDS-BSA complex can lead to only four complex types with specific mobilities depending on the surfactant concentration. On the other hand, the apparent rate constant of the structural change of BSA increases with an increase of SDS concentration, indicating that the rate of the structural change becomes fast as the degree of the change increases. When a certain amount of surfactant ions bind to proteins, their native structures transform directly to particular structures without passing through intermediate stages that might be induced due to the binding of fewer amounts of the surfactant ions. Furthermore, this review brings up a question about two-state and three-state models, N⇌D and N⇌D'⇌D (N: native state, D: denatured sate, D': intermediate between N and D), which have been often adopted without hesitation in discussion on general denaturations of proteins. First of all, doubtful is whether any equilibrium relationship exists in such denaturation reactions. It cannot be disregarded that the D states in these models differ depending on the changes of intensities of the denaturing factors. The authors emphasize that the denaturations or the structural changes of proteins should be discussed assuming one-way reaction models with no backward processes rather than assuming the reversible two-state reaction models or similar modified reaction models.

  15. MMM: A toolbox for integrative structure modeling.

    PubMed

    Jeschke, Gunnar

    2018-01-01

    Structural characterization of proteins and their complexes may require integration of restraints from various experimental techniques. MMM (Multiscale Modeling of Macromolecules) is a Matlab-based open-source modeling toolbox for this purpose with a particular emphasis on distance distribution restraints obtained from electron paramagnetic resonance experiments on spin-labelled proteins and nucleic acids and their combination with atomistic structures of domains or whole protomers, small-angle scattering data, secondary structure information, homology information, and elastic network models. MMM does not only integrate various types of restraints, but also various existing modeling tools by providing a common graphical user interface to them. The types of restraints that can support such modeling and the available model types are illustrated by recent application examples. © 2017 The Protein Society.

  16. Insights from molecular modeling and dynamics simulation of pathogen resistance (R) protein from brinjal.

    PubMed

    Shrivastava, Dipty; Nain, Vikrant; Sahi, Shakti; Verma, Anju; Sharma, Priyanka; Sharma, Prakash Chand; Kumar, Polumetla Ananda

    2011-01-22

    Resistance (R) protein recognizes molecular signature of pathogen infection and activates downstream hypersensitive response signalling in plants. R protein works as a molecular switch for pathogen defence signalling and represent one of the largest plant gene family. Hence, understanding molecular structure and function of R proteins has been of paramount importance for plant biologists. The present study is aimed at predicting structure of R proteins signalling domains (CC-NBS) by creating a homology model, refining and optimising the model by molecular dynamics simulation and comparing ADP and ATP binding. Based on sequence similarity with proteins of known structures, CC-NBS domains were initially modelled using CED- 4 (cell death abnormality protein) and APAF-1 (apoptotic protease activating factor) as multiple templates. The final CC-NBS structural model was built and optimized by molecular dynamic simulation for 5 nanoseconds (ns). Docking of ADP and ATP at active site shows that both ligand bind specifically with same residues and with minor difference (1 Kcal/mol) in binding energy. Sharing of binding site by ADP and ATP and low difference in their binding site makes CC-NBS suitable for working as molecular switch. Furthermore, structural superimposition elucidate that CC-NBS and CARD (caspase recruitment domains) domain of CED-4 have low RMSD value of 0.9 A° Availability of 3D structural model for both CC and NBS domains will . help in getting deeper insight in these pathogen defence genes.

  17. UNRES server for physics-based coarse-grained simulations and prediction of protein structure, dynamics and thermodynamics.

    PubMed

    Czaplewski, Cezary; Karczynska, Agnieszka; Sieradzan, Adam K; Liwo, Adam

    2018-04-30

    A server implementation of the UNRES package (http://www.unres.pl) for coarse-grained simulations of protein structures with the physics-based UNRES model, coined a name UNRES server, is presented. In contrast to most of the protein coarse-grained models, owing to its physics-based origin, the UNRES force field can be used in simulations, including those aimed at protein-structure prediction, without ancillary information from structural databases; however, the implementation includes the possibility of using restraints. Local energy minimization, canonical molecular dynamics simulations, replica exchange and multiplexed replica exchange molecular dynamics simulations can be run with the current UNRES server; the latter are suitable for protein-structure prediction. The user-supplied input includes protein sequence and, optionally, restraints from secondary-structure prediction or small x-ray scattering data, and simulation type and parameters which are selected or typed in. Oligomeric proteins, as well as those containing D-amino-acid residues and disulfide links can be treated. The output is displayed graphically (minimized structures, trajectories, final models, analysis of trajectory/ensembles); however, all output files can be downloaded by the user. The UNRES server can be freely accessed at http://unres-server.chem.ug.edu.pl.

  18. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field.

    PubMed

    Xu, Dong; Zhang, Yang

    2012-07-01

    Ab initio protein folding is one of the major unsolved problems in computational biology owing to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1-20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 nonhomologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction experiment, QUARK server outperformed the second and third best servers by 18 and 47% based on the cumulative Z-score of global distance test-total scores in the FM category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress toward the solution of the most important problem in the field. Copyright © 2012 Wiley Periodicals, Inc.

  19. Refinement of Generalized Born Implicit Solvation Parameters for Nucleic Acids and their Complexes with Proteins

    PubMed Central

    Nguyen, Hai; Pérez, Alberto; Bermeo, Sherry; Simmerling, Carlos

    2016-01-01

    The Generalized Born (GB) implicit solvent model has undergone significant improvements in accuracy for modeling of proteins and small molecules. However, GB still remains a less widely explored option for nucleic acid simulations, in part because fast GB models are often unable to maintain stable nucleic acid structures, or they introduce structural bias in proteins, leading to difficulty in application of GB models in simulations of protein-nucleic acid complexes. Recently, GB-neck2 was developed to improve the behavior of protein simulations. In an effort to create a more accurate model for nucleic acids, a similar procedure to the development of GB-neck2 is described here for nucleic acids. The resulting parameter set significantly reduces absolute and relative energy error relative to Poisson Boltzmann for both nucleic acids and nucleic acid-protein complexes, when compared to its predecessor GB-neck model. This improvement in solvation energy calculation translates to increased structural stability for simulations of DNA and RNA duplexes, quadruplexes, and protein-nucleic acid complexes. The GB-neck2 model also enables successful folding of small DNA and RNA hairpins to near native structures as determined from comparison with experiment. The functional form and all required parameters are provided here and also implemented in the AMBER software. PMID:26574454

  20. What determines the spectrum of protein native state structures?

    PubMed

    Lezon, Timothy R; Banavar, Jayanth R; Lesk, Arthur M; Maritan, Amos

    2006-05-01

    We present a brief summary of the key factors underlying protein structure, as developed in the investigations of Pauling, Ramachandran, and Rose. We then outline a simplified physical model of proteins that focuses on geometry and symmetry. Although this model superficially appears unrelated to the detailed chemical descriptions commonly applied to proteins, we show that it captures the essential elements of the chemistry and provides a unified framework for understanding the common characteristics of folded proteins. We suggest that the spectrum of protein native state structures is determined by geometry and symmetry and the role of the sequence is to choose its native state structure from this predetermined menu. 2006 Wiley-Liss, Inc.

  1. Learning To Fold Proteins Using Energy Landscape Theory

    PubMed Central

    Schafer, N.P.; Kim, B.L.; Zheng, W.; Wolynes, P.G.

    2014-01-01

    This review is a tutorial for scientists interested in the problem of protein structure prediction, particularly those interested in using coarse-grained molecular dynamics models that are optimized using lessons learned from the energy landscape theory of protein folding. We also present a review of the results of the AMH/AMC/AMW/AWSEM family of coarse-grained molecular dynamics protein folding models to illustrate the points covered in the first part of the article. Accurate coarse-grained structure prediction models can be used to investigate a wide range of conceptual and mechanistic issues outside of protein structure prediction; specifically, the paper concludes by reviewing how AWSEM has in recent years been able to elucidate questions related to the unusual kinetic behavior of artificially designed proteins, multidomain protein misfolding, and the initial stages of protein aggregation. PMID:25308991

  2. Contact Prediction for Beta and Alpha-Beta Proteins Using Integer Linear Optimization and its Impact on the First Principles 3D Structure Prediction Method ASTRO-FOLD

    PubMed Central

    Rajgaria, R.; Wei, Y.; Floudas, C. A.

    2010-01-01

    An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as sum of a Cα – Cα distance dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β-sheet alignments. These β-sheet alignments are used as constraints for contacts between residues of β-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. PMID:20225257

  3. InterPred: A pipeline to identify and model protein-protein interactions.

    PubMed

    Mirabello, Claudio; Wallner, Björn

    2017-06-01

    Protein-protein interactions (PPI) are crucial for protein function. There exist many techniques to identify PPIs experimentally, but to determine the interactions in molecular detail is still difficult and very time-consuming. The fact that the number of PPIs is vastly larger than the number of individual proteins makes it practically impossible to characterize all interactions experimentally. Computational approaches that can bridge this gap and predict PPIs and model the interactions in molecular detail are greatly needed. Here we present InterPred, a fully automated pipeline that predicts and model PPIs from sequence using structural modeling combined with massive structural comparisons and molecular docking. A key component of the method is the use of a novel random forest classifier that integrate several structural features to distinguish correct from incorrect protein-protein interaction models. We show that InterPred represents a major improvement in protein-protein interaction detection with a performance comparable or better than experimental high-throughput techniques. We also show that our full-atom protein-protein complex modeling pipeline performs better than state of the art protein docking methods on a standard benchmark set. In addition, InterPred was also one of the top predictors in the latest CAPRI37 experiment. InterPred source code can be downloaded from http://wallnerlab.org/InterPred Proteins 2017; 85:1159-1170. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  4. Preservation of protein clefts in comparative models.

    PubMed

    Piedra, David; Lois, Sergi; de la Cruz, Xavier

    2008-01-16

    Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely we examined how cleft quality - measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. We have examined cleft quality in homology models at a range of seq.id. levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.

  5. Modelling a 3D structure for EgDf1 from shape Echinococcus granulosus: putative epitopes, phosphorylation motifs and ligand

    NASA Astrophysics Data System (ADS)

    Paulino, M.; Esteves, A.; Vega, M.; Tabares, G.; Ehrlich, R.; Tapia, O.

    1998-07-01

    EgDf1 is a developmentally regulated protein from the parasite Echinococcus granulosus related to a family of hydrophobic ligand binding proteins. This protein could play a crucial role during the parasite life cycle development since this organism is unable to synthetize most of their own lipids de novo. Furthermore, it has been shown that two related protein from other parasitic platyhelminths (Fh15 from Fasciola hepatica and Sm14 from Schistosoma mansoni) are able to confer protective inmunity against experimental infection in animal models. A three-dimensional structure would help establishing structure/function relationships on a knowledge based manner. 3D structures for EgDf1 protein were modelled by using myelin P2 (mP2) and intestine fatty acid binding protein (I-FABP) as templates. Molecular dynamics techniques were used to validate the models. Template mP2 yielded the best 3D structure for EgDf1. Palmitic and oleic acids were docked inside EgDf1. The present theoretical results suggest definite location in the secondary structure of the epitopic regions, consensus phosphorylation motifs and oleic acid as a good ligand candidate to EgDf1. This protein might well be involved in the process of supplying hydrophobic metabolites for membrane biosynthesis and for signaling pathways.

  6. Refinement of protein termini in template-based modeling using conformational space annealing.

    PubMed

    Park, Hahnbeom; Ko, Junsu; Joo, Keehyoung; Lee, Julian; Seok, Chaok; Lee, Jooyoung

    2011-09-01

    The rapid increase in the number of experimentally determined protein structures in recent years enables us to obtain more reliable protein tertiary structure models than ever by template-based modeling. However, refinement of template-based models beyond the limit available from the best templates is still needed for understanding protein function in atomic detail. In this work, we develop a new method for protein terminus modeling that can be applied to refinement of models with unreliable terminus structures. The energy function for terminus modeling consists of both physics-based and knowledge-based potential terms with carefully optimized relative weights. Effective sampling of both the framework and terminus is performed using the conformational space annealing technique. This method has been tested on a set of termini derived from a nonredundant structure database and two sets of termini from the CASP8 targets. The performance of the terminus modeling method is significantly improved over our previous method that does not employ terminus refinement. It is also comparable or superior to the best server methods tested in CASP8. The success of the current approach suggests that similar strategy may be applied to other types of refinement problems such as loop modeling or secondary structure rearrangement. Copyright © 2011 Wiley-Liss, Inc.

  7. MyPMFs: a simple tool for creating statistical potentials to assess protein structural models.

    PubMed

    Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk

    2018-05-29

    Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs. Copyright © 2018. Published by Elsevier B.V.

  8. Simultaneous optimization of biomolecular energy function on features from small molecules and macromolecules

    PubMed Central

    Park, Hahnbeom; Bradley, Philip; Greisen, Per; Liu, Yuan; Mulligan, Vikram Khipple; Kim, David E.; Baker, David; DiMaio, Frank

    2017-01-01

    Most biomolecular modeling energy functions for structure prediction, sequence design, and molecular docking, have been parameterized using existing macromolecular structural data; this contrasts molecular mechanics force fields which are largely optimized using small-molecule data. In this study, we describe an integrated method that enables optimization of a biomolecular modeling energy function simultaneously against small-molecule thermodynamic data and high-resolution macromolecular structural data. We use this approach to develop a next-generation Rosetta energy function that utilizes a new anisotropic implicit solvation model, and an improved electrostatics and Lennard-Jones model, illustrating how energy functions can be considerably improved in their ability to describe large-scale energy landscapes by incorporating both small-molecule and macromolecule data. The energy function improves performance in a wide range of protein structure prediction challenges, including monomeric structure prediction, protein-protein and protein-ligand docking, protein sequence design, and prediction of the free energy changes by mutation, while reasonably recapitulating small-molecule thermodynamic properties. PMID:27766851

  9. General mechanism of two-state protein folding kinetics.

    PubMed

    Rollins, Geoffrey C; Dill, Ken A

    2014-08-13

    We describe here a general model of the kinetic mechanism of protein folding. In the Foldon Funnel Model, proteins fold in units of secondary structures, which form sequentially along the folding pathway, stabilized by tertiary interactions. The model predicts that the free energy landscape has a volcano shape, rather than a simple funnel, that folding is two-state (single-exponential) when secondary structures are intrinsically unstable, and that each structure along the folding path is a transition state for the previous structure. It shows how sequential pathways are consistent with multiple stochastic routes on funnel landscapes, and it gives good agreement with the 9 order of magnitude dependence of folding rates on protein size for a set of 93 proteins, at the same time it is consistent with the near independence of folding equilibrium constant on size. This model gives estimates of folding rates of proteomes, leading to a median folding time in Escherichia coli of about 5 s.

  10. Synchrotron IR microspectroscopy for protein structure analysis: Potential and questions

    DOE PAGES

    Yu, Peiqiang

    2006-01-01

    Synchrotron radiation-based Fourier transform infrared microspectroscopy (S-FTIR) has been developed as a rapid, direct, non-destructive, bioanalytical technique. This technique takes advantage of synchrotron light brightness and small effective source size and is capable of exploring the molecular chemical make-up within microstructures of a biological tissue without destruction of inherent structures at ultra-spatial resolutions within cellular dimension. To date there has been very little application of this advanced technique to the study of pure protein inherent structure at a cellular level in biological tissues. In this review, a novel approach was introduced to show the potential of the newly developed, advancedmore » synchrotron-based analytical technology, which can be used to localize relatively “pure“ protein in the plant tissues and relatively reveal protein inherent structure and protein molecular chemical make-up within intact tissue at cellular and subcellular levels. Several complex protein IR spectra data analytical techniques (Gaussian and Lorentzian multi-component peak modeling, univariate and multivariate analysis, principal component analysis (PCA), and hierarchical cluster analysis (CLA) are employed to relatively reveal features of protein inherent structure and distinguish protein inherent structure differences between varieties/species and treatments in plant tissues. By using a multi-peak modeling procedure, RELATIVE estimates (but not EXACT determinations) for protein secondary structure analysis can be made for comparison purpose. The issues of pro- and anti-multi-peaking modeling/fitting procedure for relative estimation of protein structure were discussed. By using the PCA and CLA analyses, the plant molecular structure can be qualitatively separate one group from another, statistically, even though the spectral assignments are not known. The synchrotron-based technology provides a new approach for protein structure research in biological tissues at ultraspatial resolutions.« less

  11. An Introductory Classroom Exercise on Protein Molecular Model Visualization and Detailed Analysis of Protein-Ligand Binding

    ERIC Educational Resources Information Center

    Poeylaut-Palena, Andres, A.; de los Angeles Laborde, Maria

    2013-01-01

    A learning module for molecular level analysis of protein structure and ligand/drug interaction through the visualization of X-ray diffraction is presented. Using DeepView as molecular model visualization software, students learn about the general concepts of protein structure. This Biochemistry classroom exercise is designed to be carried out by…

  12. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  13. Mapping monomeric threading to protein-protein structure prediction.

    PubMed

    Guerler, Aysam; Govindarajoo, Brandon; Zhang, Yang

    2013-03-25

    The key step of template-based protein-protein structure prediction is the recognition of complexes from experimental structure libraries that have similar quaternary fold. Maintaining two monomer and dimer structure libraries is however laborious, and inappropriate library construction can degrade template recognition coverage. We propose a novel strategy SPRING to identify complexes by mapping monomeric threading alignments to protein-protein interactions based on the original oligomer entries in the PDB, which does not rely on library construction and increases the efficiency and quality of complex template recognitions. SPRING is tested on 1838 nonhomologous protein complexes which can recognize correct quaternary template structures with a TM score >0.5 in 1115 cases after excluding homologous proteins. The average TM score of the first model is 60% and 17% higher than that by HHsearch and COTH, respectively, while the number of targets with an interface RMSD <2.5 Å by SPRING is 134% and 167% higher than these competing methods. SPRING is controlled with ZDOCK on 77 docking benchmark proteins. Although the relative performance of SPRING and ZDOCK depends on the level of homology filters, a combination of the two methods can result in a significantly higher model quality than ZDOCK at all homology thresholds. These data demonstrate a new efficient approach to quaternary structure recognition that is ready to use for genome-scale modeling of protein-protein interactions due to the high speed and accuracy.

  14. Dynameomics: Data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction

    PubMed Central

    Rysavy, Steven J; Beck, David AC; Daggett, Valerie

    2014-01-01

    Protein function is intimately linked to protein structure and dynamics yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412

  15. Protein folding simulations: from coarse-grained model to all-atom model.

    PubMed

    Zhang, Jian; Li, Wenfei; Wang, Jun; Qin, Meng; Wu, Lei; Yan, Zhiqiang; Xu, Weixin; Zuo, Guanghong; Wang, Wei

    2009-06-01

    Protein folding is an important and challenging problem in molecular biology. During the last two decades, molecular dynamics (MD) simulation has proved to be a paramount tool and was widely used to study protein structures, folding kinetics and thermodynamics, and structure-stability-function relationship. It was also used to help engineering and designing new proteins, and to answer even more general questions such as the minimal number of amino acid or the evolution principle of protein families. Nowadays, the MD simulation is still undergoing rapid developments. The first trend is to toward developing new coarse-grained models and studying larger and more complex molecular systems such as protein-protein complex and their assembling process, amyloid related aggregations, and structure and motion of chaperons, motors, channels and virus capsides; the second trend is toward building high resolution models and explore more detailed and accurate pictures of protein folding and the associated processes, such as the coordination bond or disulfide bond involved folding, the polarization, charge transfer and protonate/deprotonate process involved in metal coupled folding, and the ion permeation and its coupling with the kinetics of channels. On these new territories, MD simulations have given many promising results and will continue to offer exciting views. Here, we review several new subjects investigated by using MD simulations as well as the corresponding developments of appropriate protein models. These include but are not limited to the attempt to go beyond the topology based Gō-like model and characterize the energetic factors in protein structures and dynamics, the study of the thermodynamics and kinetics of disulfide bond involved protein folding, the modeling of the interactions between chaperonin and the encapsulated protein and the protein folding under this circumstance, the effort to clarify the important yet still elusive folding mechanism of protein BBL, the development of discrete MD and its application in studying the alpha-beta conformational conversion and oligomer assembling process, and the modeling of metal ion involved protein folding. (c) 2009 IUBMB.

  16. Protein simulation using coarse-grained two-bead multipole force field with polarizable water models.

    PubMed

    Li, Min; Zhang, John Z H

    2017-02-14

    A recently developed two-bead multipole force field (TMFF) is employed in coarse-grained (CG) molecular dynamics (MD) simulation of proteins in combination with polarizable CG water models, the Martini polarizable water model, and modified big multipole water model. Significant improvement in simulated structures and dynamics of proteins is observed in terms of both the root-mean-square deviations (RMSDs) of the structures and residue root-mean-square fluctuations (RMSFs) from the native ones in the present simulation compared with the simulation result with Martini's non-polarizable water model. Our result shows that TMFF simulation using CG water models gives much stable secondary structures of proteins without the need for adding extra interaction potentials to constrain the secondary structures. Our result also shows that by increasing the MD time step from 2 fs to 6 fs, the RMSD and RMSF results are still in excellent agreement with those from all-atom simulations. The current study demonstrated clearly that the application of TMFF together with a polarizable CG water model significantly improves the accuracy and efficiency for CG simulation of proteins.

  17. Protein simulation using coarse-grained two-bead multipole force field with polarizable water models

    NASA Astrophysics Data System (ADS)

    Li, Min; Zhang, John Z. H.

    2017-02-01

    A recently developed two-bead multipole force field (TMFF) is employed in coarse-grained (CG) molecular dynamics (MD) simulation of proteins in combination with polarizable CG water models, the Martini polarizable water model, and modified big multipole water model. Significant improvement in simulated structures and dynamics of proteins is observed in terms of both the root-mean-square deviations (RMSDs) of the structures and residue root-mean-square fluctuations (RMSFs) from the native ones in the present simulation compared with the simulation result with Martini's non-polarizable water model. Our result shows that TMFF simulation using CG water models gives much stable secondary structures of proteins without the need for adding extra interaction potentials to constrain the secondary structures. Our result also shows that by increasing the MD time step from 2 fs to 6 fs, the RMSD and RMSF results are still in excellent agreement with those from all-atom simulations. The current study demonstrated clearly that the application of TMFF together with a polarizable CG water model significantly improves the accuracy and efficiency for CG simulation of proteins.

  18. BindML/BindML+: Detecting Protein-Protein Interaction Interface Propensity from Amino Acid Substitution Patterns.

    PubMed

    Wei, Qing; La, David; Kihara, Daisuke

    2017-01-01

    Prediction of protein-protein interaction sites in a protein structure provides important information for elucidating the mechanism of protein function and can also be useful in guiding a modeling or design procedures of protein complex structures. Since prediction methods essentially assess the propensity of amino acids that are likely to be part of a protein docking interface, they can help in designing protein-protein interactions. Here, we introduce BindML and BindML+ protein-protein interaction sites prediction methods. BindML predicts protein-protein interaction sites by identifying mutation patterns found in known protein-protein complexes using phylogenetic substitution models. BindML+ is an extension of BindML for distinguishing permanent and transient types of protein-protein interaction sites. We developed an interactive web-server that provides a convenient interface to assist in structural visualization of protein-protein interactions site predictions. The input data for the web-server are a tertiary structure of interest. BindML and BindML+ are available at http://kiharalab.org/bindml/ and http://kiharalab.org/bindml/plus/ .

  19. Probing Protein Fold Space with a Simplified Model

    PubMed Central

    Minary, Peter; Levitt, Michael

    2008-01-01

    We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (0-300 residues), amino acid composition, and number of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point per residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of Parallel Tempering and Equi-Energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all-α: 4.77 Å, all-β: 2.93 Å, α/β: 3.09 Å, α+β: 4.89 Å on average and within 6 Å for 71.41 %, 92.85 %, 94.29 % and 64.28 % for all-α, all-β, α/β and α+β, classes respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of α and β folds. We find that α/β proteins with alternating α and β segments (such as the beta-barrel) are more stable than proteins in other fold classes. PMID:18054792

  20. Structural Transition and Antibody Binding of EBOV GP and ZIKV E Proteins from Pre-Fusion to Fusion-Initiation State.

    PubMed

    Lappala, Anna; Nishima, Wataru; Miner, Jacob; Fenimore, Paul; Fischer, Will; Hraber, Peter; Zhang, Ming; McMahon, Benjamin; Tung, Chang-Shung

    2018-05-10

    Membrane fusion proteins are responsible for viral entry into host cells—a crucial first step in viral infection. These proteins undergo large conformational changes from pre-fusion to fusion-initiation structures, and, despite differences in viral genomes and disease etiology, many fusion proteins are arranged as trimers. Structural information for both pre-fusion and fusion-initiation states is critical for understanding virus neutralization by the host immune system. In the case of Ebola virus glycoprotein (EBOV GP) and Zika virus envelope protein (ZIKV E), pre-fusion state structures have been identified experimentally, but only partial structures of fusion-initiation states have been described. While the fusion-initiation structure is in an energetically unfavorable state that is difficult to solve experimentally, the existing structural information combined with computational approaches enabled the modeling of fusion-initiation state structures of both proteins. These structural models provide an improved understanding of four different neutralizing antibodies in the prevention of viral host entry.

  1. Unraveling protein folding mechanism by analyzing the hierarchy of models with increasing level of detail

    NASA Astrophysics Data System (ADS)

    Hayashi, Tomohiko; Yasuda, Satoshi; Škrbić, Tatjana; Giacometti, Achille; Kinoshita, Masahiro

    2017-09-01

    Taking protein G with 56 residues for a case study, we investigate the mechanism of protein folding. In addition to its native structure possessing α-helix and β-sheet contents of 27% and 39%, respectively, we construct a number of misfolded decoys with a wide variety of α-helix and β-sheet contents. We then consider a hierarchy of 8 different models with increasing level of detail in terms of the number of entropic and energetic physical factors incorporated. The polyatomic structure is always taken into account, but the side chains are removed in half of the models. The solvent is formed by either neutral hard spheres or water molecules. Protein intramolecular hydrogen bonds (H-bonds) and protein-solvent H-bonds (the latter is present only in water) are accounted for or not, depending on the model considered. We then apply a physics-based free-energy function (FEF) corresponding to each model and investigate which structures are most stabilized. This special approach taken on a step-by-step basis enables us to clarify the role of each physical factor in contributing to the structural stability and separately elucidate its effect. Depending on the model employed, significantly different structures such as very compact configurations with no secondary structures and configurations of associated α-helices are optimally stabilized. The native structure can be identified as that with lowest FEF only when the most detailed model is employed. This result is significant for at least the two reasons: The most detailed model considered here is able to capture the fundamental aspects of protein folding notwithstanding its simplicity; and it is shown that the native structure is stabilized by a complex interplay of minimal multiple factors that must be all included in the description. In the absence of even a single of these factors, the protein is likely to be driven towards a different, more stable state.

  2. LIBP-Pred: web server for lipid binding proteins using structural network parameters; PDB mining of human cancer biomarkers and drug targets in parasites and bacteria.

    PubMed

    González-Díaz, Humberto; Munteanu, Cristian R; Postelnicu, Lucian; Prado-Prado, Francisco; Gestal, Marcos; Pazos, Alejandro

    2012-03-01

    Lipid-Binding Proteins (LIBPs) or Fatty Acid-Binding Proteins (FABPs) play an important role in many diseases such as different types of cancer, kidney injury, atherosclerosis, diabetes, intestinal ischemia and parasitic infections. Thus, the computational methods that can predict LIBPs based on 3D structure parameters became a goal of major importance for drug-target discovery, vaccine design and biomarker selection. In addition, the Protein Data Bank (PDB) contains 3000+ protein 3D structures with unknown function. This list, as well as new experimental outcomes in proteomics research, is a very interesting source to discover relevant proteins, including LIBPs. However, to the best of our knowledge, there are no general models to predict new LIBPs based on 3D structures. We developed new Quantitative Structure-Activity Relationship (QSAR) models based on 3D electrostatic parameters of 1801 different proteins, including 801 LIBPs. We calculated these electrostatic parameters with the MARCH-INSIDE software and they correspond to the entire protein or to specific protein regions named core, inner, middle, and surface. We used these parameters as inputs to develop a simple Linear Discriminant Analysis (LDA) classifier to discriminate 3D structure of LIBPs from other proteins. We implemented this predictor in the web server named LIBP-Pred, freely available at , along with other important web servers of the Bio-AIMS portal. The users can carry out an automatic retrieval of protein structures from PDB or upload their custom protein structural models from their disk created with LOMETS server. We demonstrated the PDB mining option performing a predictive study of 2000+ proteins with unknown function. Interesting results regarding the discovery of new Cancer Biomarkers in humans or drug targets in parasites have been discussed here in this sense.

  3. Bio-AIMS Collection of Chemoinformatics Web Tools based on Molecular Graph Information and Artificial Intelligence Models.

    PubMed

    Munteanu, Cristian R; Gonzalez-Diaz, Humberto; Garcia, Rafael; Loza, Mabel; Pazos, Alejandro

    2015-01-01

    The molecular information encoding into molecular descriptors is the first step into in silico Chemoinformatics methods in Drug Design. The Machine Learning methods are a complex solution to find prediction models for specific biological properties of molecules. These models connect the molecular structure information such as atom connectivity (molecular graphs) or physical-chemical properties of an atom/group of atoms to the molecular activity (Quantitative Structure - Activity Relationship, QSAR). Due to the complexity of the proteins, the prediction of their activity is a complicated task and the interpretation of the models is more difficult. The current review presents a series of 11 prediction models for proteins, implemented as free Web tools on an Artificial Intelligence Model Server in Biosciences, Bio-AIMS (http://bio-aims.udc.es/TargetPred.php). Six tools predict protein activity, two models evaluate drug - protein target interactions and the other three calculate protein - protein interactions. The input information is based on the protein 3D structure for nine models, 1D peptide amino acid sequence for three tools and drug SMILES formulas for two servers. The molecular graph descriptor-based Machine Learning models could be useful tools for in silico screening of new peptides/proteins as future drug targets for specific treatments.

  4. A probabilistic model for detecting rigid domains in protein structures.

    PubMed

    Nguyen, Thach; Habeck, Michael

    2016-09-01

    Large-scale conformational changes in proteins are implicated in many important biological functions. These structural transitions can often be rationalized in terms of relative movements of rigid domains. There is a need for objective and automated methods that identify rigid domains in sets of protein structures showing alternative conformational states. We present a probabilistic model for detecting rigid-body movements in protein structures. Our model aims to approximate alternative conformational states by a few structural parts that are rigidly transformed under the action of a rotation and a translation. By using Bayesian inference and Markov chain Monte Carlo sampling, we estimate all parameters of the model, including a segmentation of the protein into rigid domains, the structures of the domains themselves, and the rigid transformations that generate the observed structures. We find that our Gibbs sampling algorithm can also estimate the optimal number of rigid domains with high efficiency and accuracy. We assess the power of our method on several thousand entries of the DynDom database and discuss applications to various complex biomolecular systems. The Python source code for protein ensemble analysis is available at: https://github.com/thachnguyen/motion_detection : mhabeck@gwdg.de. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure.

    PubMed

    Baek, Minkyung; Park, Taeyong; Heo, Lim; Park, Chiwook; Seok, Chaok

    2017-07-03

    Homo-oligomerization of proteins is abundant in nature, and is often intimately related with the physiological functions of proteins, such as in metabolism, signal transduction or immunity. Information on the homo-oligomer structure is therefore important to obtain a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling using homo-oligomer templates selected from the protein structure database or by ab initio docking of monomer structures resolved by experiment or predicted by computation. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking or both depending on the availability of proper oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that can be chosen by the user depending on the availability of information on the monomer structure, oligomeric state and locations of unreliable/flexible loops or termini. The performance of the server was better than or comparable to that of other available methods when tested on benchmark sets and in a recent CASP performed in a blind fashion. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Accounting for observed small angle X-ray scattering profile in the protein-protein docking server ClusPro.

    PubMed

    Xia, Bing; Mamonov, Artem; Leysen, Seppe; Allen, Karen N; Strelkov, Sergei V; Paschalidis, Ioannis Ch; Vajda, Sandor; Kozakov, Dima

    2015-07-30

    The protein-protein docking server ClusPro is used by thousands of laboratories, and models built by the server have been reported in over 300 publications. Although the structures generated by the docking include near-native ones for many proteins, selecting the best model is difficult due to the uncertainty in scoring. Small angle X-ray scattering (SAXS) is an experimental technique for obtaining low resolution structural information in solution. While not sufficient on its own to uniquely predict complex structures, accounting for SAXS data improves the ranking of models and facilitates the identification of the most accurate structure. Although SAXS profiles are currently available only for a small number of complexes, due to its simplicity the method is becoming increasingly popular. Since combining docking with SAXS experiments will provide a viable strategy for fairly high-throughput determination of protein complex structures, the option of using SAXS restraints is added to the ClusPro server. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  7. Quantitative theory of hydrophobic effect as a driving force of protein structure

    PubMed Central

    Perunov, Nikolay; England, Jeremy L

    2014-01-01

    Various studies suggest that the hydrophobic effect plays a major role in driving the folding of proteins. In the past, however, it has been challenging to translate this understanding into a predictive, quantitative theory of how the full pattern of sequence hydrophobicity in a protein shapes functionally important features of its tertiary structure. Here, we extend and apply such a phenomenological theory of the sequence-structure relationship in globular protein domains, which had previously been applied to the study of allosteric motion. In an effort to optimize parameters for the model, we first analyze the patterns of backbone burial found in single-domain crystal structures, and discover that classic hydrophobicity scales derived from bulk physicochemical properties of amino acids are already nearly optimal for prediction of burial using the model. Subsequently, we apply the model to studying structural fluctuations in proteins and establish a means of identifying ligand-binding and protein–protein interaction sites using this approach. PMID:24408023

  8. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

    PubMed

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-06-15

    Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  9. Structure and function of seed storage proteins in faba bean (Vicia faba L.).

    PubMed

    Liu, Yujiao; Wu, Xuexia; Hou, Wanwei; Li, Ping; Sha, Weichao; Tian, Yingying

    2017-05-01

    The protein subunit is the most important basic unit of protein, and its study can unravel the structure and function of seed storage proteins in faba bean. In this study, we identified six specific protein subunits in Faba bean (cv. Qinghai 13) combining liquid chromatography (LC), liquid chromatography-electronic spray ionization mass (LC-ESI-MS/MS) and bio-information technology. The results suggested a diversity of seed storage proteins in faba bean, and a total of 16 proteins (four GroEL molecular chaperones and 12 plant-specific proteins) were identified from 97-, 96-, 64-, 47-, 42-, and 38-kD-specific protein subunits in faba bean based on the peptide sequence. We also analyzed the composition and abundance of the amino acids, the physicochemical characteristics, secondary structure, three-dimensional structure, transmembrane domain, and possible subcellular localization of these identified proteins in faba bean seed, and finally predicted function and structure. The three-dimensional structures were generated based on homologous modeling, and the protein function was analyzed based on the annotation from the non-redundant protein database (NR database, NCBI) and function analysis of optimal modeling. The objective of this study was to identify the seed storage proteins in faba bean and confirm the structure and function of these proteins. Our results can be useful for the study of protein nutrition and achieve breeding goals for optimal protein quality in faba bean.

  10. An Integrated Framework Advancing Membrane Protein Modeling and Design

    PubMed Central

    Weitzner, Brian D.; Duran, Amanda M.; Tilley, Drew C.; Elazar, Assaf; Gray, Jeffrey J.

    2015-01-01

    Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design. PMID:26325167

  11. Partial unfolding and refolding for structure refinement: A unified approach of geometric simulations and molecular dynamics.

    PubMed

    Kumar, Avishek; Campitelli, Paul; Thorpe, M F; Ozkan, S Banu

    2015-12-01

    The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics based approach to the protein refinement problem by mimicking the mechanism of chaperons that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometric based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native-like structures from a template and to provide a set of persistent contacts to be employed during re-folding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low resolution models where certain of the portions of the structure are incorrectly modeled. © 2015 Wiley Periodicals, Inc.

  12. Identify High-Quality Protein Structural Models by Enhanced K-Means.

    PubMed

    Wu, Hongjie; Li, Haiou; Jiang, Min; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K -means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K -means clustering ( SK -means), whereas the other employs squared distance to optimize the initial centroids ( K -means++). Our results showed that SK -means and K -means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K -means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK -means and K -means++ demonstrated substantial improvements relative to results from SPICKER and classical K -means.

  13. Identify High-Quality Protein Structural Models by Enhanced K-Means

    PubMed Central

    Li, Haiou; Chen, Cheng; Lv, Qiang; Wu, Chuang

    2017-01-01

    Background. One critical issue in protein three-dimensional structure prediction using either ab initio or comparative modeling involves identification of high-quality protein structural models from generated decoys. Currently, clustering algorithms are widely used to identify near-native models; however, their performance is dependent upon different conformational decoys, and, for some algorithms, the accuracy declines when the decoy population increases. Results. Here, we proposed two enhanced K-means clustering algorithms capable of robustly identifying high-quality protein structural models. The first one employs the clustering algorithm SPICKER to determine the initial centroids for basic K-means clustering (SK-means), whereas the other employs squared distance to optimize the initial centroids (K-means++). Our results showed that SK-means and K-means++ were more robust as compared with SPICKER alone, detecting 33 (59%) and 42 (75%) of 56 targets, respectively, with template modeling scores better than or equal to those of SPICKER. Conclusions. We observed that the classic K-means algorithm showed a similar performance to that of SPICKER, which is a widely used algorithm for protein-structure identification. Both SK-means and K-means++ demonstrated substantial improvements relative to results from SPICKER and classical K-means. PMID:28421198

  14. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  15. Use of 13Cα Chemical-Shifts in Protein Structure Determination

    PubMed Central

    Vila, Jorge A.; Ripoll, Daniel R.; Scheraga, Harold A.

    2008-01-01

    A physics-based method, aimed at determining protein structures by using NOE-derived distances together with observed and computed 13C chemical shifts, is proposed. The approach makes use of 13Cα chemical shifts, computed at the density functional level of theory, to obtain torsional constraints for all backbone and side-chain torsional angles without making a priori use of the occupancy of any region of the Ramachandran map by the amino acid residues. The torsional constraints are not fixed but are changed dynamically in each step of the procedure, following an iterative self-consistent approach intended to identify a set of conformations for which the computed 13Cα chemical shifts match the experimental ones. A test is carried out on a 76-amino acid all-α-helical protein, namely the B. Subtilis acyl carrier protein. It is shown that, starting from randomly generated conformations, the final protein models are more accurate than an existing NMR-derived structure model of this protein, in terms of both the agreement between predicted and observed 13Cα chemical shifts and some stereochemical quality indicators, and of similar accuracy as one of the protein models solved at a high level of resolution. The results provide evidence that this methodology can be used not only for structure determination but also for additional protein structure refinement of NMR-derived models deposited in the Protein Data Bank. PMID:17516673

  16. A Score of the Ability of a Three-Dimensional Protein Model to Retrieve Its Own Sequence as a Quantitative Measure of Its Quality and Appropriateness

    PubMed Central

    Martínez-Castilla, León P.; Rodríguez-Sotres, Rogelio

    2010-01-01

    Background Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. Principal Findings The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449–460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. Conclusion Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone. PMID:20830209

  17. In silico designing of therapeutic protein enriched with branched-chain amino acids for the dietary treatment of chronic liver disease.

    PubMed

    L, Sunil; Vasu, Prasanna

    2017-09-01

    Leucine, isoleucine, and valine are three essential branched-chain amino acids (BCAA) account for 40-45% of total essential amino acids. BCAA stimulates protein synthesis primarily in skeletal muscles, and it can directly transport to circulatory blood stream bypassing the liver. Hence, a protein enriched with BCAA is an important therapeutic target for the dietary treatment of chronic liver disease. The present study is to design a synthetic protein enriched with BCAA and the challenge is to maximize the BCAA content, keeping the balanced ratio of leucine, isoleucine, valine - 2: 1: 1.2 as specified by WHO/UNU/FAO. Here, we turned the general concept of homology modeling and tried to find a suitable scaffold (α-helix) to host an excess amount of BCAA for increased stability and digestibility. A total of 50 protein models were constructed by using SWISS-MODEL, Modeller 9.17, ProtParam tool, and allergen online tools. Out of 50 different protein models, protein model-50 was found to be best, which had a well-defined 3D structure, good in silico digestibility, balanced ratio of BCAA and showed 65.57% structure identity to the template apo-bovine α-lactalbumin (1F6R). Templates search was performed against PDB using PSI-BLAST, SWISS-MODEL, PROFUNC, I-TASSER, and ConSurf. The secondary structure was predicted by PSSPred, PSIPRED, I-TASSER, PORTER, and SPIDER2. The modeled structure of protein Model-50 was validated by PROCHECK, ERRAT, ProSA, and QMEAN. COACH and ProFUNC tools were performed to determine the functional effects of protein model-50. Overall, the BCAA was enriched from 22 to 56.4% with the balanced ratio of Leu: Ile: Val (2: 1: 1.2). The Ramachandran plot showed 97.7% of the amino acid residues in allowed regions with ERRAT score of 86.05. We have successfully modeled the complete three-dimensional structure of the target protein model-50 using highly reputed computational tools. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Cross-Link Guided Molecular Modeling with ROSETTA

    PubMed Central

    Leitner, Alexander; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars

    2013-01-01

    Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods. PMID:24069194

  19. XAS Characterization of the Zn Site of Non-structural Protein 3 (NS3) from Hepatitis C Virus

    NASA Astrophysics Data System (ADS)

    Ascone, I.; Nobili, G.; Benfatto, M.; Congiu-Castellano, A.

    2007-02-01

    XANES spectra of non structural protein 3 (NS3) have been calculated using 4 Zn coordination models from three crystallographic structures in the Protein Data Base (PDB): 1DY9, subunit B, 1CU1 subunit A and B, and 1JXP subunit B. Results indicate that XANES is an appropriate tool to distinguish among them. Experimental XANES spectra have been simulated refining crystallographic data. The model obtained by XAS is compared with the PDB models.

  20. In silico modeling of the yeast protein and protein family interaction network

    NASA Astrophysics Data System (ADS)

    Goh, K.-I.; Kahng, B.; Kim, D.

    2004-03-01

    Understanding of how protein interaction networks of living organisms have evolved or are organized can be the first stepping stone in unveiling how life works on a fundamental ground. Here we introduce an in silico ``coevolutionary'' model for the protein interaction network and the protein family network. The essential ingredient of the model includes the protein family identity and its robustness under evolution, as well as the three previously proposed: gene duplication, divergence, and mutation. This model produces a prototypical feature of complex networks in a wide range of parameter space, following the generalized Pareto distribution in connectivity. Moreover, we investigate other structural properties of our model in detail with some specific values of parameters relevant to the yeast Saccharomyces cerevisiae, showing excellent agreement with the empirical data. Our model indicates that the physical constraints encoded via the domain structure of proteins play a crucial role in protein interactions.

  1. A generative, probabilistic model of local protein structure.

    PubMed

    Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas

    2008-07-01

    Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.

  2. Web-ware bioinformatical analysis and structure modelling of N-terminus of human multisynthetase complex auxiliary component protein p43.

    PubMed

    Deineko, Viktor

    2006-01-01

    Human multisynthetase complex auxiliary component, protein p43 is an endothelial monocyte-activating polypeptide II precursor. In this study, comprehensive sequence analysis of N-terminus has been performed to identify structural domains, motifs, sites of post-translation modification and other functionally important parameters. The spatial structure model of full-chain protein p43 is obtained.

  3. As Simple As Possible, but Not Simpler: Exploring the Fidelity of Coarse-Grained Protein Models for Simulated Force Spectroscopy

    PubMed Central

    Rottler, Jörg; Plotkin, Steven S.

    2016-01-01

    Mechanical unfolding of a single domain of loop-truncated superoxide dismutase protein has been simulated via force spectroscopy techniques with both all-atom (AA) models and several coarse-grained models having different levels of resolution: A Gō model containing all heavy atoms in the protein (HA-Gō), the associative memory, water mediated, structure and energy model (AWSEM) which has 3 interaction sites per amino acid, and a Gō model containing only one interaction site per amino acid at the Cα position (Cα-Gō). To systematically compare results across models, the scales of time, energy, and force had to be suitably renormalized in each model. Surprisingly, the HA-Gō model gives the softest protein, exhibiting much smaller force peaks than all other models after the above renormalization. Clustering to render a structural taxonomy as the protein unfolds showed that the AA, HA-Gō, and Cα-Gō models exhibit a single pathway for early unfolding, which eventually bifurcates repeatedly to multiple branches only after the protein is about half-unfolded. The AWSEM model shows a single dominant unfolding pathway over the whole range of unfolding, in contrast to all other models. TM alignment, clustering analysis, and native contact maps show that the AWSEM pathway has however the most structural similarity to the AA model at high nativeness, but the least structural similarity to the AA model at low nativeness. In comparison to the AA model, the sequence of native contact breakage is best predicted by the HA-Gō model. All models consistently predict a similar unfolding mechanism for early force-induced unfolding events, but diverge in their predictions for late stage unfolding events when the protein is more significantly disordered. PMID:27898663

  4. As Simple As Possible, but Not Simpler: Exploring the Fidelity of Coarse-Grained Protein Models for Simulated Force Spectroscopy.

    PubMed

    Habibi, Mona; Rottler, Jörg; Plotkin, Steven S

    2016-11-01

    Mechanical unfolding of a single domain of loop-truncated superoxide dismutase protein has been simulated via force spectroscopy techniques with both all-atom (AA) models and several coarse-grained models having different levels of resolution: A Gō model containing all heavy atoms in the protein (HA-Gō), the associative memory, water mediated, structure and energy model (AWSEM) which has 3 interaction sites per amino acid, and a Gō model containing only one interaction site per amino acid at the Cα position (Cα-Gō). To systematically compare results across models, the scales of time, energy, and force had to be suitably renormalized in each model. Surprisingly, the HA-Gō model gives the softest protein, exhibiting much smaller force peaks than all other models after the above renormalization. Clustering to render a structural taxonomy as the protein unfolds showed that the AA, HA-Gō, and Cα-Gō models exhibit a single pathway for early unfolding, which eventually bifurcates repeatedly to multiple branches only after the protein is about half-unfolded. The AWSEM model shows a single dominant unfolding pathway over the whole range of unfolding, in contrast to all other models. TM alignment, clustering analysis, and native contact maps show that the AWSEM pathway has however the most structural similarity to the AA model at high nativeness, but the least structural similarity to the AA model at low nativeness. In comparison to the AA model, the sequence of native contact breakage is best predicted by the HA-Gō model. All models consistently predict a similar unfolding mechanism for early force-induced unfolding events, but diverge in their predictions for late stage unfolding events when the protein is more significantly disordered.

  5. Contact-assisted protein structure modeling by global optimization in CASP11.

    PubMed

    Joo, Keehyoung; Joung, InSuk; Cheng, Qianyi; Lee, Sung Jong; Lee, Jooyoung

    2016-09-01

    We have applied the conformational space annealing method to the contact-assisted protein structure modeling in CASP11. For Tp targets, where predicted residue-residue contact information was provided, the contact energy term in the form of the Lorentzian function was implemented together with the physical energy terms used in our template-free modeling of proteins. Although we observed some structural improvement of Tp models over the models predicted without the Tp information, the improvement was not substantial on average. This is partly due to the inaccuracy of the provided contact information, where only about 18% of it was correct. For Ts targets, where the information of ambiguous NOE (Nuclear Overhauser Effect) restraints was provided, we formulated the modeling in terms of the two-tier optimization problem, which covers: (1) the assignment of NOE peaks and (2) the three-dimensional (3D) model generation based on the assigned NOEs. Although solving the problem in a direct manner appears to be intractable at first glance, we demonstrate through CASP11 that remarkably accurate protein 3D modeling is possible by brute force optimization of a relevant energy function. For 19 Ts targets of the average size of 224 residues, generated protein models were of about 3.6 Å Cα atom accuracy. Even greater structural improvement was observed when additional Tc contact information was provided. For 20 out of the total 24 Tc targets, we were able to generate protein structures which were better than the best model from the rest of the CASP11 groups in terms of GDT-TS. Proteins 2016; 84(Suppl 1):189-199. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  6. CCBuilder 2.0: Powerful and accessible coiled‐coil modeling

    PubMed Central

    Wood, Christopher W.

    2017-01-01

    Abstract The increased availability of user‐friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α‐helical coiled coil provides one such example, which represents ≈ 3–5% of all known protein‐encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy‐to‐use web application, called CCBuilder 2.0, for modeling and optimizing both α‐helical coiled coils and polyproline‐based collagen triple helices. This has many applications from providing models to aid molecular replacement for X‐ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo “dark matter” protein structures. CCBuilder 2.0 is available as a web‐based application, the code for which is open‐source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. Lay Summary We have created CCBuilder 2.0, an easy to use web‐based application that can model structures for a whole class of proteins, the α‐helical coiled coil, which is estimated to account for 3–5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more‐applied research including designing and engineering novel proteins that have potential applications in biotechnology. PMID:28836317

  7. Computational modeling of Repeat1 region of INI1/hSNF5: An evolutionary link with ubiquitin

    PubMed Central

    Bhutoria, Savita

    2016-01-01

    Abstract The structure of a protein can be very informative of its function. However, determining protein structures experimentally can often be very challenging. Computational methods have been used successfully in modeling structures with sufficient accuracy. Here we have used computational tools to predict the structure of an evolutionarily conserved and functionally significant domain of Integrase interactor (INI)1/hSNF5 protein. INI1 is a component of the chromatin remodeling SWI/SNF complex, a tumor suppressor and is involved in many protein‐protein interactions. It belongs to SNF5 family of proteins that contain two conserved repeat (Rpt) domains. Rpt1 domain of INI1 binds to HIV‐1 Integrase, and acts as a dominant negative mutant to inhibit viral replication. Rpt1 domain also interacts with oncogene c‐MYC and modulates its transcriptional activity. We carried out an ab initio modeling of a segment of INI1 protein containing the Rpt1 domain. The structural model suggested the presence of a compact and well defined ββαα topology as core structure in the Rpt1 domain of INI1. This topology in Rpt1 was similar to PFU domain of Phospholipase A2 Activating Protein, PLAA. Interestingly, PFU domain shares similarity with Ubiquitin and has ubiquitin binding activity. Because of the structural similarity between Rpt1 domain of INI1 and PFU domain of PLAA, we propose that Rpt1 domain of INI1 may participate in ubiquitin recognition or binding with ubiquitin or ubiquitin related proteins. This modeling study may shed light on the mode of interactions of Rpt1 domain of INI1 and is likely to facilitate future functional studies of INI1. PMID:27261671

  8. An all-atom structure-based potential for proteins: bridging minimal models with all-atom empirical forcefields.

    PubMed

    Whitford, Paul C; Noel, Jeffrey K; Gosavi, Shachi; Schug, Alexander; Sanbonmatsu, Kevin Y; Onuchic, José N

    2009-05-01

    Protein dynamics take place on many time and length scales. Coarse-grained structure-based (Go) models utilize the funneled energy landscape theory of protein folding to provide an understanding of both long time and long length scale dynamics. All-atom empirical forcefields with explicit solvent can elucidate our understanding of short time dynamics with high energetic and structural resolution. Thus, structure-based models with atomic details included can be used to bridge our understanding between these two approaches. We report on the robustness of folding mechanisms in one such all-atom model. Results for the B domain of Protein A, the SH3 domain of C-Src Kinase, and Chymotrypsin Inhibitor 2 are reported. The interplay between side chain packing and backbone folding is explored. We also compare this model to a C(alpha) structure-based model and an all-atom empirical forcefield. Key findings include: (1) backbone collapse is accompanied by partial side chain packing in a cooperative transition and residual side chain packing occurs gradually with decreasing temperature, (2) folding mechanisms are robust to variations of the energetic parameters, (3) protein folding free-energy barriers can be manipulated through parametric modifications, (4) the global folding mechanisms in a C(alpha) model and the all-atom model agree, although differences can be attributed to energetic heterogeneity in the all-atom model, and (5) proline residues have significant effects on folding mechanisms, independent of isomerization effects. Because this structure-based model has atomic resolution, this work lays the foundation for future studies to probe the contributions of specific energetic factors on protein folding and function.

  9. An All-atom Structure-Based Potential for Proteins: Bridging Minimal Models with All-atom Empirical Forcefields

    PubMed Central

    Whitford, Paul C.; Noel, Jeffrey K.; Gosavi, Shachi; Schug, Alexander; Sanbonmatsu, Kevin Y.; Onuchic, José N.

    2012-01-01

    Protein dynamics take place on many time and length scales. Coarse-grained structure-based (Gō) models utilize the funneled energy landscape theory of protein folding to provide an understanding of both long time and long length scale dynamics. All-atom empirical forcefields with explicit solvent can elucidate our understanding of short time dynamics with high energetic and structural resolution. Thus, structure-based models with atomic details included can be used to bridge our understanding between these two approaches. We report on the robustness of folding mechanisms in one such all-atom model. Results for the B domain of Protein A, the SH3 domain of C-Src Kinase and Chymotrypsin Inhibitor 2 are reported. The interplay between side chain packing and backbone folding is explored. We also compare this model to a Cα structure-based model and an all-atom empirical forcefield. Key findings include 1) backbone collapse is accompanied by partial side chain packing in a cooperative transition and residual side chain packing occurs gradually with decreasing temperature 2) folding mechanisms are robust to variations of the energetic parameters 3) protein folding free energy barriers can be manipulated through parametric modifications 4) the global folding mechanisms in a Cα model and the all-atom model agree, although differences can be attributed to energetic heterogeneity in the all-atom model 5) proline residues have significant effects on folding mechanisms, independent of isomerization effects. Since this structure-based model has atomic resolution, this work lays the foundation for future studies to probe the contributions of specific energetic factors on protein folding and function. PMID:18837035

  10. QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

    PubMed

    Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander

    2006-11-30

    Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this data is the largest set on human serum protein binding reported for QSAR modeling. The data was partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute error ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their affects on predicted protein binding can assist the chemist in structure modification during the drug design process.

  11. A scoring function based on solvation thermodynamics for protein structure prediction

    PubMed Central

    Du, Shiqiao; Harano, Yuichi; Kinoshita, Masahiro; Sakurai, Minoru

    2012-01-01

    We predict protein structure using our recently developed free energy function for describing protein stability, which is focused on solvation thermodynamics. The function is combined with the current most reliable sampling methods, i.e., fragment assembly (FA) and comparative modeling (CM). The prediction is tested using 11 small proteins for which high-resolution crystal structures are available. For 8 of these proteins, sequence similarities are found in the database, and the prediction is performed with CM. Fairly accurate models with average Cα root mean square deviation (RMSD) ∼ 2.0 Å are successfully obtained for all cases. For the rest of the target proteins, we perform the prediction following FA protocols. For 2 cases, we obtain predicted models with an RMSD ∼ 3.0 Å as the best-scored structures. For the other case, the RMSD remains larger than 7 Å. For all the 11 target proteins, our scoring function identifies the experimentally determined native structure as the best structure. Starting from the predicted structure, replica exchange molecular dynamics is performed to further refine the structures. However, we are unable to improve its RMSD toward the experimental structure. The exhaustive sampling by coarse-grained normal mode analysis around the native structures reveals that our function has a linear correlation with RMSDs < 3.0 Å. These results suggest that the function is quite reliable for the protein structure prediction while the sampling method remains one of the major limiting factors in it. The aspects through which the methodology could further be improved are discussed. PMID:27493529

  12. Molecular basis of protein structure in proanthocyanidin and anthocyanin-enhanced Lc-transgenic alfalfa in relation to nutritive value using synchrotron-radiation FTIR microspectroscopy: A novel approach

    NASA Astrophysics Data System (ADS)

    Yu, Peiqiang; Jonker, Arjan; Gruber, Margaret

    2009-09-01

    To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of β-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this study was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included α-helices, β-sheets and other structures such as β-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower ( P < 0.05) percentage of the model-fitted α-helices (29 vs. 34) and model-fitted β-sheets (22 vs. 27) and a higher ( P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference ( P > 0.05) in the ratio of α-helices to β-sheets (average: 1.4) and higher ( P < 0.05) ratios of α-helices to others (0.7 vs. 0.9) and β-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference ( P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.

  13. Molecular Basis of Protein Structure in Proanthocyanidin and Anthocyanin-Enhanced Lc-transgenic Alfalfa in Relation to Nutritive Value Using Synchrotron-Radiation FTIR Microspectroscopy: A Novel Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, P.; Jonker, A; Gruber, M

    2009-01-01

    To date there has been very little application of synchrotron radiation-based Fourier transform infrared microspectroscopy (SRFTIRM) to the study of molecular structures in plant forage in relation to livestock digestive behavior and nutrient availability. Protein inherent structure, among other factors such as protein matrix, affects nutritive quality, fermentation and degradation behavior in both humans and animals. The relative percentage of protein secondary structure influences protein value. A high percentage of e-sheets usually reduce the access of gastrointestinal digestive enzymes to the protein. Reduced accessibility results in poor digestibility and as a result, low protein value. The objective of this studymore » was to use SRFTIRM to compare protein molecular structure of alfalfa plant tissues transformed with the maize Lc regulatory gene with non-transgenic alfalfa protein within cellular and subcellular dimensions and to quantify protein inherent structure profiles using Gaussian and Lorentzian methods of multi-component peak modeling. Protein molecular structure revealed by this method included a-helices, e-sheets and other structures such as e-turns and random coils. Hierarchical cluster analysis and principal component analysis of the synchrotron data, as well as accurate spectral analysis based on curve fitting, showed that transgenic alfalfa contained a relatively lower (P < 0.05) percentage of the model-fitted a-helices (29 vs. 34) and model-fitted e-sheets (22 vs. 27) and a higher (P < 0.05) percentage of other model-fitted structures (49 vs. 39). Transgenic alfalfa protein displayed no difference (P > 0.05) in the ratio of a-helices to e-sheets (average: 1.4) and higher (P < 0.05) ratios of a-helices to others (0.7 vs. 0.9) and e-sheets to others (0.5 vs. 0.8) than the non-transgenic alfalfa protein. The transgenic protein structures also exhibited no difference (P > 0.05) in the vibrational intensity of protein amide I (average of 24) and amide II areas (average of 10) and their ratio (average of 2.4) compared with non-transgenic alfalfa. Cluster analysis and principal component analysis showed no significant differences between the two genotypes in the broad molecular fingerprint region, amides I and II regions, and the carbohydrate molecular region, indicating they are highly related to each other. The results suggest that transgenic Lc-alfalfa leaves contain similar proteins to non-transgenic alfalfa (because amide I and II intensities were identical), but a subtle difference in protein molecular structure after freeze drying. Further study is needed to understand the relationship between these structural profiles and biological features such as protein nutrient availability, protein bypass and digestive behavior of livestock fed with this type of forage.« less

  14. Applications of graph theory in protein structure identification

    PubMed Central

    2011-01-01

    There is a growing interest in the identification of proteins on the proteome wide scale. Among different kinds of protein structure identification methods, graph-theoretic methods are very sharp ones. Due to their lower costs, higher effectiveness and many other advantages, they have drawn more and more researchers’ attention nowadays. Specifically, graph-theoretic methods have been widely used in homology identification, side-chain cluster identification, peptide sequencing and so on. This paper reviews several methods in solving protein structure identification problems using graph theory. We mainly introduce classical methods and mathematical models including homology modeling based on clique finding, identification of side-chain clusters in protein structures upon graph spectrum, and de novo peptide sequencing via tandem mass spectrometry using the spectrum graph model. In addition, concluding remarks and future priorities of each method are given. PMID:22165974

  15. From laptop to benchtop to bedside: Structure-based Drug Design on Protein Targets

    PubMed Central

    Chen, Lu; Morrow, John K.; Tran, Hoang T.; Phatak, Sharangdhar S.; Du-Cuny, Lei; Zhang, Shuxing

    2013-01-01

    As an important aspect of computer-aided drug design, structure-based drug design brought a new horizon to pharmaceutical development. This in silico method permeates all aspects of drug discovery today, including lead identification, lead optimization, ADMET prediction and drug repurposing. Structure-based drug design has resulted in fruitful successes drug discovery targeting protein-ligand and protein-protein interactions. Meanwhile, challenges, noted by low accuracy and combinatoric issues, may also cause failures. In this review, state-of-the-art techniques for protein modeling (e.g. structure prediction, modeling protein flexibility, etc.), hit identification/optimization (e.g. molecular docking, focused library design, fragment-based design, molecular dynamic, etc.), and polypharmacology design will be discussed. We will explore how structure-based techniques can facilitate the drug discovery process and interplay with other experimental approaches. PMID:22316152

  16. Role of Transbilayer Distribution of Lipid Molecules on the Structure and Protein-Lipid Interaction of an Amyloidogenic Protein on the Membrane Surface

    NASA Astrophysics Data System (ADS)

    Cheng, Kwan; Cheng, Sara

    We used molecular dynamics simulations to examine the effects of transbilayer distribution of lipid molecules, particularly anionic lipids with negatively charged headgroups, on the structure and binding kinetics of an amyloidogenic protein on the membrane surface and subsequent protein-induced structural disruption of the membrane. Our systems consisted of a model beta-sheet rich dimeric protein absorbed on asymmetric bilayers with neutral and anionic lipids and symmetric bilayers with neutral lipids. We observed larger folding, domain aggregation, and tilt angle of the absorbed protein on the asymmetric bilayer surfaces. We also detected more focused bilayer thinning in the asymmetric bilayer due to weak lipid-protein interactions. Our results support the mechanism that the higher lipid packing in the protein-contacting lipid leaflet promotes stronger protein-protein but weaker protein-lipid interactions of an amyloidogenic protein on the membrane surface. We speculate that the observed surface-induced structural and protein-lipid interaction of our model amyloidogenic protein may play a role in the early membrane-associated amyloid cascade pathway that leads to membrane structural damage of neurons in Alzheimer's disease. NSF ACI-1531594.

  17. Stochastic Protein Multimerization, Cooperativity and Fitness

    NASA Astrophysics Data System (ADS)

    Hagner, Kyle; Setayeshgar, Sima; Lynch, Michael

    Many proteins assemble into multimeric structures that can vary greatly among phylogenetic lineages. As protein-protein interactions (PPI) require productive encounters among subunits, these structural variations are related in part to variation in cellular protein abundance. The protein abundance in turn depends on the intrinsic rates of production and decay of mRNA and protein molecules, as well as rates of cell growth and division. We present a stochastic model for prediction of the multimeric state of a protein as a function of these processes and the free energy associated with binding interfaces. We demonstrate favorable agreement between the model and a wide class of proteins using E. coli proteome data. As such, this platform, which links protein abundance, PPI and quaternary structure in growing and dividing cells can be extended to evolutionary models for the emergence and diversification of multimeric proteins. We investigate cooperativity - a ubiquitous functional property of multimeric proteins - as a possible selective force driving multimerization, demonstrating a reduction in the cost of protein production relative to the overall proteome energy budget that can be tied to fitness.

  18. Structural elucidation of transmembrane domain zero (TMD0) of EcdL: A multidrug resistance-associated protein (MRP) family of ATP-binding cassette transporter protein revealed by atomistic simulation.

    PubMed

    Bera, Krishnendu; Rani, Priyanka; Kishor, Gaurav; Agarwal, Shikha; Kumar, Antresh; Singh, Durg Vijay

    2017-09-20

    ATP-Binding cassette (ABC) transporters play an extensive role in the translocation of diverse sets of biologically important molecules across membrane. EchnocandinB (antifungal) and EcdL protein of Aspergillus rugulosus are encoded by the same cluster of genes. Co-expression of EcdL and echinocandinB reflects tightly linked biological functions. EcdL belongs to Multidrug Resistance associated Protein (MRP) subfamily of ABC transporters with an extra transmembrane domain zero (TMD0). Complete structure of MRP subfamily comprising of TMD0 domain, at atomic resolution is not known. We hypothesized that the transportation of echonocandinB is mediated via EcdL protein. Henceforth, it is pertinent to know the topological arrangement of TMD0, with other domains of protein and its possible role in transportation of echinocandinB. Absence of effective template for TMD0 domain lead us to model by I-TASSER, further structure has been refined by multiple template modelling using homologous templates of remaining domains (TMD1, NBD1, TMD2, NBD2). The modelled structure has been validated for packing, folding and stereochemical properties. MD simulation for 0.1 μs has been carried out in the biphasic environment for refinement of modelled protein. Non-redundant structures have been excavated by clustering of MD trajectory. The structural alignment of modelled structure has shown Z-score -37.9; 31.6, 31.5 with RMSD; 2.4, 4.2, 4.8 with ABC transporters; PDB ID 4F4C, 4M1 M, 4M2T, respectively, reflecting the correctness of structure. EchinocandinB has been docked to the modelled as well as to the clustered structures, which reveals interaction of echinocandinB with TMD0 and other TM helices in the translocation path build of TMDs.

  19. HARMONY: a server for the assessment of protein structures

    PubMed Central

    Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.

    2006-01-01

    Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examine internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods such as inverse folding procedure and structure determination especially at low resolution can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server such that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped on the structure for subsequent examination that is useful to also recognize regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999

  20. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

    PubMed Central

    Xu, Dong; Zhang, Yang

    2013-01-01

    Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418

  1. Discovery of an Unexplored Protein Structural Scaffold of Serine Protease from Big Blue Octopus (Octopus cyanea): A New Prospective Lead Molecule.

    PubMed

    Panda, Subhamay; Kumari, Leena

    2017-01-01

    Serine proteases are a group of enzymes that hydrolyses the peptide bonds in proteins. In mammals, these enzymes help in the regulation of several major physiological functions such as digestion, blood clotting, responses of immune system, reproductive functions and the complement system. Serine proteases obtained from the venom of Octopodidae family is a relatively unexplored area of research. In the present work, we tried to effectively utilize comparative composite molecular modeling technique. Our key aim was to propose the first molecular model structure of unexplored serine protease 5 derived from big blue octopus. The other objective of this study was to analyze the distribution of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, hydrophobicity molecular surface analysis and electrostatic potential analysis with the aid of different bioinformatic tools. In the present study, molecular model has been generated with the help of I-TASSER suite. Afterwards the refined structural model was validated with standard methods. For functional annotation of protein molecule we used Protein Information Resource (PIR) database. Serine protease 5 of big blue octopus was analyzed with different bioinformatical algorithms for the distribution of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, hydrophobicity molecular surface analysis and electrostatic potential analysis. The functionally critical amino acids and ligand- binding site (LBS) of the proteins (modeled) were determined using the COACH program. The molecular model data in cooperation to other pertinent post model analysis data put forward molecular insight to proteolytic activity of serine protease 5, which helps in the clear understanding of procoagulant and anticoagulant characteristics of this natural lead molecule. Our approach was to investigate the octopus venom protein as a whole or a part of their structure that may result in the development of new lead molecule. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  2. Structural bioinformatics of the human spliceosomal proteome

    PubMed Central

    Korneta, Iga; Magnus, Marcin; Bujnicki, Janusz M.

    2012-01-01

    In this work, we describe the results of a comprehensive structural bioinformatics analysis of the spliceosomal proteome. We used fold recognition analysis to complement prior data on the ordered domains of 252 human splicing proteins. Examples of newly identified domains include a PWI domain in the U5 snRNP protein 200K (hBrr2, residues 258–338), while examples of previously known domains with a newly determined fold include the DUF1115 domain of the U4/U6 di-snRNP protein 90K (hPrp3, residues 540–683). We also established a non-redundant set of experimental models of spliceosomal proteins, as well as constructed in silico models for regions without an experimental structure. The combined set of structural models is available for download. Altogether, over 90% of the ordered regions of the spliceosomal proteome can be represented structurally with a high degree of confidence. We analyzed the reduced spliceosomal proteome of the intron-poor organism Giardia lamblia, and as a result, we proposed a candidate set of ordered structural regions necessary for a functional spliceosome. The results of this work will aid experimental and structural analyses of the spliceosomal proteins and complexes, and can serve as a starting point for multiscale modeling of the structure of the entire spliceosome. PMID:22573172

  3. Template-Based Modeling of Protein-RNA Interactions.

    PubMed

    Zheng, Jinfang; Kundrotas, Petras J; Vakser, Ilya A; Liu, Shiyong

    2016-09-01

    Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes.

  4. United3D: a protein model quality assessment program that uses two consensus based methods.

    PubMed

    Terashi, Genki; Oosawa, Makoto; Nakamura, Yuuki; Kanou, Kazuhiko; Takeda-Shitaka, Mayuko

    2012-01-01

    In protein structure prediction, such as template-based modeling and free modeling (ab initio modeling), the step that assesses the quality of protein models is very important. We have developed a model quality assessment (QA) program United3D that uses an optimized clustering method and a simple Cα atom contact-based potential. United3D automatically estimates the quality scores (Qscore) of predicted protein models that are highly correlated with the actual quality (GDT_TS). The performance of United3D was tested in the ninth Critical Assessment of protein Structure Prediction (CASP9) experiment. In CASP9, United3D showed the lowest average loss of GDT_TS (5.3) among the QA methods participated in CASP9. This result indicates that the performance of United3D to identify the high quality models from the models predicted by CASP9 servers on 116 targets was best among the QA methods that were tested in CASP9. United3D also produced high average Pearson correlation coefficients (0.93) and acceptable Kendall rank correlation coefficients (0.68) between the Qscore and GDT_TS. This performance was competitive with the other top ranked QA methods that were tested in CASP9. These results indicate that United3D is a useful tool for selecting high quality models from many candidate model structures provided by various modeling methods. United3D will improve the accuracy of protein structure prediction.

  5. Pushing the size limit of de novo structure ensemble prediction guided by sparse SDSL-EPR restraints to 200 residues: The monomeric and homodimeric forms of BAX

    PubMed Central

    Fischer, Axel W.; Bordignon, Enrica; Bleicken, Stephanie; García-Sáez, Ana J.; Jeschke, Gunnar; Meiler, Jens

    2016-01-01

    Structure determination remains a challenge for many biologically important proteins. In particular, proteins that adopt multiple conformations often evade crystallization in all biologically relevant states. Although computational de novo protein folding approaches often sample biologically relevant conformations, the selection of the most accurate model for different functional states remains a formidable challenge, in particular, for proteins with more than about 150 residues. Electron paramagnetic resonance (EPR) spectroscopy can obtain limited structural information for proteins in well-defined biological states and thereby assist in selecting biologically relevant conformations. The present study demonstrates that de novo folding methods are able to accurately sample the folds of 192-residue long soluble monomeric Bcl-2-associated X protein (BAX). The tertiary structures of the monomeric and homodimeric forms of BAX were predicted using the primary structure as well as 25 and 11 EPR distance restraints, respectively. The predicted models were subsequently compared to respective NMR/X-ray structures of BAX. EPR restraints improve the protein-size normalized root-mean-square-deviation (RMSD100) of the most accurate models with respect to the NMR/crystal structure from 5.9 Å to 3.9 Å and from 5.7 Å to 3.3 Å, respectively. Additionally, the model discrimination is improved, which is demonstrated by an improvement of the enrichment from 5% to 15% and from 13% to 21%, respectively. PMID:27129417

  6. GalaxyRefineComplex: Refinement of protein-protein complex model structures driven by interface repacking.

    PubMed

    Heo, Lim; Lee, Hasup; Seok, Chaok

    2016-08-18

    Protein-protein docking methods have been widely used to gain an atomic-level understanding of protein interactions. However, docking methods that employ low-resolution energy functions are popular because of computational efficiency. Low-resolution docking tends to generate protein complex structures that are not fully optimized. GalaxyRefineComplex takes such low-resolution docking structures and refines them to improve model accuracy in terms of both interface contact and inter-protein orientation. This refinement method allows flexibility at the protein interface and in the overall docking structure to capture conformational changes that occur upon binding. Symmetric refinement is also provided for symmetric homo-complexes. This method was validated by refining models produced by available docking programs, including ZDOCK and M-ZDOCK, and was successfully applied to CAPRI targets in a blind fashion. An example of using the refinement method with an existing docking method for ligand binding mode prediction of a drug target is also presented. A web server that implements the method is freely available at http://galaxy.seoklab.org/refinecomplex.

  7. Modeling Protein Excited-state Structures from "Over-length" Chemical Cross-links.

    PubMed

    Ding, Yue-He; Gong, Zhou; Dong, Xu; Liu, Kan; Liu, Zhu; Liu, Chao; He, Si-Min; Dong, Meng-Qiu; Tang, Chun

    2017-01-27

    Chemical cross-linking coupled with mass spectroscopy (CXMS) provides proximity information for the cross-linked residues and is used increasingly for modeling protein structures. However, experimentally identified cross-links are sometimes incompatible with the known structure of a protein, as the distance calculated between the cross-linked residues far exceeds the maximum length of the cross-linker. The discrepancies may persist even after eliminating potentially false cross-links and excluding intermolecular ones. Thus the "over-length" cross-links may arise from alternative excited-state conformation of the protein. Here we present a method and associated software DynaXL for visualizing the ensemble structures of multidomain proteins based on intramolecular cross-links identified by mass spectrometry with high confidence. Representing the cross-linkers and cross-linking reactions explicitly, we show that the protein excited-state structure can be modeled with as few as two over-length cross-links. We demonstrate the generality of our method with three systems: calmodulin, enzyme I, and glutamine-binding protein, and we show that these proteins alternate between different conformations for interacting with other proteins and ligands. Taken together, the over-length chemical cross-links contain valuable information about protein dynamics, and our findings here illustrate the relationship between dynamic domain movement and protein function. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  8. Building a Better Fragment Library for De Novo Protein Structure Prediction

    PubMed Central

    de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.

    2015-01-01

    Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated. In particular, the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries presents both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. “Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources”. PMID:25901595

  9. Structure Prediction and Analysis of Neuraminidase Sequence Variants

    ERIC Educational Resources Information Center

    Thayer, Kelly M.

    2016-01-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…

  10. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

  11. Principles of protein folding--a perspective from simple exact models.

    PubMed Central

    Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.

    1995-01-01

    General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459

  12. Modeling of the structure of ribosomal protein L1 from the archaeon Haloarcula marismortui

    NASA Astrophysics Data System (ADS)

    Nevskaya, N. A.; Kljashtorny, V. G.; Vakhrusheva, A. V.; Garber, M. B.; Nikonov, S. V.

    2017-07-01

    The halophilic archaeon Haloarcula marismortui proliferates in the Dead Sea at extremely high salt concentrations (higher than 3 M). This is the only archaeon, for which the crystal structure of the ribosomal 50S subunit was determined. However, the structure of the functionally important side protuberance containing the abnormally negatively charged protein L1 (HmaL1) was not visualized. Attempts to crystallize HmaL1 in the isolated state or as its complex with RNA using normal salt concentrations (≤500 mM) failed. A theoretical model of HmaL1 was built based on the structural data for homologs of the protein L1 from other organisms, and this model was refined by molecular dynamics methods. Analysis of this model showed that the protein HmaL1 can undergo aggregation due to the presence of a cluster of positive charges unique for proteins L1. This cluster is located at the RNA-protein interface, which interferes with the crystallization of HmaL1 and the binding of the latter to RNA.

  13. Parmodel: a web server for automated comparative modeling of proteins.

    PubMed

    Uchôa, Hugo Brandão; Jorge, Guilherme Eberhart; Freitas Da Silveira, Nelson José; Camera, João Carlos; Canduri, Fernanda; De Azevedo, Walter Filgueira

    2004-12-24

    Parmodel is a web server for automated comparative modeling and evaluation of protein structures. The aim of this tool is to help inexperienced users to perform modeling, assessment, visualization, and optimization of protein models as well as crystallographers to evaluate structures solved experimentally. It is subdivided in four modules: Parmodel Modeling, Parmodel Assessment, Parmodel Visualization, and Parmodel Optimization. The main module is the Parmodel Modeling that allows the building of several models for a same protein in a reduced time, through the distribution of modeling processes on a Beowulf cluster. Parmodel automates and integrates the main softwares used in comparative modeling as MODELLER, Whatcheck, Procheck, Raster3D, Molscript, and Gromacs. This web server is freely accessible at .

  14. Assessing the applicability of template-based protein docking in the twilight zone.

    PubMed

    Negroni, Jacopo; Mosca, Roberto; Aloy, Patrick

    2014-09-02

    The structural modeling of protein interactions in the absence of close homologous templates is a challenging task. Recently, template-based docking methods have emerged to exploit local structural similarities to help ab-initio protocols provide reliable 3D models for protein interactions. In this work, we critically assess the performance of template-based docking in the twilight zone. Our results show that, while it is possible to find templates for nearly all known interactions, the quality of the obtained models is rather limited. We can increase the precision of the models at expenses of coverage, but it drastically reduces the potential applicability of the method, as illustrated by the whole-interactome modeling of nine organisms. Template-based docking is likely to play an important role in the structural characterization of the interaction space, but we still need to improve the repertoire of structural templates onto which we can reliably model protein complexes. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm.

    PubMed

    Yu, Isseki; Mori, Takaharu; Ando, Tadashi; Harada, Ryuhei; Jung, Jaewoon; Sugita, Yuji; Feig, Michael

    2016-11-01

    Biological macromolecules function in highly crowded cellular environments. The structure and dynamics of proteins and nucleic acids are well characterized in vitro, but in vivo crowding effects remain unclear. Using molecular dynamics simulations of a comprehensive atomistic model cytoplasm we found that protein-protein interactions may destabilize native protein structures, whereas metabolite interactions may induce more compact states due to electrostatic screening. Protein-protein interactions also resulted in significant variations in reduced macromolecular diffusion under crowded conditions, while metabolites exhibited significant two-dimensional surface diffusion and altered protein-ligand binding that may reduce the effective concentration of metabolites and ligands in vivo. Metabolic enzymes showed weak non-specific association in cellular environments attributed to solvation and entropic effects. These effects are expected to have broad implications for the in vivo functioning of biomolecules. This work is a first step towards physically realistic in silico whole-cell models that connect molecular with cellular biology.

  16. General Mechanism of Two-State Protein Folding Kinetics

    PubMed Central

    Rollins, Geoffrey C.; Dill, Ken A.

    2016-01-01

    We describe here a general model of the kinetic mechanism of protein folding. In the Foldon Funnel Model, proteins fold in units of secondary structures, which form sequentially along the folding pathway, stabilized by tertiary interactions. The model predicts that the free energy landscape has a volcano shape, rather than a simple funnel, that folding is two-state (single-exponential) when secondary structures are intrinsically unstable, and that each structure along the folding path is a transition state for the previous structure. It shows how sequential pathways are consistent with multiple stochastic routes on funnel landscapes, and it gives good agreement with the 9 order of magnitude dependence of folding rates on protein size for a set of 93 proteins, at the same time it is consistent with the near independence of folding equilibrium constant on size. This model gives estimates of folding rates of proteomes, leading to a median folding time in Escherichia coli of about 5 s. PMID:25056406

  17. Template-based protein structure modeling using the RaptorX web server.

    PubMed

    Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo

    2012-07-19

    A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.

  18. Template-based protein structure modeling using the RaptorX web server

    PubMed Central

    Källberg, Morten; Wang, Haipeng; Wang, Sheng; Peng, Jian; Wang, Zhiyong; Lu, Hui; Xu, Jinbo

    2016-01-01

    A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world. PMID:22814390

  19. The Phyre2 web portal for protein modelling, prediction and analysis

    PubMed Central

    Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael JE

    2017-01-01

    Summary Phyre2 is a suite of tools available on the web to predict and analyse protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a protocol. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyse the effect of amino-acid variants (e.g. nsSNPs) for a user’s protein sequence. Users are guided through results by a simple interface at a level of detail determined by them. This protocol will guide a user from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30mins and 2 hours after submission. PMID:25950237

  20. The Fluid-Mosaic Model of Membrane Structure: still relevant to understanding the structure, function and dynamics of biological membranes after more than 40 years.

    PubMed

    Nicolson, Garth L

    2014-06-01

    In 1972 the Fluid-Mosaic Membrane Model of membrane structure was proposed based on thermodynamic principals of organization of membrane lipids and proteins and available evidence of asymmetry and lateral mobility within the membrane matrix [S. J. Singer and G. L. Nicolson, Science 175 (1972) 720-731]. After over 40years, this basic model of the cell membrane remains relevant for describing the basic nano-structures of a variety of intracellular and cellular membranes of plant and animal cells and lower forms of life. In the intervening years, however, new information has documented the importance and roles of specialized membrane domains, such as lipid rafts and protein/glycoprotein complexes, in describing the macrostructure, dynamics and functions of cellular membranes as well as the roles of membrane-associated cytoskeletal fences and extracellular matrix structures in limiting the lateral diffusion and range of motion of membrane components. These newer data build on the foundation of the original model and add new layers of complexity and hierarchy, but the concepts described in the original model are still applicable today. In updated versions of the model more emphasis has been placed on the mosaic nature of the macrostructure of cellular membranes where many protein and lipid components are limited in their rotational and lateral motilities in the membrane plane, especially in their natural states where lipid-lipid, protein-protein and lipid-protein interactions as well as cell-matrix, cell-cell and intracellular membrane-associated protein and cytoskeletal interactions are important in restraining the lateral motility and range of motion of particular membrane components. The formation of specialized membrane domains and the presence of tightly packed integral membrane protein complexes due to membrane-associated fences, fenceposts and other structures are considered very important in describing membrane dynamics and architecture. These structures along with membrane-associated cytoskeletal and extracellular structures maintain the long-range, non-random mosaic macro-organization of membranes, while smaller membrane nano- and submicro-sized domains, such as lipid rafts and protein complexes, are important in maintaining specialized membrane structures that are in cooperative dynamic flux in a crowded membrane plane. This Article is Part of a Special Issue Entitled: Membrane Structure and Function: Relevance in the Cell's Physiology, Pathology and Therapy. © 2013.

  1. Mathematical methods for protein science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, W.; Istrail, S.; Atkins, J.

    1997-12-31

    Understanding the structure and function of proteins is a fundamental endeavor in molecular biology. Currently, over 100,000 protein sequences have been determined by experimental methods. The three dimensional structure of the protein determines its function, but there are currently less than 4,000 structures known to atomic resolution. Accordingly, techniques to predict protein structure from sequence have an important role in aiding the understanding of the Genome and the effects of mutations in genetic disease. The authors describe current efforts at Sandia to better understand the structure of proteins through rigorous mathematical analyses of simple lattice models. The efforts have focusedmore » on two aspects of protein science: mathematical structure prediction, and inverse protein folding.« less

  2. Quality assessment of protein model-structures using evolutionary conservation.

    PubMed

    Kalman, Matan; Ben-Tal, Nir

    2010-05-15

    Programs that evaluate the quality of a protein structural model are important both for validating the structure determination procedure and for guiding the model-building process. Such programs are based on properties of native structures that are generally not expected for faulty models. One such property, which is rarely used for automatic structure quality assessment, is the tendency for conserved residues to be located at the structural core and for variable residues to be located at the surface. We present ConQuass, a novel quality assessment program based on the consistency between the model structure and the protein's conservation pattern. We show that it can identify problematic structural models, and that the scores it assigns to the server models in CASP8 correlate with the similarity of the models to the native structure. We also show that when the conservation information is reliable, the method's performance is comparable and complementary to that of the other single-structure quality assessment methods that participated in CASP8 and that do not use additional structural information from homologs. A perl implementation of the method, as well as the various perl and R scripts used for the analysis are available at http://bental.tau.ac.il/ConQuass/. nirb@tauex.tau.ac.il Supplementary data are available at Bioinformatics online.

  3. The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis.

    PubMed

    Caetano-Anollés, Gustavo; Kim, Kyung Mo; Caetano-Anollés, Derek

    2012-02-01

    The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.

  4. qPIPSA: Relating enzymatic kinetic parameters and interaction fields

    PubMed Central

    Gabdoulline, Razif R; Stein, Matthias; Wade, Rebecca C

    2007-01-01

    Background The simulation of metabolic networks in quantitative systems biology requires the assignment of enzymatic kinetic parameters. Experimentally determined values are often not available and therefore computational methods to estimate these parameters are needed. It is possible to use the three-dimensional structure of an enzyme to perform simulations of a reaction and derive kinetic parameters. However, this is computationally demanding and requires detailed knowledge of the enzyme mechanism. We have therefore sought to develop a general, simple and computationally efficient procedure to relate protein structural information to enzymatic kinetic parameters that allows consistency between the kinetic and structural information to be checked and estimation of kinetic constants for structurally and mechanistically similar enzymes. Results We describe qPIPSA: quantitative Protein Interaction Property Similarity Analysis. In this analysis, molecular interaction fields, for example, electrostatic potentials, are computed from the enzyme structures. Differences in molecular interaction fields between enzymes are then related to the ratios of their kinetic parameters. This procedure can be used to estimate unknown kinetic parameters when enzyme structural information is available and kinetic parameters have been measured for related enzymes or were obtained under different conditions. The detailed interaction of the enzyme with substrate or cofactors is not modeled and is assumed to be similar for all the proteins compared. The protein structure modeling protocol employed ensures that differences between models reflect genuine differences between the protein sequences, rather than random fluctuations in protein structure. Conclusion Provided that the experimental conditions and the protein structural models refer to the same protein state or conformation, correlations between interaction fields and kinetic parameters can be established for sets of related enzymes. Outliers may arise due to variation in the importance of different contributions to the kinetic parameters, such as protein stability and conformational changes. The qPIPSA approach can assist in the validation as well as estimation of kinetic parameters, and provide insights into enzyme mechanism. PMID:17919319

  5. A minimalist model protein with multiple folding funnels

    PubMed Central

    Locker, C. Rebecca; Hernandez, Rigoberto

    2001-01-01

    Kinetic and structural studies of wild-type proteins such as prions and amyloidogenic proteins provide suggestive evidence that proteins may adopt multiple long-lived states in addition to the native state. All of these states differ structurally because they lie far apart in configuration space, but their stability is not necessarily caused by cooperative (nucleation) effects. In this study, a minimalist model protein is designed to exhibit multiple long-lived states to explore the dynamics of the corresponding wild-type proteins. The minimalist protein is modeled as a 27-monomer sequence confined to a cubic lattice with three different monomer types. An order parameter—the winding index—is introduced to characterize the extent of folding. The winding index has several advantages over other commonly used order parameters like the number of native contacts. It can distinguish between enantiomers, its calculation requires less computational time than the number of native contacts, and reduced-dimensional landscapes can be developed when the native state structure is not known a priori. The results for the designed model protein prove by existence that the rugged energy landscape picture of protein folding can be generalized to include protein “misfolding” into long-lived states. PMID:11470921

  6. Modelling dynamics in protein crystal structures by ensemble refinement

    PubMed Central

    Burnley, B Tom; Afonine, Pavel V; Adams, Paul D; Gros, Piet

    2012-01-01

    Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships. DOI: http://dx.doi.org/10.7554/eLife.00311.001 PMID:23251785

  7. Data Mining of Macromolecular Structures.

    PubMed

    van Beusekom, Bart; Perrakis, Anastassis; Joosten, Robbie P

    2016-01-01

    The use of macromolecular structures is widespread for a variety of applications, from teaching protein structure principles all the way to ligand optimization in drug development. Applying data mining techniques on these experimentally determined structures requires a highly uniform, standardized structural data source. The Protein Data Bank (PDB) has evolved over the years toward becoming the standard resource for macromolecular structures. However, the process selecting the data most suitable for specific applications is still very much based on personal preferences and understanding of the experimental techniques used to obtain these models. In this chapter, we will first explain the challenges with data standardization, annotation, and uniformity in the PDB entries determined by X-ray crystallography. We then discuss the specific effect that crystallographic data quality and model optimization methods have on structural models and how validation tools can be used to make informed choices. We also discuss specific advantages of using the PDB_REDO databank as a resource for structural data. Finally, we will provide guidelines on how to select the most suitable protein structure models for detailed analysis and how to select a set of structure models suitable for data mining.

  8. Structural studies of human glioma pathogenesis-related protein 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Asojo, Oluwatoyin A., E-mail: oasojo@unmc.edu; Koski, Raymond A.; Bonafé, Nathalie

    2011-10-01

    Structural analysis of a truncated soluble domain of human glioma pathogenesis-related protein 1, a membrane protein implicated in the proliferation of aggressive brain cancer, is presented. Human glioma pathogenesis-related protein 1 (GLIPR1) is a membrane protein that is highly upregulated in brain cancers but is barely detectable in normal brain tissue. GLIPR1 is composed of a signal peptide that directs its secretion, a conserved cysteine-rich CAP (cysteine-rich secretory proteins, antigen 5 and pathogenesis-related 1 proteins) domain and a transmembrane domain. GLIPR1 is currently being investigated as a candidate for prostate cancer gene therapy and for glioblastoma targeted therapy. Crystal structuresmore » of a truncated soluble domain of the human GLIPR1 protein (sGLIPR1) solved by molecular replacement using a truncated polyalanine search model of the CAP domain of stecrisp, a snake-venom cysteine-rich secretory protein (CRISP), are presented. The correct molecular-replacement solution could only be obtained by removing all loops from the search model. The native structure was refined to 1.85 Å resolution and that of a Zn{sup 2+} complex was refined to 2.2 Å resolution. The latter structure revealed that the putative binding cavity coordinates Zn{sup 2+} similarly to snake-venom CRISPs, which are involved in Zn{sup 2+}-dependent mechanisms of inflammatory modulation. Both sGLIPR1 structures have extensive flexible loop/turn regions and unique charge distributions that were not observed in any of the previously reported CAP protein structures. A model is also proposed for the structure of full-length membrane-bound GLIPR1.« less

  9. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

    PubMed

    Tian, Pengfei; Best, Robert B

    2017-10-17

    Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.

  10. Influence of structure properties on protein-protein interactions-QSAR modeling of changes in diffusion coefficients.

    PubMed

    Bauer, Katharina Christin; Hämmerling, Frank; Kittelmann, Jörg; Dürr, Cathrin; Görlich, Fabian; Hubbuch, Jürgen

    2017-04-01

    Information about protein-protein interactions provides valuable knowledge about the phase behavior of protein solutions during the biopharmaceutical production process. Up to date it is possible to capture their overall impact by an experimentally determined potential of mean force. For the description of this potential, the second virial coefficient B22, the diffusion interaction parameter kD, the storage modulus G', or the diffusion coefficient D is applied. In silico methods do not only have the potential to predict these parameters, but also to provide deeper understanding of the molecular origin of the protein-protein interactions by correlating the data to the protein's three-dimensional structure. This methodology furthermore allows a lower sample consumption and less experimental effort. Of all in silico methods, QSAR modeling, which correlates the properties of the molecule's structure with the experimental behavior, seems to be particularly suitable for this purpose. To verify this, the study reported here dealt with the determination of a QSAR model for the diffusion coefficient of proteins. This model consisted of diffusion coefficients for six different model proteins at various pH values and NaCl concentrations. The generated QSAR model showed a good correlation between experimental and predicted data with a coefficient of determination R2 = 0.9 and a good predictability for an external test set with R2 = 0.91. The information about the properties affecting protein-protein interactions present in solution was in agreement with experiment and theory. Furthermore, the model was able to give a more detailed picture of the protein properties influencing the diffusion coefficient and the acting protein-protein interactions. Biotechnol. Bioeng. 2017;114: 821-831. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  11. Biomacromolecular quantitative structure-activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein-protein binding affinity.

    PubMed

    Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian

    2013-01-01

    Quantitative structure-activity relationship (QSAR), a regression modeling methodology that establishes statistical correlation between structure feature and apparent behavior for a series of congeneric molecules quantitatively, has been widely used to evaluate the activity, toxicity and property of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising to see that such useful technique has only very limited applications to biomacromolecules, albeit the solved 3D atom-resolution structures of proteins, nucleic acids and their complexes have accumulated rapidly in past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than that of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.

  12. The use of experimental structures to model protein dynamics.

    PubMed

    Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L

    2015-01-01

    The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high-for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods-Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.

  13. The Use of Experimental Structures to Model Protein Dynamics

    PubMed Central

    Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.

    2014-01-01

    Summary The number of solved protein structures submitted in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV) which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants include a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. On the other hand, ENMs are well-established simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965

  14. Exploration of freely available web-interfaces for comparative homology modelling of microbial proteins.

    PubMed

    Nema, Vijay; Pal, Sudhir Kumar

    2013-01-01

    This study was conducted to find the best suited freely available software for modelling of proteins by taking a few sample proteins. The proteins used were small to big in size with available crystal structures for the purpose of benchmarking. Key players like Phyre2, Swiss-Model, CPHmodels-3.0, Homer, (PS)2, (PS)(2)-V(2), Modweb were used for the comparison and model generation. Benchmarking process was done for four proteins, Icl, InhA, and KatG of Mycobacterium tuberculosis and RpoB of Thermus Thermophilus to get the most suited software. Parameters compared during analysis gave relatively better values for Phyre2 and Swiss-Model. This comparative study gave the information that Phyre2 and Swiss-Model make good models of small and large proteins as compared to other screened software. Other software was also good but is often not very efficient in providing full-length and properly folded structure.

  15. Atomic Structures of Minor Proteins VI and VII in the Human Adenovirus.

    PubMed

    Dai, Xinghong; Wu, Lily; Sun, Ren; Zhou, Z Hong

    2017-10-04

    Human adenoviruses (Ad) are dsDNA viruses associated with infectious diseases, yet better known as tools for gene delivery and oncolytic anti-cancer therapy. Atomic structures of Ad provide the basis for the development of antivirals and for engineering efforts towards more effective applications. Since 2010, atomic models of human Ad5 have been independently derived from photographic film cryoEM and X-ray crystallography, but discrepancies exist concerning the assignment of cement proteins IIIa, VIII and IX. To clarify these discrepancies, here we have employed the technology of direct electron-counting to obtain a cryoEM structure of human Ad5 at 3.2 Å resolution. Our improved structure unambiguously confirmed our previous cryoEM models of proteins IIIa, VIII and IX and explained the likely cause of conflict in the crystallography models. The improved structure also allows the identification of three new components in the cavities of hexons - the cleaved N-terminus of precursor protein VI (pVIn), the cleaved N-terminus of precursor protein VII (pVIIn2), and mature protein VI. The binding of pVIIn2--by extension that of genome-condensing pVII--to hexons is consistent with the previously proposed dsDNA genome-capsid co-assembly for adenoviruses, which resembles that of ssRNA viruses but differs from the well-established mechanism of pumping dsDNA into a preformed protein capsid, as exemplified by tailed bacteriophages and herpesviruses. IMPORTANCE Adenovirus is a double-edged sword to humans - as a widespread pathogen and a bioengineering tool for anti-cancer and gene therapy. Atomic structure of the virus provides the basis for antiviral and application developments, but conflicting atomic models from conventional/film cryoEM and X-ray crystallography for important cement proteins IIIa, VIII, and IX have caused confusion. Using the cutting-edge cryoEM technology with electron counting, we improved the structure of human adenovirus type 5 and confirmed our previous models of cement proteins IIIa, VIII, and IX, thus clarifying the inconsistent structures. The improved structure also reveals atomic details of membrane-lytic protein VI and genome-condensing protein VII and supports the previously proposed genome-capsid co-assembly mechanism for adenoviruses. Copyright © 2017 American Society for Microbiology.

  16. Structure refinement of membrane proteins via molecular dynamics simulations.

    PubMed

    Dutagaci, Bercem; Heo, Lim; Feig, Michael

    2018-07-01

    A refinement protocol based on physics-based techniques established for water soluble proteins is tested for membrane protein structures. Initial structures were generated by homology modeling and sampled via molecular dynamics simulations in explicit lipid bilayer and aqueous solvent systems. Snapshots from the simulations were selected based on scoring with either knowledge-based or implicit membrane-based scoring functions and averaged to obtain refined models. The protocol resulted in consistent and significant refinement of the membrane protein structures similar to the performance of refinement methods for soluble proteins. Refinement success was similar between sampling in the presence of lipid bilayers and aqueous solvent but the presence of lipid bilayers may benefit the improvement of lipid-facing residues. Scoring with knowledge-based functions (DFIRE and RWplus) was found to be as good as scoring using implicit membrane-based scoring functions suggesting that differences in internal packing is more important than orientations relative to the membrane during the refinement of membrane protein homology models. © 2018 Wiley Periodicals, Inc.

  17. LECTINPred: web Server that Uses Complex Networks of Protein Structure for Prediction of Lectins with Potential Use as Cancer Biomarkers or in Parasite Vaccine Design.

    PubMed

    Munteanu, Cristian R; Pedreira, Nieves; Dorado, Julián; Pazos, Alejandro; Pérez-Montoto, Lázaro G; Ubeira, Florencio M; González-Díaz, Humberto

    2014-04-01

    Lectins (Ls) play an important role in many diseases such as different types of cancer, parasitic infections and other diseases. Interestingly, the Protein Data Bank (PDB) contains +3000 protein 3D structures with unknown function. Thus, we can in principle, discover new Ls mining non-annotated structures from PDB or other sources. However, there are no general models to predict new biologically relevant Ls based on 3D chemical structures. We used the MARCH-INSIDE software to calculate the Markov-Shannon 3D electrostatic entropy parameters for the complex networks of protein structure of 2200 different protein 3D structures, including 1200 Ls. We have performed a Linear Discriminant Analysis (LDA) using these parameters as inputs in order to seek a new Quantitative Structure-Activity Relationship (QSAR) model, which is able to discriminate 3D structure of Ls from other proteins. We implemented this predictor in the web server named LECTINPred, freely available at http://bio-aims.udc.es/LECTINPred.php. This web server showed the following goodness-of-fit statistics: Sensitivity=96.7 % (for Ls), Specificity=87.6 % (non-active proteins), and Accuracy=92.5 % (for all proteins), considering altogether both the training and external prediction series. In mode 2, users can carry out an automatic retrieval of protein structures from PDB. We illustrated the use of this server, in operation mode 1, performing a data mining of PDB. We predicted Ls scores for +2000 proteins with unknown function and selected the top-scored ones as possible lectins. In operation mode 2, LECTINPred can also upload 3D structural models generated with structure-prediction tools like LOMETS or PHYRE2. The new Ls are expected to be of relevance as cancer biomarkers or useful in parasite vaccine design. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. 3D Protein structure prediction with genetic tabu search algorithm

    PubMed Central

    2010-01-01

    Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256

  19. Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach.

    PubMed

    Esque, Jérémy; Urbain, Aurélie; Etchebest, Catherine; de Brevern, Alexandre G

    2015-11-01

    Transmembrane proteins (TMPs) are major drug targets, but the knowledge of their precise topology structure remains highly limited compared with globular proteins. In spite of the difficulties in obtaining their structures, an important effort has been made these last years to increase their number from an experimental and computational point of view. In view of this emerging challenge, the development of computational methods to extract knowledge from these data is crucial for the better understanding of their functions and in improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments, extracted from a non-redundant databank of TMP 3D structure. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters. They represent at best similar relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification enlightens unusual relationship between amino acids in TMP fragments, which can be useful to elaborate new amino acids substitution matrices. Finally, two challenging applications are described: the first one aims at annotating protein functions (channel or not), the second one intends to assess the quality of the structures (X-ray or models) via a new scoring function deduced from the HPM classification.

  20. Structural Analysis Of CD59 Of Chinese Tree Shrew: A New Reference Molecule For Human Immune System Specific CD59 Drug Discovery.

    PubMed

    Panda, Subhamay; Kumari, Leena; Panda, Santamay

    2017-11-17

    Chinese tree shrews (Tupaia belangeri chinensis) bear several characteristics that are considered to be very crucial for utilizing in animal experimental models in biomedical research. Subsequent to the identification of key aspects and signaling pathways in nervous and immune systems, it is revealed that tree shrews acquires shared common as well as unique characteristics, and hence offers a genetic basis for employing this animal as a prospective model for biomedical research. CD59 glycoprotein, commonly referred to as MAC-inhibitory protein (MAC-IP), membrane inhibitor of reactive lysis (MIRL), or protectin, is encoded by the CD59 gene in human beings. It is the member of the LY6/uPAR/alpha-neurotoxin protein family. With this initial point the objective of this study was to determine a comparative composite based structure of CD59 of Chinese tree shrew. The additional objective of this study was to examine the distribution of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, hydrophobicity molecular surface analysis and electrostatic potential analysis with the assistance of several bioinformatical analytical tools. CD59 Amino acid sequence of Chinese tree shrew collected from the online database system of National Centre for Biotechnology Information. SignalP 4.0 online server was employed for detection of signal peptide instance within the protein sequence of CD59. Molecular model structure of CD59 protein was generated by the Iterative Threading ASSEmbly Refinement (I-TASSER) suite. The confirmation for three-dimensional structural model was evaluated by structure validation tools. Location of negatively and positively charged amino acid over molecular modeled structure, distribution of secondary structural elements, and hydrophobicity molecular surface analysis was performed with the help of Chimera tool. Electrostatic potential analysis was carried out with the adaptive Poisson-Boltzmann solver package. Subsequently validated model was used for the functionally critical amino acids and active site prediction. The functionally critical amino acids and ligand- binding site (LBS) of the proteins (modeled) was determined using the COACH program. Analysis of Ramachandran plot for Chinese tree shrew depicted that overall, 100% of the residues in homology model were observed in allowed and favored regions, sequentially leading to the validation of the standard of generated protein structural model. In case of CD59 of Chinese tree shrew, the total score of G-factor was found to be -0.66 that was generally larger than the acceptable value. This approach suggests the significance and acceptability of the modeled structure of CD59 of Chinese tree shrew. The molecular model data in cooperation to other relevant post model analysis data put forward molecular insight to protecting activity of CD59 protein molecule of Chinese tree shrew. In the present study, we have proposed the first molecular model structure of uncharted CD59 of Chinese tree shrew by significantly utilizing the comparative composite modeling approach. Therefore, the development of a structural model of the CD59 protein was carried out and analyzed further for deducing molecular enrichment technique. The collaborative effort of molecular model and other relevant data of post model analysis carry forward molecular understanding to protecting activity of CD59 functions towards better insight of features of this natural lead compound. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  1. Classification of proteins: available structural space for molecular modeling.

    PubMed

    Andreeva, Antonina

    2012-01-01

    The wealth of available protein structural data provides unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas other utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.

  2. Molecular Modeling of Heme Proteins Using MOE: Bio-Inorganic and Structure-Function Activity for Undergraduates

    ERIC Educational Resources Information Center

    Ray, Gigi B.; Cook, J. Whitney

    2005-01-01

    A biochemical molecular modeling project on heme proteins suitable for an introductory Biochemistry I class has been designed with a 2-fold objective: i) to reinforce the correlation between protein three-dimensional structure and function through a discovery oriented project, and ii) to introduce students to the fields of bioinorganic and…

  3. Exploring the repeat protein universe through computational protein design

    DOE PAGES

    Brunette, TJ; Parmeggiani, Fabio; Huang, Po-Ssu; ...

    2015-12-16

    A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. In this paper, we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelatedmore » to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Finally, our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.« less

  4. Building proteins from C alpha coordinates using the dihedral probability grid Monte Carlo method.

    PubMed Central

    Mathiowetz, A. M.; Goddard, W. A.

    1995-01-01

    Dihedral probability grid Monte Carlo (DPG-MC) is a general-purpose method of conformational sampling that can be applied to many problems in peptide and protein modeling. Here we present the DPG-MC method and apply it to predicting complete protein structures from C alpha coordinates. This is useful in such endeavors as homology modeling, protein structure prediction from lattice simulations, or fitting protein structures to X-ray crystallographic data. It also serves as an example of how DPG-MC can be applied to systems with geometric constraints. The conformational propensities for individual residues are used to guide conformational searches as the protein is built from the amino-terminus to the carboxyl-terminus. Results for a number of proteins show that both the backbone and side chain can be accurately modeled using DPG-MC. Backbone atoms are generally predicted with RMS errors of about 0.5 A (compared to X-ray crystal structure coordinates) and all atoms are predicted to an RMS error of 1.7 A or better. PMID:7549885

  5. Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein

    PubMed Central

    Umate, Pavan; Tuteja, Renu

    2010-01-01

    Homology-based three-dimensional model for Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisomes) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of forisomes proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisomes proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566

  6. Molecular dynamics simulations to study the solvent influence on protein structure

    NASA Astrophysics Data System (ADS)

    Dominguez, Hector

    2016-05-01

    Molecular simulations were carried out to study the influence of different water models in two protein systems. Most of the solvents used in protein simulations, e.g., SPC/E or TIP3P, fail to reproduce the bulk water static dielectric constant. Recently a new water model, TIP4P/ɛ, which reproduces the experimental dielectric constant was reported. Therefore, simulations for two different proteins, Lysozyme and Ubiquitin with SPC/E, TIP3P and TIP4P/ɛ solvents were carried out. Dielectric constants and structural properties were calculated and comparisons were conducted. The structural properties between the three models are very similar, however, the dielectric constants are different in each case.

  7. Water entrapment and structure ordering as protection mechanisms for protein structural preservation

    NASA Astrophysics Data System (ADS)

    Arsiccio, A.; Pisano, R.

    2018-02-01

    In this paper, molecular dynamics is used to further gain insight into the mechanisms by which typical pharmaceutical excipients preserve the protein structure. More specifically, the water entrapment scenario will be analyzed, which states that excipients form a cage around the protein, entrapping and slowing water molecules. Human growth hormone will be used as a model protein, but the results obtained are generally applicable. We will show that water entrapment, as well as the other mechanisms of protein stabilization in the dried state proposed so far, may be related to the formation of a dense hydrogen bonding network between excipient molecules. We will also present a simple phenomenological model capable of explaining the behavior and stabilizing effect provided by typical cryo- and lyo-protectants. This model uses, as input data, molecular properties which can be easily evaluated. We will finally show that the model predictions compare fairly well with experimental data.

  8. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.

  9. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data

    PubMed Central

    Schweppe, Devin K.; Zheng, Chunxiang; Chavez, Juan D.; Navare, Arti T.; Wu, Xia; Eng, Jimmy K.; Bruce, James E.

    2016-01-01

    Motivation: Large-scale chemical cross-linking with mass spectrometry (XL-MS) analyses are quickly becoming a powerful means for high-throughput determination of protein structural information and protein–protein interactions. Recent studies have garnered thousands of cross-linked interactions, yet the field lacks an effective tool to compile experimental data or access the network and structural knowledge for these large scale analyses. We present XLinkDB 2.0 which integrates tools for network analysis, Protein Databank queries, modeling of predicted protein structures and modeling of docked protein structures. The novel, integrated approach of XLinkDB 2.0 enables the holistic analysis of XL-MS protein interaction data without limitation to the cross-linker or analytical system used for the analysis. Availability and Implementation: XLinkDB 2.0 can be found here, including documentation and help: http://xlinkdb.gs.washington.edu/. Contact: jimbruce@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153666

  10. Improved cryoEM-Guided Iterative Molecular Dynamics–Rosetta Protein Structure Refinement Protocol for High Precision Protein Structure Prediction

    PubMed Central

    2016-01-01

    Many excellent methods exist that incorporate cryo-electron microscopy (cryoEM) data to constrain computational protein structure prediction and refinement. Previously, it was shown that iteration of two such orthogonal sampling and scoring methods – Rosetta and molecular dynamics (MD) simulations – facilitated exploration of conformational space in principle. Here, we go beyond a proof-of-concept study and address significant remaining limitations of the iterative MD–Rosetta protein structure refinement protocol. Specifically, all parts of the iterative refinement protocol are now guided by medium-resolution cryoEM density maps, and previous knowledge about the native structure of the protein is no longer necessary. Models are identified solely based on score or simulation time. All four benchmark proteins showed substantial improvement through three rounds of the iterative refinement protocol. The best-scoring final models of two proteins had sub-Ångstrom RMSD to the native structure over residues in secondary structure elements. Molecular dynamics was most efficient in refining secondary structure elements and was thus highly complementary to the Rosetta refinement which is most powerful in refining side chains and loop regions. PMID:25883538

  11. Structure-Based Druggability Assessment of the Mammalian Structural Proteome with Inclusion of Light Protein Flexibility

    PubMed Central

    Loving, Kathryn A.; Lin, Andy; Cheng, Alan C.

    2014-01-01

    Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets. PMID:25079060

  12. A coarse grain model for protein-surface interactions

    NASA Astrophysics Data System (ADS)

    Wei, Shuai; Knotts, Thomas A.

    2013-09-01

    The interaction of proteins with surfaces is important in numerous applications in many fields—such as biotechnology, proteomics, sensors, and medicine—but fundamental understanding of how protein stability and structure are affected by surfaces remains incomplete. Over the last several years, molecular simulation using coarse grain models has yielded significant insights, but the formalisms used to represent the surface interactions have been rudimentary. We present a new model for protein surface interactions that incorporates the chemical specificity of both the surface and the residues comprising the protein in the context of a one-bead-per-residue, coarse grain approach that maintains computational efficiency. The model is parameterized against experimental adsorption energies for multiple model peptides on different types of surfaces. The validity of the model is established by its ability to quantitatively and qualitatively predict the free energy of adsorption and structural changes for multiple biologically-relevant proteins on different surfaces. The validation, done with proteins not used in parameterization, shows that the model produces remarkable agreement between simulation and experiment.

  13. Molecular modeling of the AhR structure and interactions can shed light on ligand-dependent activation and transformation mechanisms.

    PubMed

    Bonati, Laura; Corrada, Dario; Tagliabue, Sara Giani; Motta, Stefano

    2017-02-01

    Molecular modeling has given important contributions to elucidation of the main stages in the AhR signal transduction pathway. Despite the lack of experimentally determined structures of the AhR functional domains, information derived from homologous systems has been exploited for modeling their structure and interactions. Homology models of the AhR PASB domain have provided information on the binding cavity and contributed to elucidate species-specific differences in ligand binding. Molecular Docking simulations of the ligand binding process have given insights into differences in binding of diverse agonists, antagonists, and selective AhR modulators, and their application to virtual screening of large databases of compounds have allowed identification of novel AhR ligands. Recently available structural information on protein-protein and protein-DNA complexes of other bHLH-PAS systems has opened the way for modeling the AhR:ARNT dimer structure and investigating the mechanisms of AhR transformation and DNA binding. Future research directions should include simulation of the protein dynamics to obtain a more reliable description of intermolecular interactions involved in signal transmission.

  14. @TOME-2: a new pipeline for comparative modeling of protein–ligand complexes

    PubMed Central

    Pons, Jean-Luc; Labesse, Gilles

    2009-01-01

    @TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein–protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein–ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/ PMID:19443448

  15. Protein asparagine deamidation prediction based on structures with machine learning methods.

    PubMed

    Jia, Lei; Sun, Yaxiong

    2017-01-01

    Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. Sequence-based prediction method simply identifies the NG motif (amino acid asparagine followed by a glycine) to be liable to deamidation. It still dominates deamidation evaluation process in most pharmaceutical setup due to its convenience. However, the simple sequence-based method is less accurate and often causes over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminated false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.

  16. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling.

    PubMed

    Li, Jilong; Cheng, Jianlin

    2016-05-10

    Generating tertiary structural models for a target protein from the known structure of its homologous template proteins and their pairwise sequence alignment is a key step in protein comparative modeling. Here, we developed a new stochastic point cloud sampling method, called MTMG, for multi-template protein model generation. The method first superposes the backbones of template structures, and the Cα atoms of the superposed templates form a point cloud for each position of a target protein, which are represented by a three-dimensional multivariate normal distribution. MTMG stochastically resamples the positions for Cα atoms of the residues whose positions are uncertain from the distribution, and accepts or rejects new position according to a simulated annealing protocol, which effectively removes atomic clashes commonly encountered in multi-template comparative modeling. We benchmarked MTMG on 1,033 sequence alignments generated for CASP9, CASP10 and CASP11 targets, respectively. Using multiple templates with MTMG improves the GDT-TS score and TM-score of structural models by 2.96-6.37% and 2.42-5.19% on the three datasets over using single templates. MTMG's performance was comparable to Modeller in terms of GDT-TS score, TM-score, and GDT-HA score, while the average RMSD was improved by a new sampling approach. The MTMG software is freely available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/mtmg.html.

  17. A Stochastic Point Cloud Sampling Method for Multi-Template Protein Comparative Modeling

    PubMed Central

    Li, Jilong; Cheng, Jianlin

    2016-01-01

    Generating tertiary structural models for a target protein from the known structure of its homologous template proteins and their pairwise sequence alignment is a key step in protein comparative modeling. Here, we developed a new stochastic point cloud sampling method, called MTMG, for multi-template protein model generation. The method first superposes the backbones of template structures, and the Cα atoms of the superposed templates form a point cloud for each position of a target protein, which are represented by a three-dimensional multivariate normal distribution. MTMG stochastically resamples the positions for Cα atoms of the residues whose positions are uncertain from the distribution, and accepts or rejects new position according to a simulated annealing protocol, which effectively removes atomic clashes commonly encountered in multi-template comparative modeling. We benchmarked MTMG on 1,033 sequence alignments generated for CASP9, CASP10 and CASP11 targets, respectively. Using multiple templates with MTMG improves the GDT-TS score and TM-score of structural models by 2.96–6.37% and 2.42–5.19% on the three datasets over using single templates. MTMG’s performance was comparable to Modeller in terms of GDT-TS score, TM-score, and GDT-HA score, while the average RMSD was improved by a new sampling approach. The MTMG software is freely available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/mtmg.html. PMID:27161489

  18. In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-02-06

    Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involve in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain but till now there is very less communication on the in silico studies of these complete proteins.Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK web server.Digging up all the facts about the proteins it was reveled that around 120 amino acids in the tail part was showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicates the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and Energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  19. In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

    PubMed

    Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

    2015-09-01

    Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involve in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type of proteins that plays role in telomeric DNA protection and length regulation. Both the proteins share same type of domain, but till now there is very less communication on the in silico studies of these complete proteins. Here we intend to do a comparative study between two proteins through modeling of the complete proteins, physiochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC protein work bench was performed to find out the protein 3D structure as well as the different parameters to characterize the proteins. MD simulation was completed by GROMOS forcefield of GROMACS for 10 ns of time stretch. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of TTTAGGG conserved sequence motif using HADDOCK Web server. By digging up all the facts about the proteins, it was revealed that around 120 amino acids in the tail part were showing a good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate the similarity between the protein's structure and sequence. The result of MD simulation highlights on the RMSD, RMSF, Rg, PCA and energy plots which also conveys the similar type of motional behavior between them. The best complex formation for both the proteins in docking result also indicates for the first interaction site which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 proteins display good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.

  20. CABS-flex: Server for fast simulation of protein structure fluctuations.

    PubMed

    Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian

    2013-07-01

    The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements CABS-model-based protocol for the fast simulations of near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics--a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions.

  1. Characterization of protein-folding pathways by reduced-space modeling.

    PubMed

    Kmiecik, Sebastian; Kolinski, Andrzej

    2007-07-24

    Ab initio simulations of the folding pathways are currently limited to very small proteins. For larger proteins, some approximations or simplifications in protein models need to be introduced. Protein folding and unfolding are among the basic processes in the cell and are very difficult to characterize in detail by experiment or simulation. Chymotrypsin inhibitor 2 (CI2) and barnase are probably the best characterized experimentally in this respect. For these model systems, initial folding stages were simulated by using CA-CB-side chain (CABS), a reduced-space protein-modeling tool. CABS employs knowledge-based potentials that proved to be very successful in protein structure prediction. With the use of isothermal Monte Carlo (MC) dynamics, initiation sites with a residual structure and weak tertiary interactions were identified. Such structures are essential for the initiation of the folding process through a sequential reduction of the protein conformational space, overcoming the Levinthal paradox in this manner. Furthermore, nucleation sites that initiate a tertiary interactions network were located. The MC simulations correspond perfectly to the results of experimental and theoretical research and bring insights into CI2 folding mechanism: unambiguous sequence of folding events was reported as well as cooperative substructures compatible with those obtained in recent molecular dynamics unfolding studies. The correspondence between the simulation and experiment shows that knowledge-based potentials are not only useful in protein structure predictions but are also capable of reproducing the folding pathways. Thus, the results of this work significantly extend the applicability range of reduced models in the theoretical study of proteins.

  2. CABS-fold: Server for the de novo and consensus-based prediction of protein structure.

    PubMed

    Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej

    2013-07-01

    The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold.

  3. CABS-fold: server for the de novo and consensus-based prediction of protein structure

    PubMed Central

    Blaszczyk, Maciej; Jamroz, Michal; Kmiecik, Sebastian; Kolinski, Andrzej

    2013-01-01

    The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. Except for template data, fragmentary distance restraints can also be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold. PMID:23748950

  4. STRUM: structure-based prediction of protein stability changes upon single-point mutation.

    PubMed

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-10-01

    Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. http://zhanglab.ccmb.med.umich.edu/STRUM/ CONTACT: qiang@suda.edu.cn and zhng@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  5. STRUM: structure-based prediction of protein stability changes upon single-point mutation

    PubMed Central

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-01-01

    Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206

  6. Protein Folding and Structure Prediction from the Ground Up: The Atomistic Associative Memory, Water Mediated, Structure and Energy Model.

    PubMed

    Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G

    2016-08-25

    The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.

  7. Template-Based Modeling of Protein-RNA Interactions

    PubMed Central

    Zheng, Jinfang; Kundrotas, Petras J.; Vakser, Ilya A.

    2016-01-01

    Protein-RNA complexes formed by specific recognition between RNA and RNA-binding proteins play an important role in biological processes. More than a thousand of such proteins in human are curated and many novel RNA-binding proteins are to be discovered. Due to limitations of experimental approaches, computational techniques are needed for characterization of protein-RNA interactions. Although much progress has been made, adequate methodologies reliably providing atomic resolution structural details are still lacking. Although protein-RNA free docking approaches proved to be useful, in general, the template-based approaches provide higher quality of predictions. Templates are key to building a high quality model. Sequence/structure relationships were studied based on a representative set of binary protein-RNA complexes from PDB. Several approaches were tested for pairwise target/template alignment. The analysis revealed a transition point between random and correct binding modes. The results showed that structural alignment is better than sequence alignment in identifying good templates, suitable for generating protein-RNA complexes close to the native structure, and outperforms free docking, successfully predicting complexes where the free docking fails, including cases of significant conformational change upon binding. A template-based protein-RNA interaction modeling protocol PRIME was developed and benchmarked on a representative set of complexes. PMID:27662342

  8. Building protein-protein interaction networks for Leishmania species through protein structural information.

    PubMed

    Dos Santos Vasconcelos, Crhisllane Rafaele; de Lima Campos, Túlio; Rezende, Antonio Mauro

    2018-03-06

    Systematic analysis of a parasite interactome is a key approach to understand different biological processes. It makes possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a worldwide distributed and neglected disease, with limited treatment options using currently available drugs. The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated to protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. The present study allowed to demonstrate the importance of integrating different methodologies of interaction prediction to increase the coverage of the protein interaction of the studied protocols, besides it made available protein structures and interactions not previously reported.

  9. The turn of the screw: an exercise in protein secondary structure.

    PubMed

    Pikaart, Michael

    2011-01-01

    An exercise using simple paper strips to illustrate protein helical and sheet secondary structures is presented. Drawing on the rich historical context of the use of physical models in protein biochemistry by early practitioners, in particular Linus Pauling, the purpose of this activity is to cultivate in students a hands-on, intuitive sense of protein secondary structure and to complement the common computer-based structural portrayals often used in teaching biochemistry. As students fold these paper strips into model secondary structures, they will better grasp how intramolecular hydrogen bonds form in the folding of a polypeptide into secondary structure, and how these hydrogen bonds direct the overall shape of helical and sheet structures, including the handedness of the α-helix and the difference between right- and the left-handed twist. Copyright © 2010 Wiley Periodicals, Inc.

  10. Modeling of protein binary complexes using structural mass spectrometry data

    PubMed Central

    Kamal, J.K. Amisha; Chance, Mark R.

    2008-01-01

    In this article, we describe a general approach to modeling the structure of binary protein complexes using structural mass spectrometry data combined with molecular docking. In the first step, hydroxyl radical mediated oxidative protein footprinting is used to identify residues that experience conformational reorganization due to binding or participate in the binding interface. In the second step, a three-dimensional atomic structure of the complex is derived by computational modeling. Homology modeling approaches are used to define the structures of the individual proteins if footprinting detects significant conformational reorganization as a function of complex formation. A three-dimensional model of the complex is constructed from these binary partners using the ClusPro program, which is composed of docking, energy filtering, and clustering steps. Footprinting data are used to incorporate constraints—positive and/or negative—in the docking step and are also used to decide the type of energy filter—electrostatics or desolvation—in the successive energy-filtering step. By using this approach, we examine the structure of a number of binary complexes of monomeric actin and compare the results to crystallographic data. Based on docking alone, a number of competing models with widely varying structures are observed, one of which is likely to agree with crystallographic data. When the docking steps are guided by footprinting data, accurate models emerge as top scoring. We demonstrate this method with the actin/gelsolin segment-1 complex. We also provide a structural model for the actin/cofilin complex using this approach which does not have a crystal or NMR structure. PMID:18042684

  11. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines

    PubMed Central

    2014-01-01

    Background It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231

  12. SMOQ: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines.

    PubMed

    Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin

    2014-04-28

    It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  13. SDSL-ESR-based protein structure characterization.

    PubMed

    Strancar, Janez; Kavalenka, Aleh; Urbancic, Iztok; Ljubetic, Ajasja; Hemminga, Marcus A

    2010-03-01

    As proteins are key molecules in living cells, knowledge about their structure can provide important insights and applications in science, biotechnology, and medicine. However, many protein structures are still a big challenge for existing high-resolution structure-determination methods, as can be seen in the number of protein structures published in the Protein Data Bank. This is especially the case for less-ordered, more hydrophobic and more flexible protein systems. The lack of efficient methods for structure determination calls for urgent development of a new class of biophysical techniques. This work attempts to address this problem with a novel combination of site-directed spin labelling electron spin resonance spectroscopy (SDSL-ESR) and protein structure modelling, which is coupled by restriction of the conformational spaces of the amino acid side chains. Comparison of the application to four different protein systems enables us to generalize the new method and to establish a general procedure for determination of protein structure.

  14. Probing the Energetics of Dynactin Filament Assembly and the Binding of Cargo Adaptor Proteins Using Molecular Dynamics Simulation and Electrostatics-Based Structural Modeling.

    PubMed

    Zheng, Wenjun

    2017-01-10

    Dynactin, a large multiprotein complex, binds with the cytoplasmic dynein-1 motor and various adaptor proteins to allow recruitment and transportation of cellular cargoes toward the minus end of microtubules. The structure of the dynactin complex is built around an actin-like minifilament with a defined length, which has been visualized in a high-resolution structure of the dynactin filament determined by cryo-electron microscopy (cryo-EM). To understand the energetic basis of dynactin filament assembly, we used molecular dynamics simulation to probe the intersubunit interactions among the actin-like proteins, various capping proteins, and four extended regions of the dynactin shoulder. Our simulations revealed stronger intersubunit interactions at the barbed and pointed ends of the filament and involving the extended regions (compared with the interactions within the filament), which may energetically drive filament termination by the capping proteins and recruitment of the actin-like proteins by the extended regions, two key features of the dynactin filament assembly process. Next, we modeled the unknown binding configuration among dynactin, dynein tails, and a number of coiled-coil adaptor proteins (including several Bicaudal-D and related proteins and three HOOK proteins), and predicted a key set of charged residues involved in their electrostatic interactions. Our modeling is consistent with previous findings of conserved regions, functional sites, and disease mutations in the adaptor proteins and will provide a structural framework for future functional and mutational studies of these adaptor proteins. In sum, this study yielded rich structural and energetic information about dynactin and associated adaptor proteins that cannot be directly obtained from the cryo-EM structures with limited resolutions.

  15. Random close packing in protein cores

    NASA Astrophysics Data System (ADS)

    Ohern, Corey

    Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ~ 0 . 75 , a value that is similar to close packing equal-sized spheres. A limitation of these analyses was the use of `extended atom' models, rather than the more physically accurate `explicit hydrogen' model. The validity of using the explicit hydrogen model is proved by its ability to predict the side chain dihedral angle distributions observed in proteins. We employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high resolution protein structures. We find that these protein cores have ϕ ~ 0 . 55 , which is comparable to random close-packing of non-spherical particles. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations and design of new functional proteins. We gratefully acknowledge the support of the Raymond and Beverly Sackler Institute for Biological, Physical, and Engineering Sciences, National Library of Medicine training grant T15LM00705628 (J.C.G.), and National Science Foundation DMR-1307712 (L.R.).

  16. The SARS coronavirus nucleocapsid protein--forms and functions.

    PubMed

    Chang, Chung-ke; Hou, Ming-Hon; Chang, Chi-Fon; Hsiao, Chwan-Deng; Huang, Tai-huang

    2014-03-01

    The nucleocapsid phosphoprotein of the severe acute respiratory syndrome coronavirus (SARS-CoV N protein) packages the viral genome into a helical ribonucleocapsid (RNP) and plays a fundamental role during viral self-assembly. It is a protein with multifarious activities. In this article we will review our current understanding of the N protein structure and its interaction with nucleic acid. Highlights of the progresses include uncovering the modular organization, determining the structures of the structural domains, realizing the roles of protein disorder in protein-protein and protein-nucleic acid interactions, and visualizing the ribonucleoprotein (RNP) structure inside the virions. It was also demonstrated that N-protein binds to nucleic acid at multiple sites with a coupled-allostery manner. We propose a SARS-CoV RNP model that conforms to existing data and bears resemblance to the existing RNP structures of RNA viruses. The model highlights the critical role of modular organization and intrinsic disorder of the N protein in the formation and functions of the dynamic RNP capsid in RNA viruses. This paper forms part of a symposium in Antiviral Research on "From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses." Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Computational methods for constructing protein structure models from 3D electron microscopy maps.

    PubMed

    Esquivel-Rodríguez, Juan; Kihara, Daisuke

    2013-10-01

    Protein structure determination by cryo-electron microscopy (EM) has made significant progress in the past decades. Resolutions of EM maps have been improving as evidenced by recently reported structures that are solved at high resolutions close to 3Å. Computational methods play a key role in interpreting EM data. Among many computational procedures applied to an EM map to obtain protein structure information, in this article we focus on reviewing computational methods that model protein three-dimensional (3D) structures from a 3D EM density map that is constructed from two-dimensional (2D) maps. The computational methods we discuss range from de novo methods, which identify structural elements in an EM map, to structure fitting methods, where known high resolution structures are fit into a low-resolution EM map. A list of available computational tools is also provided. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Rosetta Structure Prediction as a Tool for Solving Difficult Molecular Replacement Problems.

    PubMed

    DiMaio, Frank

    2017-01-01

    Molecular replacement (MR), a method for solving the crystallographic phase problem using phases derived from a model of the target structure, has proven extremely valuable, accounting for the vast majority of structures solved by X-ray crystallography. However, when the resolution of data is low, or the starting model is very dissimilar to the target protein, solving structures via molecular replacement may be very challenging. In recent years, protein structure prediction methodology has emerged as a powerful tool in model building and model refinement for difficult molecular replacement problems. This chapter describes some of the tools available in Rosetta for model building and model refinement specifically geared toward difficult molecular replacement cases.

  19. Structural basis for the enhanced stability of protein model compounds and peptide backbone unit in ammonium ionic liquids.

    PubMed

    Vasantha, T; Attri, Pankaj; Venkatesu, Pannuru; Devi, R S Rama

    2012-10-04

    Protein folding/unfolding is a fascinating study in the presence of cosolvents, which protect/disrupt the native structure of protein, respectively. The structure and stability of proteins and their functional groups may be modulated by the addition of cosolvents. Ionic liquids (ILs) are finding a vast array of applications as novel cosolvents for a wide variety of biochemical processes that include protein folding. Here, the systematic and quantitative apparent transfer free energies (ΔG'(tr)) of protein model compounds from water to ILs through solubility measurements as a function of IL concentration at 25 °C have been exploited to quantify and interpret biomolecular interactions between model compounds of glycine peptides (GPs) with ammonium based ILs. The investigated aqueous systems consist of zwitterionic glycine peptides: glycine (Gly), diglycine (Gly(2)), triglycine (Gly(3)), tetraglycine (Gly(4)), and cyclic glycylglycine (c(GG)) in the presence of six ILs such as diethylammonium acetate (DEAA), diethylammonium hydrogen sulfate (DEAS), triethylammonium acetate (TEAA), triethylammonium hydrogen sulfate (TEAS), triethylammonium dihydrogen phosphate (TEAP), and trimethylammonium acetate (TMAA). We have observed positive values of ΔG'(tr) for GPs from water to ILs, indicating that interactions between ILs and GPs are unfavorable, which leads to stabilization of the structure of model protein compounds. Moreover, our experimental data ΔG'(tr) is used to obtain transfer free energies (Δg'(tr)) of the peptide backbone unit (or glycyl unit) (-CH(2)C═ONH-), which is the most numerous group in globular proteins, from water to IL solutions. To obtain the mechanism events of the ILs' role in enhancing the stability of the model compounds, we have further obtained m-values for GPs from solubility limits. These results explicitly elucidate that all alkyl ammonium ILs act as stabilizers for model compounds through the exclusion of ILs from model compounds of proteins and also reflect the effect of alkyl chain on the stability of protein model compounds.

  20. Rate Kinetics and Molecular Dynamics of the Structural Transitions in Amyloidogenic Proteins

    NASA Astrophysics Data System (ADS)

    Steckmann, Timothy M.

    Amyloid fibril aggregation is associated with several horrific diseases such as Alzheimer's, Creutzfeld-Jacob, diabetes, Parkinson's and others. The process of amyloid aggregation involves forming myriad different metastable intermediate aggregates. Amyloid fibrils are composed of proteins that originate in an innocuous alpha-helix or random-coil structure. The alpha-helices convert their structure to beta-strands that aggregate into beta-sheets, and then into protofibrils, and ultimately into fully formed amyloid fibrils. On the basis of experimental data, I have developed a mathematical model for the kinetics of the reaction pathways and determined rate parameters for peptide secondary structural conversion and aggregation during the entire fibrillogenesis process from random coil to fibrils, including the molecular species that accelerate the conversions. The specific steps of the model and the rate constants that are determined by fitting to experimental data provide insight on the molecular species involved in the fibril formation process. To better understand the molecular basis of the protein structural transitions and aggregation, I report on molecular dynamics (MD) computational studies on the formation of amyloid protofibrillar structures in the small model protein ccbeta, which undergoes many of the structural transitions of the larger, naturally occurring amyloid forming proteins. Two different structural transition processes involving hydrogen bonds are observed for aggregation into fibrils: the breaking of intrachain hydrogen bonds to allow beta-hairpin proteins to straighten, and the subsequent formation of interchain hydrogen bonds during aggregation into amyloid fibrils. For my MD simulations, I found that the temperature dependence of these two different structural transition processes results in the existence of a temperature window that the ccbeta protein experiences during the process of forming protofibrillar structures. Both the mathematical modeling of the kinetics and the MD simulations show that molecular structural heterogeneity is a major factor in the process. The MD simulations also show that intrachain and interchain hydrogen bonds breaking and forming is strongly correlated to the process of amyloid formation.

  1. Twilight reloaded: the peptide experience.

    PubMed

    Weichenberger, Christian X; Pozharski, Edwin; Rupp, Bernhard

    2017-03-01

    The de facto commoditization of biomolecular crystallography as a result of almost disruptive instrumentation automation and continuing improvement of software allows any sensibly trained structural biologist to conduct crystallographic studies of biomolecules with reasonably valid outcomes: that is, models based on properly interpreted electron density. Robust validation has led to major mistakes in the protein part of structure models becoming rare, but some depositions of protein-peptide complex structure models, which generally carry significant interest to the scientific community, still contain erroneous models of the bound peptide ligand. Here, the protein small-molecule ligand validation tool Twilight is updated to include peptide ligands. (i) The primary technical reasons and potential human factors leading to problems in ligand structure models are presented; (ii) a new method used to score peptide-ligand models is presented; (iii) a few instructive and specific examples, including an electron-density-based analysis of peptide-ligand structures that do not contain any ligands, are discussed in detail; (iv) means to avoid such mistakes and the implications for database integrity are discussed and (v) some suggestions as to how journal editors could help to expunge errors from the Protein Data Bank are provided.

  2. The fuzzy oil drop model, based on hydrophobicity density distribution, generalizes the influence of water environment on protein structure and function.

    PubMed

    Banach, Mateusz; Konieczny, Leszek; Roterman, Irena

    2014-10-21

    In this paper we show that the fuzzy oil drop model represents a general framework for describing the generation of hydrophobic cores in proteins and thus provides insight into the influence of the water environment upon protein structure and stability. The model has been successfully applied in the study of a wide range of proteins, however this paper focuses specifically on domains representing immunoglobulin-like folds. Here we provide evidence that immunoglobulin-like domains, despite being structurally similar, differ with respect to their participation in the generation of hydrophobic core. It is shown that β-structural fragments in β-barrels participate in hydrophobic core formation in a highly differentiated manner. Quantitatively measured participation in core formation helps explain the variable stability of proteins and is shown to be related to their biological properties. This also includes the known tendency of immunoglobulin domains to form amyloids, as shown using transthyretin to reveal the clear relation between amyloidogenic properties and structural characteristics based on the fuzzy oil drop model. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.

  3. Modeling Current-Voltage Charateristics of Proteorhodopsin and Bacteriorhodopsin: Towards an Optoelectronics Based on Proteins.

    PubMed

    Alfinito, Eleonora; Reggiani, Lino

    2016-10-01

    Current-voltage characteristics of metal-protein-metal structures made of proteorhodopsin and bacteriorhodopsin are modeled by using a percolation-like approach. Starting from the tertiary structure pertaining to the single protein, an analogous resistance network is created. Charge transfer inside the network is described as a sequential tunneling mechanism and the current is calculated for each value of the given voltage. The theory is validated with available experiments, in dark and light. The role of the tertiary structure of the single protein and of the mechanisms responsible for the photo-activity is discussed.

  4. Interaction of sucralose with whey protein: Experimental and molecular modeling studies

    NASA Astrophysics Data System (ADS)

    Zhang, Hongmei; Sun, Shixin; Wang, Yanqing; Cao, Jian

    2017-12-01

    The objective of this research was to study the interactions of sucralose with whey protein isolate (WPI) by using the three-dimensional fluorescence spectroscopy, circular dichroism spectroscopy and molecular modeling. The results showed that the peptide strands structure of WPI had been changed by sucralose. Sucralose binding induced the secondary structural changes and increased content of aperiodic structure of WPI. Sucralose decreased the thermal stability of WPI and acted as a structure destabilizer during the thermal unfolding process of protein. In addition, the existence of sucralose decreased the reversibility of the unfolding of WPI. Nonetheless, sucralose-WPI complex was less stable than protein alone. The molecular modeling result showed that van der Waals and hydrogen bonding interactions contribute to the complexation free binding energy. There are more than one possible binding sites of WPI with sucralose by surface binding mode.

  5. Domain Motion Enhanced (DoME) Model for Efficient Conformational Sampling of Multidomain Proteins.

    PubMed

    Kobayashi, Chigusa; Matsunaga, Yasuhiro; Koike, Ryotaro; Ota, Motonori; Sugita, Yuji

    2015-11-19

    Large conformational changes of multidomain proteins are difficult to simulate using all-atom molecular dynamics (MD) due to the slow time scale. We show that a simple modification of the structure-based coarse-grained (CG) model enables a stable and efficient MD simulation of those proteins. "Motion Tree", a tree diagram that describes conformational changes between two structures in a protein, provides information on rigid structural units (domains) and the magnitudes of domain motions. In our new CG model, which we call the DoME (domain motion enhanced) model, interdomain interactions are defined as being inversely proportional to the magnitude of the domain motions in the diagram, whereas intradomain interactions are kept constant. We applied the DoME model in combination with the Go model to simulations of adenylate kinase (AdK). The results of the DoME-Go simulation are consistent with an all-atom MD simulation for 10 μs as well as known experimental data. Unlike the conventional Go model, the DoME-Go model yields stable simulation trajectories against temperature changes and conformational transitions are easily sampled despite domain rigidity. Evidently, identification of domains and their interfaces is useful approach for CG modeling of multidomain proteins.

  6. Determining Protein Complex Structures Based on a Bayesian Model of in Vivo Förster Resonance Energy Transfer (FRET) Data*

    PubMed Central

    Bonomi, Massimiliano; Pellarin, Riccardo; Kim, Seung Joong; Russel, Daniel; Sundin, Bryan A.; Riffle, Michael; Jaschob, Daniel; Ramsden, Richard; Davis, Trisha N.; Muller, Eric G. D.; Sali, Andrej

    2014-01-01

    The use of in vivo Förster resonance energy transfer (FRET) data to determine the molecular architecture of a protein complex in living cells is challenging due to data sparseness, sample heterogeneity, signal contributions from multiple donors and acceptors, unequal fluorophore brightness, photobleaching, flexibility of the linker connecting the fluorophore to the tagged protein, and spectral cross-talk. We addressed these challenges by using a Bayesian approach that produces the posterior probability of a model, given the input data. The posterior probability is defined as a function of the dependence of our FRET metric FRETR on a structure (forward model), a model of noise in the data, as well as prior information about the structure, relative populations of distinct states in the sample, forward model parameters, and data noise. The forward model was validated against kinetic Monte Carlo simulations and in vivo experimental data collected on nine systems of known structure. In addition, our Bayesian approach was validated by a benchmark of 16 protein complexes of known structure. Given the structures of each subunit of the complexes, models were computed from synthetic FRETR data with a distance root-mean-squared deviation error of 14 to 17 Å. The approach is implemented in the open-source Integrative Modeling Platform, allowing us to determine macromolecular structures through a combination of in vivo FRETR data and data from other sources, such as electron microscopy and chemical cross-linking. PMID:25139910

  7. The simulation approach to lipid-protein interactions.

    PubMed

    Paramo, Teresa; Garzón, Diana; Holdbrook, Daniel A; Khalid, Syma; Bond, Peter J

    2013-01-01

    The interactions between lipids and proteins are crucial for a range of biological processes, from the folding and stability of membrane proteins to signaling and metabolism facilitated by lipid-binding proteins. However, high-resolution structural details concerning functional lipid/protein interactions are scarce due to barriers in both experimental isolation of native lipid-bound complexes and subsequent biophysical characterization. The molecular dynamics (MD) simulation approach provides a means to complement available structural data, yielding dynamic, structural, and thermodynamic data for a protein embedded within a physiologically realistic, modelled lipid environment. In this chapter, we provide a guide to current methods for setting up and running simulations of membrane proteins and soluble, lipid-binding proteins, using standard atomistically detailed representations, as well as simplified, coarse-grained models. In addition, we outline recent studies that illustrate the power of the simulation approach in the context of biologically relevant lipid/protein interactions.

  8. Structural features that predict real-value fluctuations of globular proteins

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke

    2012-01-01

    It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convienient practial way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193

  9. A Practical Approach to Protein Crystallography.

    PubMed

    Ilari, Andrea; Savino, Carmelinda

    2017-01-01

    Macromolecular crystallography is a powerful tool for structural biology. The resolution of a protein crystal structure is becoming much easier than in the past, thanks to developments in computing, automation of crystallization techniques and high-flux synchrotron sources to collect diffraction datasets. The aim of this chapter is to provide practical procedures to determine a protein crystal structure, illustrating the new techniques, experimental methods, and software that have made protein crystallography a tool accessible to a larger scientific community.It is impossible to give more than a taste of what the X-ray crystallographic technique entails in one brief chapter and there are different ways to solve a protein structure. Since the number of structures available in the Protein Data Bank (PDB) is becoming ever larger (the protein data bank now contains more than 100,000 entries) and therefore the probability to find a good model to solve the structure is ever increasing, we focus our attention on the Molecular Replacement method. Indeed, whenever applicable, this method allows the resolution of macromolecular structures starting from a single data set and a search model downloaded from the PDB, with the aid only of computer work.

  10. Protein homology model refinement by large-scale energy optimization.

    PubMed

    Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E; DiMaio, Frank; Baker, David

    2018-03-20

    Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.

  11. Target Highlights in CASP9: Experimental Target Structures for the Critical Assessment of Techniques for Protein Structure Prediction

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Bartual, Sergio G.; Bazan, J. Fernando; Berman, Helen; Casteel, Darren E.; Christodoulou, Evangelos; Everett, John K.; Hausmann, Jens; Heidebrecht, Tatjana; Hills, Tanya; Hui, Raymond; Hunt, John F.; Jayaraman, Seetharaman; Joachimiak, Andrzej; Kennedy, Michael A.; Kim, Choel; Lingel, Andreas; Michalska, Karolina; Montelione, Gaetano T.; Otero, José M.; Perrakis, Anastassis; Pizarro, Juan C.; van Raaij, Mark J.; Ramelot, Theresa A.; Rousseau, Francois; Tong, Liang; Wernimont, Amy K.; Young, Jasmine; Schwede, Torsten

    2011-01-01

    One goal of the CASP Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction is to identify the current state of the art in protein structure prediction and modeling. A fundamental principle of CASP is blind prediction on a set of relevant protein targets, i.e. the participating computational methods are tested on a common set of experimental target proteins, for which the experimental structures are not known at the time of modeling. Therefore, the CASP experiment would not have been possible without broad support of the experimental protein structural biology community. In this manuscript, several experimental groups discuss the structures of the proteins which they provided as prediction targets for CASP9, highlighting structural and functional peculiarities of these structures: the long tail fibre protein gp37 from bacteriophage T4, the cyclic GMP-dependent protein kinase Iβ (PKGIβ) dimerization/docking domain, the ectodomain of the JTB (Jumping Translocation Breakpoint) transmembrane receptor, Autotaxin (ATX) in complex with an inhibitor, the DNA-Binding J-Binding Protein 1 (JBP1) domain essential for biosynthesis and maintenance of DNA base-J (β-D-glucosyl-hydroxymethyluracil) in Trypanosoma and Leishmania, an so far uncharacterized 73 residue domain from Ruminococcus gnavus with a fold typical for PDZ-like domains, a domain from the Phycobilisome (PBS) core-membrane linker (LCM) phycobiliprotein ApcE from Synechocystis, the Heat shock protein 90 (Hsp90) activators PFC0360w and PFC0270w from Plasmodium falciparum, and 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae. PMID:22020785

  12. Anharmonic Normal Mode Analysis of Elastic Network Model Improves the Modeling of Atomic Fluctuations in Protein Crystal Structures

    PubMed Central

    Zheng, Wenjun

    2010-01-01

    Abstract Protein conformational dynamics, despite its significant anharmonicity, has been widely explored by normal mode analysis (NMA) based on atomic or coarse-grained potential functions. To account for the anharmonic aspects of protein dynamics, this study proposes, and has performed, an anharmonic NMA (ANMA) based on the Cα-only elastic network models, which assume elastic interactions between pairs of residues whose Cα atoms or heavy atoms are within a cutoff distance. The key step of ANMA is to sample an anharmonic potential function along the directions of eigenvectors of the lowest normal modes to determine the mean-squared fluctuations along these directions. ANMA was evaluated based on the modeling of anisotropic displacement parameters (ADPs) from a list of 83 high-resolution protein crystal structures. Significant improvement was found in the modeling of ADPs by ANMA compared with standard NMA. Further improvement in the modeling of ADPs is attained if the interactions between a protein and its crystalline environment are taken into account. In addition, this study has determined the optimal cutoff distances for ADP modeling based on elastic network models, and these agree well with the peaks of the statistical distributions of distances between Cα atoms or heavy atoms derived from a large set of protein crystal structures. PMID:20550915

  13. [Crystal structure of SMU.2055 protein from Streptococcus mutans and its small molecule inhibitors design and selection].

    PubMed

    Xiaodan, Chen; Xiurong, Zhan; Xinyu, Wu; Chunyan, Zhao; Wanghong, Zhao

    2015-04-01

    The aim of this study is to analyze the three-dimensional crystal structure of SMU.2055 protein, a putative acetyltransferase from the major caries pathogen Streptococcus mutans (S. mutans). The design and selection of the structure-based small molecule inhibitors are also studied. The three-dimensional crystal structure of SMU.2055 protein was obtained by structural genomics research methods of gene cloning and expression, protein purification with Ni²⁺-chelating affinity chromatography, crystal screening, and X-ray diffraction data collection. An inhibitor virtual model matching with its target protein structure was set up using computer-aided drug design methods, virtual screening and fine docking, and Libdock and Autodock procedures. The crystal of SMU.2055 protein was obtained, and its three-dimensional crystal structure was analyzed. This crystal was diffracted to a resolution of 0.23 nm. It belongs to orthorhombic space group C222(1), with unit cell parameters of a = 9.20 nm, b = 9.46 nm, and c = 19.39 nm. The asymmetric unit contained four molecules, with a solvent content of 56.7%. Moreover, five small molecule compounds, whose structure matched with that of the target protein in high degree, were designed and selected. Protein crystallography research of S. mutans SMU.2055 helps to understand the structures and functions of proteins from S. mutans at the atomic level. These five compounds may be considered as effective inhibitors to SMU.2055. The virtual model of small molecule inhibitors we built will lay a foundation to the anticaries research based on the crystal structure of proteins.

  14. Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm

    PubMed Central

    Yu, Isseki; Mori, Takaharu; Ando, Tadashi; Harada, Ryuhei; Jung, Jaewoon; Sugita, Yuji; Feig, Michael

    2016-01-01

    Biological macromolecules function in highly crowded cellular environments. The structure and dynamics of proteins and nucleic acids are well characterized in vitro, but in vivo crowding effects remain unclear. Using molecular dynamics simulations of a comprehensive atomistic model cytoplasm we found that protein-protein interactions may destabilize native protein structures, whereas metabolite interactions may induce more compact states due to electrostatic screening. Protein-protein interactions also resulted in significant variations in reduced macromolecular diffusion under crowded conditions, while metabolites exhibited significant two-dimensional surface diffusion and altered protein-ligand binding that may reduce the effective concentration of metabolites and ligands in vivo. Metabolic enzymes showed weak non-specific association in cellular environments attributed to solvation and entropic effects. These effects are expected to have broad implications for the in vivo functioning of biomolecules. This work is a first step towards physically realistic in silico whole-cell models that connect molecular with cellular biology. DOI: http://dx.doi.org/10.7554/eLife.19274.001 PMID:27801646

  15. Structural analysis of herpes simplex virus by optical super-resolution imaging

    NASA Astrophysics Data System (ADS)

    Laine, Romain F.; Albecka, Anna; van de Linde, Sebastian; Rees, Eric J.; Crump, Colin M.; Kaminski, Clemens F.

    2015-01-01

    Herpes simplex virus type-1 (HSV-1) is one of the most widespread pathogens among humans. Although the structure of HSV-1 has been extensively investigated, the precise organization of tegument and envelope proteins remains elusive. Here we use super-resolution imaging by direct stochastic optical reconstruction microscopy (dSTORM) in combination with a model-based analysis of single-molecule localization data, to determine the position of protein layers within virus particles. We resolve different protein layers within individual HSV-1 particles using multi-colour dSTORM imaging and discriminate envelope-anchored glycoproteins from tegument proteins, both in purified virions and in virions present in infected cells. Precise characterization of HSV-1 structure was achieved by particle averaging of purified viruses and model-based analysis of the radial distribution of the tegument proteins VP16, VP1/2 and pUL37, and envelope protein gD. From this data, we propose a model of the protein organization inside the tegument.

  16. Application potential of ATR-FT/IR molecular spectroscopy in animal nutrition: revelation of protein molecular structures of canola meal and presscake, as affected by heat-processing methods, in relationship with their protein digestive behavior and utilization for dairy cattle.

    PubMed

    Theodoridou, Katerina; Yu, Peiqiang

    2013-06-12

    Protein quality relies not only on total protein but also on protein inherent structures. The most commonly occurring protein secondary structures (α-helix and β-sheet) may influence protein quality, nutrient utilization, and digestive behavior. The objectives of this study were to reveal the protein molecular structures of canola meal (yellow and brown) and presscake as affected by the heat-processing methods and to investigate the relationship between structure changes and protein rumen degradations kinetics, estimated protein intestinal digestibility, degraded protein balance, and metabolizable protein. Heat-processing conditions resulted in a higher value for α-helix and β-sheet for brown canola presscake compared to brown canola meal. The multivariate molecular spectral analyses (PCA, CLA) showed that there were significant molecular structural differences in the protein amide I and II fingerprint region (ca. 1700-1480 cm(-1)) between the brown canola meal and presscake. The in situ degradation parameters, amide I and II, and α-helix to β-sheet ratio (R_a_β) were positively correlated with the degradable fraction and the degradation rate. Modeling results showed that α-helix was positively correlated with the truly absorbed rumen synthesized microbial protein in the small intestine when using both the Dutch DVE/OEB system and the NRC-2001 model. Concerning the protein profiles, R_a_β was a better predictor for crude protein (79%) and for neutral detergent insoluble crude protein (68%). In conclusion, ATR-FT/IR molecular spectroscopy may be used to rapidly characterize feed structures at the molecular level and also as a potential predictor of feed functionality, digestive behavior, and nutrient utilization of canola feed.

  17. Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering.

    PubMed

    Wall, Michael E

    2018-03-01

    Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structure to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.

  18. Structure Prediction of the Second Extracellular Loop in G-Protein-Coupled Receptors

    PubMed Central

    Kmiecik, Sebastian; Jamroz, Michal; Kolinski, Michal

    2014-01-01

    G-protein-coupled receptors (GPCRs) play key roles in living organisms. Therefore, it is important to determine their functional structures. The second extracellular loop (ECL2) is a functionally important region of GPCRs, which poses significant challenge for computational structure prediction methods. In this work, we evaluated CABS, a well-established protein modeling tool for predicting ECL2 structure in 13 GPCRs. The ECL2s (with between 13 and 34 residues) are predicted in an environment of other extracellular loops being fully flexible and the transmembrane domain fixed in its x-ray conformation. The modeling procedure used theoretical predictions of ECL2 secondary structure and experimental constraints on disulfide bridges. Our approach yielded ensembles of low-energy conformers and the most populated conformers that contained models close to the available x-ray structures. The level of similarity between the predicted models and x-ray structures is comparable to that of other state-of-the-art computational methods. Our results extend other studies by including newly crystallized GPCRs. PMID:24896119

  19. Integration of QUARK and I-TASSER for Ab Initio Protein Structure Prediction in CASP11.

    PubMed

    Zhang, Wenxuan; Yang, Jianyi; He, Baoji; Walker, Sara Elizabeth; Zhang, Hongjiu; Govindarajoo, Brandon; Virtanen, Jouko; Xue, Zhidong; Shen, Hong-Bin; Zhang, Yang

    2016-09-01

    We tested two pipelines developed for template-free protein structure prediction in the CASP11 experiment. First, the QUARK pipeline constructs structure models by reassembling fragments of continuously distributed lengths excised from unrelated proteins. Five free-modeling (FM) targets have the model successfully constructed by QUARK with a TM-score above 0.4, including the first model of T0837-D1, which has a TM-score = 0.736 and RMSD = 2.9 Å to the native. Detailed analysis showed that the success is partly attributed to the high-resolution contact map prediction derived from fragment-based distance-profiles, which are mainly located between regular secondary structure elements and loops/turns and help guide the orientation of secondary structure assembly. In the Zhang-Server pipeline, weakly scoring threading templates are re-ordered by the structural similarity to the ab initio folding models, which are then reassembled by I-TASSER based structure assembly simulations; 60% more domains with length up to 204 residues, compared to the QUARK pipeline, were successfully modeled by the I-TASSER pipeline with a TM-score above 0.4. The robustness of the I-TASSER pipeline can stem from the composite fragment-assembly simulations that combine structures from both ab initio folding and threading template refinements. Despite the promising cases, challenges still exist in long-range beta-strand folding, domain parsing, and the uncertainty of secondary structure prediction; the latter of which was found to affect nearly all aspects of FM structure predictions, from fragment identification, target classification, structure assembly, to final model selection. Significant efforts are needed to solve these problems before real progress on FM could be made. Proteins 2016; 84(Suppl 1):76-86. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  20. AIDA: ab initio domain assembly for automated multi-domain protein structure prediction and domain–domain interaction prediction

    PubMed Central

    Xu, Dong; Jaroszewski, Lukasz; Li, Zhanwen; Godzik, Adam

    2015-01-01

    Motivation: Most proteins consist of multiple domains, independent structural and evolutionary units that are often reshuffled in genomic rearrangements to form new protein architectures. Template-based modeling methods can often detect homologous templates for individual domains, but templates that could be used to model the entire query protein are often not available. Results: We have developed a fast docking algorithm ab initio domain assembly (AIDA) for assembling multi-domain protein structures, guided by the ab initio folding potential. This approach can be extended to discontinuous domains (i.e. domains with ‘inserted’ domains). When tested on experimentally solved structures of multi-domain proteins, the relative domain positions were accurately found among top 5000 models in 86% of cases. AIDA server can use domain assignments provided by the user or predict them from the provided sequence. The latter approach is particularly useful for automated protein structure prediction servers. The blind test consisting of 95 CASP10 targets shows that domain boundaries could be successfully determined for 97% of targets. Availability and implementation: The AIDA package as well as the benchmark sets used here are available for download at http://ffas.burnham.org/AIDA/. Contact: adam@sanfordburnham.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25701568

  1. GeneSilico protein structure prediction meta-server.

    PubMed

    Kurowski, Michal A; Bujnicki, Janusz M

    2003-07-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

  2. GeneSilico protein structure prediction meta-server

    PubMed Central

    Kurowski, Michal A.; Bujnicki, Janusz M.

    2003-01-01

    Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313

  3. Modeling protein structure at near atomic resolutions with Gorgon.

    PubMed

    Baker, Matthew L; Abeysinghe, Sasakthi S; Schuh, Stephen; Coleman, Ross A; Abrams, Austin; Marsh, Michael P; Hryc, Corey F; Ruths, Troy; Chiu, Wah; Ju, Tao

    2011-05-01

    Electron cryo-microscopy (cryo-EM) has played an increasingly important role in elucidating the structure and function of macromolecular assemblies in near native solution conditions. Typically, however, only non-atomic resolution reconstructions have been obtained for these large complexes, necessitating computational tools for integrating and extracting structural details. With recent advances in cryo-EM, maps at near-atomic resolutions have been achieved for several macromolecular assemblies from which models have been manually constructed. In this work, we describe a new interactive modeling toolkit called Gorgon targeted at intermediate to near-atomic resolution density maps (10-3.5 Å), particularly from cryo-EM. Gorgon's de novo modeling procedure couples sequence-based secondary structure prediction with feature detection and geometric modeling techniques to generate initial protein backbone models. Beyond model building, Gorgon is an extensible interactive visualization platform with a variety of computational tools for annotating a wide variety of 3D volumes. Examples from cryo-EM maps of Rotavirus and Rice Dwarf Virus are used to demonstrate its applicability to modeling protein structure. Copyright © 2011 Elsevier Inc. All rights reserved.

  4. Exploration of freely available web-interfaces for comparative homology modelling of microbial proteins

    PubMed Central

    Nema, Vijay; Pal, Sudhir Kumar

    2013-01-01

    Aim: This study was conducted to find the best suited freely available software for modelling of proteins by taking a few sample proteins. The proteins used were small to big in size with available crystal structures for the purpose of benchmarking. Key players like Phyre2, Swiss-Model, CPHmodels-3.0, Homer, (PS)2, (PS)2-V2, Modweb were used for the comparison and model generation. Results: Benchmarking process was done for four proteins, Icl, InhA, and KatG of Mycobacterium tuberculosis and RpoB of Thermus Thermophilus to get the most suited software. Parameters compared during analysis gave relatively better values for Phyre2 and Swiss-Model. Conclusion: This comparative study gave the information that Phyre2 and Swiss-Model make good models of small and large proteins as compared to other screened software. Other software was also good but is often not very efficient in providing full-length and properly folded structure. PMID:24023424

  5. Putting proteins back into water

    NASA Astrophysics Data System (ADS)

    de Los Rios, Paolo; Caldarelli, Guido

    2000-12-01

    We introduce a simplified protein model where the solvent (water) degrees of freedom appear explicitly (although in an extremely simplified fashion). Using this model we are able to recover the thermodynamic phenomenology of proteins over a wide range of temperatures. In particular we describe both the warm and the cold protein denaturation within a single framework, while addressing important issues about the structure of model proteins.

  6. Insights on the structural perturbations in human MTHFR Ala222Val mutant by protein modeling and molecular dynamics.

    PubMed

    Abhinand, P A; Shaikh, Faraz; Bhakat, Soumendranath; Radadiya, Ashish; Bhaskar, L V K S; Shah, Anamik; Ragunath, P K

    2016-01-01

    Methylenetetrahydrofolate reductase (MTHFR) protein catalyzes the only biochemical reaction which produces methyltetrahydrofolate, the active form of folic acid essential for several molecular functions. The Ala222Val polymorphism of human MTHFR encodes a thermolabile protein associated with increased risk of neural tube defects and cardiovascular disease. Experimental studies have shown that the mutation does not affect the kinetic properties of MTHFR, but inactivates the protein by increasing flavin adenine dinucleotide (FAD) loss. The lack of completely solved crystal structure of MTHFR is an impediment in understanding the structural perturbations caused by the Ala222Val mutation; computational modeling provides a suitable alternative. The three-dimensional structure of human MTHFR protein was obtained through homology modeling, by taking the MTHFR structures from Escherichia coli and Thermus thermophilus as templates. Subsequently, the modeled structure was docked with FAD using Glide, which revealed a very good binding affinity, authenticated by a Glide XP score of -10.3983 (kcal mol(-1)). The MTHFR was mutated by changing Alanine 222 to Valine. The wild-type MTHFR-FAD complex and the Ala222Val mutant MTHFR-FAD complex were subjected to molecular dynamics simulation over 50 ns period. The average difference in backbone root mean square deviation (RMSD) between wild and mutant variant was found to be ~.11 Å. The greater degree of fluctuations in the mutant protein translates to increased conformational stability as a result of mutation. The FAD-binding ability of the mutant MTHFR was also found to be significantly lowered as a result of decreased protein grip caused by increased conformational flexibility. The study provides insights into the Ala222Val mutation of human MTHFR that induces major conformational changes in the tertiary structure, causing a significant reduction in the FAD-binding affinity.

  7. Ab initio folding of proteins using all-atom discrete molecular dynamics

    PubMed Central

    Ding, Feng; Tsao, Douglas; Nie, Huifen; Dokholyan, Nikolay V.

    2008-01-01

    Summary Discrete molecular dynamics (DMD) is a rapid sampling method used in protein folding and aggregation studies. Until now, DMD was used to perform simulations of simplified protein models in conjunction with structure-based force fields. Here, we develop an all-atom protein model and a transferable force field featuring packing, solvation, and environment-dependent hydrogen bond interactions. Using the replica exchange method, we perform folding simulations of six small proteins (20–60 residues) with distinct native structures. In all cases, native or near-native states are reached in simulations. For three small proteins, multiple folding transitions are observed and the computationally-characterized thermodynamics are in quantitative agreement with experiments. The predictive power of all-atom DMD highlights the importance of environment-dependent hydrogen bond interactions in modeling protein folding. The developed approach can be used for accurate and rapid sampling of conformational spaces of proteins and protein-protein complexes, and applied to protein engineering and design of protein-protein interactions. PMID:18611374

  8. Modeling Structural Dynamics of Biomolecular Complexes by Coarse-Grained Molecular Simulations.

    PubMed

    Takada, Shoji; Kanada, Ryo; Tan, Cheng; Terakawa, Tsuyoshi; Li, Wenfei; Kenzaki, Hiroo

    2015-12-15

    Due to hierarchic nature of biomolecular systems, their computational modeling calls for multiscale approaches, in which coarse-grained (CG) simulations are used to address long-time dynamics of large systems. Here, we review recent developments and applications of CG modeling methods, focusing on our methods primarily for proteins, DNA, and their complexes. These methods have been implemented in the CG biomolecular simulator, CafeMol. Our CG model has resolution such that ∼10 non-hydrogen atoms are grouped into one CG particle on average. For proteins, each amino acid is represented by one CG particle. For DNA, one nucleotide is simplified by three CG particles, representing sugar, phosphate, and base. The protein modeling is based on the idea that proteins have a globally funnel-like energy landscape, which is encoded in the structure-based potential energy function. We first describe two representative minimal models of proteins, called the elastic network model and the classic Go̅ model. We then present a more elaborate protein model, which extends the minimal model to incorporate sequence and context dependent local flexibility and nonlocal contacts. For DNA, we describe a model developed by de Pablo's group that was tuned to well reproduce sequence-dependent structural and thermodynamic experimental data for single- and double-stranded DNAs. Protein-DNA interactions are modeled either by the structure-based term for specific cases or by electrostatic and excluded volume terms for nonspecific cases. We also discuss the time scale mapping in CG molecular dynamics simulations. While the apparent single time step of our CGMD is about 10 times larger than that in the fully atomistic molecular dynamics for small-scale dynamics, large-scale motions can be further accelerated by two-orders of magnitude with the use of CG model and a low friction constant in Langevin dynamics. Next, we present four examples of applications. First, the classic Go̅ model was used to emulate one ATP cycle of a molecular motor, kinesin. Second, nonspecific protein-DNA binding was studied by a combination of elaborate protein and DNA models. Third, a transcription factor, p53, that contains highly fluctuating regions was simulated on two perpendicularly arranged DNA segments, addressing intersegmental transfer of p53. Fourth, we simulated structural dynamics of dinucleosomes connected by a linker DNA finding distinct types of internucleosome docking and salt-concentration-dependent compaction. Finally, we discuss many of limitations in the current approaches and future directions. Especially, more accurate electrostatic treatment and a phospholipid model that matches our CG resolutions are of immediate importance.

  9. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    PubMed

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biannual international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy, therefore, much effort should be made to improve ab initio methods. Arguably, Rosetta is considered as the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers - i.e. 200 - has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3- mers. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  10. Matt: local flexibility aids protein multiple structure alignment.

    PubMed

    Menke, Matthew; Berger, Bonnie; Cowen, Lenore

    2008-01-01

    Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these "bent" alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of alpha-helices and beta-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.

  11. Molecular dynamics simulations of the Bcl-2 protein to predict the structure of its unordered flexible loop domain.

    PubMed

    Raghav, Pawan Kumar; Verma, Yogesh Kumar; Gangenahalli, Gurudutta U

    2012-05-01

    B-cell lymphoma (Bcl-2) protein is an anti-apoptotic member of the Bcl-2 family. It is functionally demarcated into four Bcl-2 homology (BH) domains: BH1, BH2, BH3, BH4, one flexible loop domain (FLD), a transmembrane domain (TM), and an X domain. Bcl-2's BH domains have clearly been elucidated from a structural perspective, whereas the conformation of FLD has not yet been predicted, despite its important role in regulating apoptosis through its interactions with JNK-1, PKC, PP2A phosphatase, caspase 3, MAP kinase, ubiquitin, PS1, and FKBP38. Many important residues that regulate Bcl-2 anti-apoptotic activity are present in this domain, for example Asp34, Thr56, Thr69, Ser70, Thr74, and Ser87. The structural elucidation of the FLD would likely help in attempts to accurately predict the effect of mutating these residues on the overall structure of the protein and the interactions of other proteins in this domain. Therefore, we have generated an increased quality model of the Bcl-2 protein including the FLD through modeling. Further, molecular dynamics (MD) simulations were used for FLD optimization, to predict the flexibility, and to determine the stability of the folded FLD. In addition, essential dynamics (ED) was used to predict the collective motions and the essential subspace relevant to Bcl-2 protein function. The predicted average structure and ensemble of MD-simulated structures were submitted to the Protein Model Database (PMDB), and the Bcl-2 structures obtained exhibited enhanced quality. This study should help to elucidate the structural basis for Bcl-2 anti-apoptotic activity regulation through its binding to other proteins via the FLD.

  12. Low-Resolution Structure of the Full-Length Barley (Hordeum vulgare) SGT1 Protein in Solution, Obtained Using Small-Angle X-Ray Scattering

    PubMed Central

    Taube, Michał; Pieńkowska, Joanna R.; Jarmołowski, Artur; Kozak, Maciej

    2014-01-01

    SGT1 is an evolutionarily conserved eukaryotic protein involved in many important cellular processes. In plants, SGT1 is involved in resistance to disease. In a low ionic strength environment, the SGT1 protein tends to form dimers. The protein consists of three structurally independent domains (the tetratricopeptide repeats domain (TPR), the CHORD- and SGT1-containing domain (CS), and the SGT1-specific domain (SGS)), and two less conserved variable regions (VR1 and VR2). In the present study, we provide the low-resolution structure of the barley (Hordeum vulgare) SGT1 protein in solution and its dimer/monomer equilibrium using small-angle scattering of synchrotron radiation, ab-initio modeling and circular dichroism spectroscopy. The multivariate curve resolution least-square method (MCR-ALS) was applied to separate the scattering data of the monomeric and dimeric species from a complex mixture. The models of the barley SGT1 dimer and monomer were formulated using rigid body modeling with ab-initio structure prediction. Both oligomeric forms of barley SGT1 have elongated shapes with unfolded inter-domain regions. Circular dichroism spectroscopy confirmed that the barley SGT1 protein had a modular architecture, with an α-helical TPR domain, a β-sheet sandwich CS domain, and a disordered SGS domain separated by VR1 and VR2 regions. Using molecular docking and ab-initio protein structure prediction, a model of dimerization of the TPR domains was proposed. PMID:24714665

  13. Structural Model for the Interaction of a Designed Ankyrin Repeat Protein with the Human Epidermal Growth Factor Receptor 2

    PubMed Central

    Epa, V. Chandana; Dolezal, Olan; Doughty, Larissa; Xiao, Xiaowen; Jost, Christian; Plückthun, Andreas; Adams, Timothy E.

    2013-01-01

    Designed Ankyrin Repeat Proteins are a class of novel binding proteins that can be selected and evolved to bind to targets with high affinity and specificity. We are interested in the DARPin H10-2-G3, which has been evolved to bind with very high affinity to the human epidermal growth factor receptor 2 (HER2). HER2 is found to be over-expressed in 30% of breast cancers, and is the target for the FDA-approved therapeutic monoclonal antibodies trastuzumab and pertuzumab and small molecule tyrosine kinase inhibitors. Here, we use computational macromolecular docking, coupled with several interface metrics such as shape complementarity, interaction energy, and electrostatic complementarity, to model the structure of the complex between the DARPin H10-2-G3 and HER2. We analyzed the interface between the two proteins and then validated the structural model by showing that selected HER2 point mutations at the putative interface with H10-2-G3 reduce the affinity of binding up to 100-fold without affecting the binding of trastuzumab. Comparisons made with a subsequently solved X-ray crystal structure of the complex yielded a backbone atom root mean square deviation of 0.84–1.14 Ångstroms. The study presented here demonstrates the capability of the computational techniques of structural bioinformatics in generating useful structural models of protein-protein interactions. PMID:23527120

  14. Phylogenetic analysis and protein structure modelling identifies distinct Ca(2+)/Cation antiporters and conservation of gene family structure within Arabidopsis and rice species.

    PubMed

    Pittman, Jon K; Hirschi, Kendal D

    2016-12-01

    The Ca(2+)/Cation Antiporter (CaCA) superfamily is an ancient and widespread family of ion-coupled cation transporters found in nearly all kingdoms of life. In animals, K(+)-dependent and K(+)-indendent Na(+)/Ca(2+) exchangers (NCKX and NCX) are important CaCA members. Recently it was proposed that all rice and Arabidopsis CaCA proteins should be classified as NCX proteins. Here we performed phylogenetic analysis of CaCA genes and protein structure homology modelling to further characterise members of this transporter superfamily. Phylogenetic analysis of rice and Arabidopsis CaCAs in comparison with selected CaCA members from non-plant species demonstrated that these genes form clearly distinct families, with the H(+)/Cation exchanger (CAX) and cation/Ca(2+) exchanger (CCX) families dominant in higher plants but the NCKX and NCX families absent. NCX-related Mg(2+)/H(+) exchanger (MHX) and CAX-related Na(+)/Ca(2+) exchanger-like (NCL) proteins are instead present. Analysis of genomes of ten closely-related rice species and four Arabidopsis-related species found that CaCA gene family structures are highly conserved within related plants, apart from minor variation. Protein structures were modelled for OsCAX1a and OsMHX1. Despite exhibiting broad structural conservation, there are clear structural differences observed between the different CaCA types. Members of the CaCA superfamily form clearly distinct families with different phylogenetic, structural and functional characteristics, and therefore should not be simply classified as NCX proteins, which should remain as a separate gene family.

  15. Predicting the accuracy of ligand overlay methods with Random Forest models.

    PubMed

    Nandigam, Ravi K; Evans, David A; Erickson, Jon A; Kim, Sangtae; Sutherland, Jeffrey J

    2008-12-01

    The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using rmsd < or = 2 A as the criteria for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.

  16. Protein Structure Prediction by Protein Threading

    NASA Astrophysics Data System (ADS)

    Xu, Ying; Liu, Zhijie; Cai, Liming; Xu, Dong

    The seminal work of Bowie, Lüthy, and Eisenberg (Bowie et al., 1991) on "the inverse protein folding problem" laid the foundation of protein structure prediction by protein threading. By using simple measures for fitness of different amino acid types to local structural environments defined in terms of solvent accessibility and protein secondary structure, the authors derived a simple and yet profoundly novel approach to assessing if a protein sequence fits well with a given protein structural fold. Their follow-up work (Elofsson et al., 1996; Fischer and Eisenberg, 1996; Fischer et al., 1996a,b) and the work by Jones, Taylor, and Thornton (Jones et al., 1992) on protein fold recognition led to the development of a new brand of powerful tools for protein structure prediction, which we now term "protein threading." These computational tools have played a key role in extending the utility of all the experimentally solved structures by X-ray crystallography and nuclear magnetic resonance (NMR), providing structural models and functional predictions for many of the proteins encoded in the hundreds of genomes that have been sequenced up to now.

  17. Models of S/π interactions in protein structures: Comparison of the H2S–benzene complex with PDB data

    PubMed Central

    Ringer, Ashley L.; Senenko, Anastasia; Sherrill, C. David

    2007-01-01

    S/π interactions are prevalent in biochemistry and play an important role in protein folding and stabilization. Geometries of cysteine/aromatic interactions found in crystal structures from the Brookhaven Protein Data Bank (PDB) are analyzed and compared with the equilibrium configurations predicted by high-level quantum mechanical results for the H2S–benzene complex. A correlation is observed between the energetically favorable configurations on the quantum mechanical potential energy surface of the H2S–benzene model and the cysteine/aromatic configurations most frequently found in crystal structures of the PDB. In contrast to some previous PDB analyses, configurations with the sulfur over the aromatic ring are found to be the most important. Our results suggest that accurate quantum computations on models of noncovalent interactions may be helpful in understanding the structures of proteins and other complex systems. PMID:17766371

  18. Predicting Structure and Function for Novel Proteins of an Extremophilic Iron Oxidizing Bacterium

    NASA Astrophysics Data System (ADS)

    Wheeler, K.; Zemla, A.; Banfield, J.; Thelen, M.

    2007-12-01

    Proteins isolated from uncultivated microbial populations represent the functional components of microbial processes and contribute directly to community fitness under natural conditions. Investigations into proteins in the environment are hindered by the lack of genome data, or where available, the high proportion of proteins of unknown function. We have identified thousands of proteins from biofilms in the extremely acidic drainage outflow of an iron mine ecosystem (1). With an extensive genomic and proteomic foundation, we have focused directly on the problem of several hundred proteins of unknown function within this well-defined model system. Here we describe the geobiological insights gained by using a high throughput computational approach for predicting structure and function of 421 novel proteins from the biofilm community. We used a homology based modeling system to compare these proteins to those of known structure (AS2TS) (2). This approach has resulted in the assignment of structures to 360 proteins (85%) and provided functional information for up to 75% of the modeled proteins. Detailed examination of the modeling results enables confident, high-throughput prediction of the roles of many of the novel proteins within the microbial community. For instance, one prediction places a protein in the phosphoenolpyruvate/pyruvate domain superfamily as a carboxylase that fills in a gap in an otherwise complete carbon cycle. Particularly important for a community in such a metal rich environment is the evolution of over 25% of the novel proteins that contain a metal cofactor; of these, one third are likely Fe containing proteins. Two of the most abundant proteins in biofilm samples are unusual c-type cytochromes. Both of these proteins catalyze iron- oxidation, a key metabolic reaction supporting the energy requirements of this community. Structural models of these cytochromes verify our experimental results on heme binding and electron transfer reactivity, and provide details for a working hypothesis of electron flow within the biofilm's major bacterium. Nearly 7% of the novel proteins contain tetratrico peptide repeat (TPR) modules, a protein-protein interaction domain that participates in signal transduction and a wide variety of other cellular functions. Like many biofilms, the various organisms in this community use unknown mechanisms to communicate, relying upon each other for survival. Especially interesting is evidence that most of these novel TPR proteins are located in the extracellular or membrane fractions, suggesting their role in intracellular communication. (1) Ram et al, 2005, Science 308:1915-20, "Community Proteomics of a Natural Microbial Biofilm" (2) Zemla et al, 2005, Nucleic Acids Res 33 (Web Server issue):W111-5, "AS2TS system for protein structure modeling and analysis" This work was funded by the DOE Genomics: GTL Program and was performed under the auspices of the DOE by the University of California, Lawrence Livermore National Laboratory under contract W-7405-Eng-48.

  19. Anomalous diffusion in neutral evolution of model proteins.

    PubMed

    Nelson, Erik D; Grishin, Nick V

    2015-06-01

    Protein evolution is frequently explored using minimalist polymer models, however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n. We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.

  20. Anomalous diffusion in neutral evolution of model proteins

    NASA Astrophysics Data System (ADS)

    Nelson, Erik D.; Grishin, Nick V.

    2015-06-01

    Protein evolution is frequently explored using minimalist polymer models, however, little attention has been given to the problem of structural drift, or diffusion. Here, we study neutral evolution of small protein motifs using an off-lattice heteropolymer model in which individual monomers interact as low-resolution amino acids. In contrast to most earlier models, both the length and folded structure of the polymers are permitted to change. To describe structural change, we compute the mean-square distance (MSD) between monomers in homologous folds separated by n neutral mutations. We find that structural change is episodic, and, averaged over lineages (for example, those extending from a single sequence), exhibits a power-law dependence on n . We show that this exponent depends on the alignment method used, and we analyze the distribution of waiting times between neutral mutations. The latter are more disperse than for models required to maintain a specific fold, but exhibit a similar power-law tail.

  1. Towards structural models of molecular recognition in olfactory receptors.

    PubMed

    Afshar, M; Hubbard, R E; Demaille, J

    1998-02-01

    The G protein coupled receptors (GPCR) are an important class of proteins that act as signal transducers through the cytoplasmic membrane. Understanding the structure and activation mechanism of these proteins is crucial for understanding many different aspects of cellular signalling. The olfactory receptors correspond to the largest family of GPCRs. Very little is known about how the structures of the receptors govern the specificity of interaction which enables identification of particular odorant molecules. In this paper, we review recent developments in two areas of molecular modelling: methods for modelling the configuration of trans-membrane helices and methods for automatic docking of ligands into receptor structures. We then show how a subset of these methods can be combined to construct a model of a rat odorant receptor interacting with lyral for which experimental data are available. This modelling can help us make progress towards elucidating the specificity of interactions between receptors and odorant molecules.

  2. Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins

    PubMed Central

    Perez, Romel B.; Tischer, Alexander; Auton, Matthew; Whitten, Steven T.

    2014-01-01

    Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins, mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline and alanine to glycine substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (Rh) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the glycine substitutions decreased polyproline II (PPII) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in Rh were not associated with folding. The experiments showed that changes in local PPII structure cause changes in Rh that are variable and that depend on the intrinsic chain propensities of proline and alanine residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed proline and alanine effects on the structures of intrinsically disordered proteins. PMID:25244701

  3. Computational 3D structures of drug-targeting proteins in the 2009-H1N1 influenza A virus

    NASA Astrophysics Data System (ADS)

    Du, Qi-Shi; Wang, Shu-Qing; Huang, Ri-Bo; Chou, Kuo-Chen

    2010-01-01

    The neuraminidase (NA) and M2 proton channel of influenza virus are the drug-targeting proteins, based on which several drugs were developed. However these once powerful drugs encountered drug-resistant problem to the H5N1 and H1N1 flu. To address this problem, the computational 3D structures of NA and M2 proteins of 2009-H1N1 influenza virus were built using the molecular modeling technique and computational chemistry method. Based on the models the structure features of NA and M2 proteins were analyzed, the docking structures of drug-protein complexes were computed, and the residue mutations were annotated. The results may help to solve the drug-resistant problem and stimulate designing more effective drugs against 2009-H1N1 influenza pandemic.

  4. A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins

    PubMed Central

    Vallat, Brinda Kizhakke; Pillardy, Jaroslaw; Elber, Ron

    2010-01-01

    The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is employed to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50-100 percent) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6Å RMSD from the native structure), decays linearly as a function of the TM structural-alignment score. PMID:18300226

  5. Whole Protein Native Fitness Potentials

    NASA Astrophysics Data System (ADS)

    Faraggi, Eshel; Kloczkowski, Andrzej

    2013-03-01

    Protein structure prediction can be separated into two tasks: sample the configuration space of the protein chain, and assign a fitness between these hypothetical models and the native structure of the protein. One of the more promising developments in this area is that of knowledge based energy functions. However, standard approaches using pair-wise interactions have shown shortcomings demonstrated by the superiority of multi-body-potentials. These shortcomings are due to residue pair-wise interaction being dependent on other residues along the chain. We developed a method that uses whole protein information filtered through machine learners to score protein models based on their likeness to native structures. For all models we calculated parameters associated with the distance to the solvent and with distances between residues. These parameters, in addition to energy estimates obtained by using a four-body-potential, DFIRE, and RWPlus were used as training for machine learners to predict the fitness of the models. Testing on CASP 9 targets showed that our method is superior to DFIRE, RWPlus, and the four-body potential, which are considered standards in the field.

  6. Hydrogen atoms in protein structures: high-resolution X-ray diffraction structure of the DFPase

    PubMed Central

    2013-01-01

    Background Hydrogen atoms represent about half of the total number of atoms in proteins and are often involved in substrate recognition and catalysis. Unfortunately, X-ray protein crystallography at usual resolution fails to access directly their positioning, mainly because light atoms display weak contributions to diffraction. However, sub-Ångstrom diffraction data, careful modeling and a proper refinement strategy can allow the positioning of a significant part of hydrogen atoms. Results A comprehensive study on the X-ray structure of the diisopropyl-fluorophosphatase (DFPase) was performed, and the hydrogen atoms were modeled, including those of solvent molecules. This model was compared to the available neutron structure of DFPase, and differences in the protein and the active site solvation were noticed. Conclusions A further examination of the DFPase X-ray structure provides substantial evidence about the presence of an activated water molecule that may constitute an interesting piece of information as regard to the enzymatic hydrolysis mechanism. PMID:23915572

  7. PDBFlex: exploring flexibility in protein structures

    PubMed Central

    Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam

    2016-01-01

    The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software. PMID:26615193

  8. Identification of a "glycine-loop"-like coiled structure in the 34 AA Pro,Gly,Met repeat domain of the biomineral-associated protein, PM27.

    PubMed

    Wustman, Brandon A; Santos, Rudolpho; Zhang, Bo; Evans, John Spencer

    2002-12-05

    Fracture resistance in biomineralized structures has been linked to the presence of proteins, some of which possess sequences that are associated with elastic behavior. One such protein superfamily, the Pro,Gly-rich sea urchin intracrystalline spicule matrix proteins, form protein-protein supramolecular assemblies that modify the microstructure and fracture-resistant properties of the calcium carbonate mineral phase within embryonic sea urchin spicules and adult sea urchin spines. In this report, we detail the identification of a repetitive keratin-like "glycine-loop"- or coil-like structure within the 34-AA (AA: amino acid) N-terminal domain, (PGMG)(8)PG, of the spicule matrix protein, PM27. The identification of this repetitive structural motif was accomplished using two capped model peptides: a 9-AA sequence, GPGMGPGMG, and a 34-AA peptide representing the entire motif. Using CD, NMR spectrometry, and molecular dynamics simulated annealing/minimization simulations, we have determined that the 9-AA model peptide adopts a loop-like structure at pH 7.4. The structure of the 34-AA polypeptide resembles a coil structure consisting of repeating loop motifs that do not exhibit long-range ordering. Given that loop structures have been associated with protein elastic behavior and protein motion, it is plausible that the 34-AA Pro,Gly,Met repeat sequence motif in PM27 represents a putative elastic or mobile domain. Copyright 2002 Wiley Periodicals, Inc.

  9. Protein Structure Prediction Using Gas Phase Molecular Dynamics Simulation: EOTAXIN-3 Cytokine as a Case Study

    NASA Astrophysics Data System (ADS)

    Khairudin, Nurul Bahiyah Ahmad; Wahab, Habibah A.

    In the current work, the structure of the enzyme CC chemokine eotaxin-3 (1G2S) was chosen as a case study to investigate the effects of gas phase on the predicted protein conformation using molecular dynamics simulation. Generally, simulating proteins in the gas phase tend to suffer from various drawbacks, among which excessive numbers of protein-protein hydrogen bonds. However, current results showed that the effects of gas phase simulation on 1G2S did not amplify the protein-protein hydrogen bonds. It was also found that some of the hydrogen bonds which were crucial in maintaining the secondary structural elements were disrupted. The predicted models showed high values of RMSD, 11.5 Å and 13.5 Å for both vacuum and explicit solvent simulations, respectively, indicating that the conformers were very much different from the native conformation. Even though the RMSD value for the in vacuo model was slightly lower, it somehow suffered from lower fraction of native contacts, poor hydrogen bonding networks and fewer occurrences of secondary structural elements compared to the solvated model. This finding supports the notion that water plays a dominant role in guiding the protein to fold along the correct path.

  10. Cilia/Ift protein and motor -related bone diseases and mouse models.

    PubMed

    Yuan, Xue; Yang, Shuying

    2015-01-01

    Primary cilia are essential cellular organelles projecting from the cell surface to sense and transduce developmental signaling. They are tiny but have complicated structures containing microtubule (MT)-based internal structures (the axoneme) and mother centriole formed basal body. Intraflagellar transport (Ift) operated by Ift proteins and motors are indispensable for cilia formation and function. Mutations in Ift proteins or Ift motors cause various human diseases, some of which have severe bone defects. Over the last few decades, major advances have occurred in understanding the roles of these proteins and cilia in bone development and remodeling by examining cilia/Ift protein-related human diseases and establishing mouse transgenic models. In this review, we describe current advances in the understanding of the cilia/Ift structure and function. We further summarize cilia/Ift-related human diseases and current mouse models with an emphasis on bone-related phenotypes, cilia morphology, and signaling pathways.

  11. Structure prediction and analysis of MxaF from obligate, facultative and restricted facultative methylobacterium.

    PubMed

    Singh, Raghvendra Pratap; Singh, Ram Nageena; Srivastava, Manish K; Srivastava, Alok Kumar; Kumar, Sudheer; Dubey, Ramesh Chandra; Sharma, Arun Kumar

    2012-01-01

    Methylobacteria are ubiquitous in the biosphere which are capable of growing on C1 compounds such as formate, formaldehyde, methanol and methylamine as well as on a wide range of multi-carbon growth substrates such as C2, C3 and C4 compounds due to the methylotrophic enzymes methanol dehydrogenase (MDH). MDH is performing these functions with the help of a key protein mxaF. Unfortunately, detailed structural analysis and homology modeling of mxaF is remains undefined. Hence, the objective of this research is the characterization and three dimensional modeling of mxaF protein from three different methylotrophs by using I-TASSER server. The predicted model were further optimize and validate by Profile 3D, Errat, Verifiy3-D and PROCHECK server. Predicted and best evaluated models have been successfully deposited to PMDB database with PMDB ID PM0077505, PM0077506 and PM0077507. Active site identification revealed 11, 13 and 14 putative functional site residues in respected models. It may play a major role during protein-protein, and protein-cofactor interactions. This study can provide us an ab-initio and detail information to understand the structure, mechanism of action and regulation of mxaF protein.

  12. Structure prediction and analysis of MxaF from obligate, facultative and restricted facultative methylobacterium

    PubMed Central

    Singh, Raghvendra Pratap; Singh, Ram Nageena; Srivastava, Manish K; Srivastava, Alok Kumar; Kumar, Sudheer; Dubey, Ramesh Chandra; Sharma, Arun Kumar

    2012-01-01

    Methylobacteria are ubiquitous in the biosphere which are capable of growing on C1 compounds such as formate, formaldehyde, methanol and methylamine as well as on a wide range of multi-carbon growth substrates such as C2, C3 and C4 compounds due to the methylotrophic enzymes methanol dehydrogenase (MDH). MDH is performing these functions with the help of a key protein mxaF. Unfortunately, detailed structural analysis and homology modeling of mxaF is remains undefined. Hence, the objective of this research is the characterization and three dimensional modeling of mxaF protein from three different methylotrophs by using I-TASSER server. The predicted model were further optimize and validate by Profile 3D, Errat, Verifiy3-D and PROCHECK server. Predicted and best evaluated models have been successfully deposited to PMDB database with PMDB ID PM0077505, PM0077506 and PM0077507. Active site identification revealed 11, 13 and 14 putative functional site residues in respected models. It may play a major role during protein-protein, and protein-cofactor interactions. This study can provide us an ab-initio and detail information to understand the structure, mechanism of action and regulation of mxaF protein. PMID:23275704

  13. Distance restraints from crosslinking mass spectrometry: mining a molecular dynamics simulation database to evaluate lysine-lysine distances.

    PubMed

    Merkley, Eric D; Rysavy, Steven; Kahraman, Abdullah; Hafen, Ryan P; Daggett, Valerie; Adkins, Joshua N

    2014-06-01

    Integrative structural biology attempts to model the structures of protein complexes that are challenging or intractable by classical structural methods (due to size, dynamics, or heterogeneity) by combining computational structural modeling with data from experimental methods. One such experimental method is chemical crosslinking mass spectrometry (XL-MS), in which protein complexes are crosslinked and characterized using liquid chromatography-mass spectrometry to pinpoint specific amino acid residues in close structural proximity. The commonly used lysine-reactive N-hydroxysuccinimide ester reagents disuccinimidylsuberate (DSS) and bis(sulfosuccinimidyl)suberate (BS(3) ) have a linker arm that is 11.4 Å long when fully extended, allowing Cα (alpha carbon of protein backbone) atoms of crosslinked lysine residues to be up to ∼24 Å apart. However, XL-MS studies on proteins of known structure frequently report crosslinks that exceed this distance. Typically, a tolerance of ∼3 Å is added to the theoretical maximum to account for this observation, with limited justification for the chosen value. We used the Dynameomics database, a repository of high-quality molecular dynamics simulations of 807 proteins representative of diverse protein folds, to investigate the relationship between lysine-lysine distances in experimental starting structures and in simulation ensembles. We conclude that for DSS/BS(3), a distance constraint of 26-30 Å between Cα atoms is appropriate. This analysis provides a theoretical basis for the widespread practice of adding a tolerance to the crosslinker length when comparing XL-MS results to structures or in modeling. We also discuss the comparison of XL-MS results to MD simulations and known structures as a means to test and validate experimental XL-MS methods. © 2014 The Protein Society.

  14. Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination

    PubMed Central

    Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.

    2010-01-01

    A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin labels pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624

  15. Protein classification using probabilistic chain graphs and the Gene Ontology structure.

    PubMed

    Carroll, Steven; Pavlovic, Vladimir

    2006-08-01

    Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. C/C++/Perl implementation is available from authors upon request.

  16. Physics of protein folding

    NASA Astrophysics Data System (ADS)

    Finkelstein, A. V.; Galzitskaya, O. V.

    2004-04-01

    Protein physics is grounded on three fundamental experimental facts: protein, this long heteropolymer, has a well defined compact three-dimensional structure; this structure can spontaneously arise from the unfolded protein chain in appropriate environment; and this structure is separated from the unfolded state of the chain by the “all-or-none” phase transition, which ensures robustness of protein structure and therefore of its action. The aim of this review is to consider modern understanding of physical principles of self-organization of protein structures and to overview such important features of this process, as finding out the unique protein structure among zillions alternatives, nucleation of the folding process and metastable folding intermediates. Towards this end we will consider the main experimental facts and simple, mostly phenomenological theoretical models. We will concentrate on relatively small (single-domain) water-soluble globular proteins (whose structure and especially folding are much better studied and understood than those of large or membrane and fibrous proteins) and consider kinetic and structural aspects of transition of initially unfolded protein chains into their final solid (“native”) 3D structures.

  17. CABS-flex: server for fast simulation of protein structure fluctuations

    PubMed Central

    Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian

    2013-01-01

    The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements CABS-model–based protocol for the fast simulations of near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics—a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions. PMID:23658222

  18. Protein engineering and the use of molecular modeling and simulation: the case of heterodimeric Fc engineering.

    PubMed

    Spreter Von Kreudenstein, Thomas; Lario, Paula I; Dixit, Surjit B

    2014-01-01

    Computational and structure guided methods can make significant contributions to the development of solutions for difficult protein engineering problems, including the optimization of next generation of engineered antibodies. In this paper, we describe a contemporary industrial antibody engineering program, based on hypothesis-driven in silico protein optimization method. The foundational concepts and methods of computational protein engineering are discussed, and an example of a computational modeling and structure-guided protein engineering workflow is provided for the design of best-in-class heterodimeric Fc with high purity and favorable biophysical properties. We present the engineering rationale as well as structural and functional characterization data on these engineered designs. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Challenges in structural approaches to cell modeling

    PubMed Central

    Im, Wonpil; Liang, Jie; Olson, Arthur; Zhou, Huan-Xiang; Vajda, Sandor; Vakser, Ilya A.

    2016-01-01

    Computational modeling is essential for structural characterization of biomolecular mechanisms across the broad spectrum of scales. Adequate understanding of biomolecular mechanisms inherently involves our ability to model them. Structural modeling of individual biomolecules and their interactions has been rapidly progressing. However, in terms of the broader picture, the focus is shifting toward larger systems, up to the level of a cell. Such modeling involves a more dynamic and realistic representation of the interactomes in vivo, in a crowded cellular environment, as well as membranes and membrane proteins, and other cellular components. Structural modeling of a cell complements computational approaches to cellular mechanisms based on differential equations, graph models, and other techniques to model biological networks, imaging data, etc. Structural modeling along with other computational and experimental approaches will provide a fundamental understanding of life at the molecular level and lead to important applications to biology and medicine. A cross section of diverse approaches presented in this review illustrates the developing shift from the structural modeling of individual molecules to that of cell biology. Studies in several related areas are covered: biological networks; automated construction of three-dimensional cell models using experimental data; modeling of protein complexes; prediction of non-specific and transient protein interactions; thermodynamic and kinetic effects of crowding; cellular membrane modeling; and modeling of chromosomes. The review presents an expert opinion on the current state-of-the-art in these various aspects of structural modeling in cellular biology, and the prospects of future developments in this emerging field. PMID:27255863

  20. Comparing pharmacophore models derived from crystallography and NMR ensembles

    NASA Astrophysics Data System (ADS)

    Ghanakota, Phani; Carlson, Heather A.

    2017-11-01

    NMR and X-ray crystallography are the two most widely used methods for determining protein structures. Our previous study examining NMR versus X-Ray sources of protein conformations showed improved performance with NMR structures when used in our Multiple Protein Structures (MPS) method for receptor-based pharmacophores (Damm, Carlson, J Am Chem Soc 129:8225-8235, 2007). However, that work was based on a single test case, HIV-1 protease, because of the rich data available for that system. New data for more systems are available now, which calls for further examination of the effect of different sources of protein conformations. The MPS technique was applied to Growth factor receptor bound protein 2 (Grb2), Src SH2 homology domain (Src-SH2), FK506-binding protein 1A (FKBP12), and Peroxisome proliferator-activated receptor-γ (PPAR-γ). Pharmacophore models from both crystal and NMR ensembles were able to discriminate between high-affinity, low-affinity, and decoy molecules. As we found in our original study, NMR models showed optimal performance when all elements were used. The crystal models had more pharmacophore elements compared to their NMR counterparts. The crystal-based models exhibited optimum performance only when pharmacophore elements were dropped. This supports our assertion that the higher flexibility in NMR ensembles helps focus the models on the most essential interactions with the protein. Our studies suggest that the "extra" pharmacophore elements seen at the periphery in X-ray models arise as a result of decreased protein flexibility and make very little contribution to model performance.

  1. Computational Simulation of the Activation Cycle of Gα Subunit in the G Protein Cycle Using an Elastic Network Model

    PubMed Central

    Kim, Min Hyeok; Kim, Young Jin; Kim, Hee Ryung; Jeon, Tae-Joon; Choi, Jae Boong; Chung, Ka Young; Kim, Moon Ki

    2016-01-01

    Agonist-activated G protein-coupled receptors (GPCRs) interact with GDP-bound G protein heterotrimers (Gαβγ) promoting GDP/GTP exchange, which results in dissociation of Gα from the receptor and Gβγ. The GTPase activity of Gα hydrolyzes GTP to GDP, and the GDP-bound Gα interacts with Gβγ, forming a GDP-bound G protein heterotrimer. The G protein cycle is allosterically modulated by conformational changes of the Gα subunit. Although biochemical and biophysical methods have elucidated the structure and dynamics of Gα, the precise conformational mechanisms underlying the G protein cycle are not fully understood yet. Simulation methods could help to provide additional details to gain further insight into G protein signal transduction mechanisms. In this study, using the available X-ray crystal structures of Gα, we simulated the entire G protein cycle and described not only the steric features of the Gα structure, but also conformational changes at each step. Each reference structure in the G protein cycle was modeled as an elastic network model and subjected to normal mode analysis. Our simulation data suggests that activated receptors trigger conformational changes of the Gα subunit that are thermodynamically favorable for opening of the nucleotide-binding pocket and GDP release. Furthermore, the effects of GTP binding and hydrolysis on mobility changes of the C and N termini and switch regions are elucidated. In summary, our simulation results enabled us to provide detailed descriptions of the structural and dynamic features of the G protein cycle. PMID:27483005

  2. Solving protein structures using short-distance cross-linking constraints as a guide for discrete molecular dynamics simulations.

    PubMed

    Brodie, Nicholas I; Popov, Konstantin I; Petrotchenko, Evgeniy V; Dokholyan, Nikolay V; Borchers, Christoph H

    2017-07-01

    We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein-models for α helix-rich and β sheet-rich proteins, respectively-and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures.

  3. Predicting structural classes of proteins by incorporating their global and local physicochemical and conformational properties into general Chou's PseAAC.

    PubMed

    Contreras-Torres, Ernesto

    2018-06-02

    In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named Total Sum of Squares (TSS). This quantity represents the sum of the squares differences of amino acid properties from the arithmetic mean property. As an extension, the amino acid-types and amino acid-groups formalisms are used for describing zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a Nearest Neighbor model for predicting the major four protein structural classes was built. This model has a success rate of 98.53% on the jackknife cross-validation test; this performance being superior to other reported methods despite the simplicity of the predictor. Additionally, this predictor has an average success rate of 98.35% in different cross-validation tests performed. A value of 0.98 for the Kappa statistic clearly discriminates this model from a random predictor. The results obtained by the Nearest Neighbor model demonstrated the ability of the proposed descriptors not only to reflect relevant biochemical information related to the structural classes of proteins but also to allow appropriate interpretability. It can thus be expected that the current method may play a supplementary role to other existing approaches for protein structural class prediction and other protein attributes. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.

    PubMed

    Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran

    2016-08-26

    Evaluation of the possible implications of genomic variants is an increasingly important task in the current high throughput sequencing era. Structural information however is still not routinely exploited during this evaluation process. The main reasons can be attributed to the partial structural coverage of the human proteome and the lack of tools which conveniently convert genomic positions, which are the frequent output of genomic pipelines, to proteins and structure coordinates. We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionary related (and not only identical) protein three dimensional (3D) structures as well as on theoretical models. By doing so it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D . G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster accessibility for structural data to biologists and geneticists who routinely work with genomic information.

  5. Modeling protein conformational changes by iterative fitting of distance constraints using reoriented normal modes.

    PubMed

    Zheng, Wenjun; Brooks, Bernard R

    2006-06-15

    Recently we have developed a normal-modes-based algorithm that predicts the direction of protein conformational changes given the initial state crystal structure together with a small number of pairwise distance constraints for the end state. Here we significantly extend this method to accurately model both the direction and amplitude of protein conformational changes. The new protocol implements a multisteps search in the conformational space that is driven by iteratively minimizing the error of fitting the given distance constraints and simultaneously enforcing the restraint of low elastic energy. At each step, an incremental structural displacement is computed as a linear combination of the lowest 10 normal modes derived from an elastic network model, whose eigenvectors are reorientated to correct for the distortions caused by the structural displacements in the previous steps. We test this method on a list of 16 pairs of protein structures for which relatively large conformational changes are observed (root mean square deviation >3 angstroms), using up to 10 pairwise distance constraints selected by a fluctuation analysis of the initial state structures. This method has achieved a near-optimal performance in almost all cases, and in many cases the final structural models lie within root mean square deviation of 1 approximately 2 angstroms from the native end state structures.

  6. A fragmentation and reassembly method for ab initio phasing.

    PubMed

    Shrestha, Rojan; Zhang, Kam Y J

    2015-02-01

    Ab initio phasing with de novo models has become a viable approach for structural solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling method has a limit as to the accuracy of the model predicted, which is sometimes insufficient to be used as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of these targets, although the predicted de novo models cannot be used as templates for successful molecular replacement since the best model for each target is on average more than 4.0 Å away from the native structure. The method has extended the applicability of the ab initio phasing by de novo models approach. The method can be used to solve structures when the best de novo models are still of low accuracy.

  7. Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Adhikari, Aashish N.; Freed, Karl F.; Sosnick, Tobin R.

    2013-07-01

    We demonstrate the ability of simultaneously determining a protein’s folding pathway and structure using a properly formulated model without prior knowledge of the native structure. Our model employs a natural coordinate system for describing proteins and a search strategy inspired by the observation that real proteins fold in a sequential fashion by incrementally stabilizing nativelike substructures or “foldons.” Comparable folding pathways and structures are obtained for the twelve proteins recently studied using atomistic molecular dynamics simulations [K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, Science 334, 517 (2011)], with our calculations running several orders of magnitude faster. We find that nativelike propensities in the unfolded state do not necessarily determine the order of structure formation, a departure from a major conclusion of the molecular dynamics study. Instead, our results support a more expansive view wherein intrinsic local structural propensities may be enhanced or overridden in the folding process by environmental context. The success of our search strategy validates it as an expedient mechanism for folding both in silico and in vivo.

  8. Layers: A molecular surface peeling algorithm and its applications to analyze protein structures

    PubMed Central

    Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad

    2015-01-01

    We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins and hence identification of similarity between proteins is simple and reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configuration of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411

  9. Fragment-based modelling of single stranded RNA bound to RNA recognition motif containing proteins

    PubMed Central

    de Beauchene, Isaure Chauvot; de Vries, Sjoerd J.; Zacharias, Martin

    2016-01-01

    Abstract Protein-RNA complexes are important for many biological processes. However, structural modeling of such complexes is hampered by the high flexibility of RNA. Particularly challenging is the docking of single-stranded RNA (ssRNA). We have developed a fragment-based approach to model the structure of ssRNA bound to a protein, based on only the protein structure, the RNA sequence and conserved contacts. The conformational diversity of each RNA fragment is sampled by an exhaustive library of trinucleotides extracted from all known experimental protein–RNA complexes. The method was applied to ssRNA with up to 12 nucleotides which bind to dimers of the RNA recognition motifs (RRMs), a highly abundant eukaryotic RNA-binding domain. The fragment based docking allows a precise de novo atomic modeling of protein-bound ssRNA chains. On a benchmark of seven experimental ssRNA–RRM complexes, near-native models (with a mean heavy-atom deviation of <3 Å from experiment) were generated for six out of seven bound RNA chains, and even more precise models (deviation < 2 Å) were obtained for five out of seven cases, a significant improvement compared to the state of the art. The method is not restricted to RRMs but was also successfully applied to Pumilio RNA binding proteins. PMID:27131381

  10. Toxicity challenges in environmental chemicals: Prediction of human plasma protein binding through quantitative structure-activity relationship (QSAR) models

    EPA Science Inventory

    The present study explores the merit of utilizing available pharmaceutical data to construct a quantitative structure-activity relationship (QSAR) for prediction of the fraction of a chemical unbound to plasma protein (Fub) in environmentally relevant compounds. Independent model...

  11. Using Molecular Visualization to Explore Protein Structure and Function and Enhance Student Facility with Computational Tools

    ERIC Educational Resources Information Center

    Terrell, Cassidy R.; Listenberger, Laura L.

    2017-01-01

    Recognizing that undergraduate students can benefit from analysis of 3D protein structure and function, we have developed a multiweek, inquiry-based molecular visualization project for Biochemistry I students. This project uses a virtual model of cyclooxygenase-1 (COX-1) to guide students through multiple levels of protein structure analysis. The…

  12. Roles of beta-turns in protein folding: from peptide models to protein engineering.

    PubMed

    Marcelino, Anna Marie C; Gierasch, Lila M

    2008-05-01

    Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein's structure.

  13. Biological and functional relevance of CASP predictions.

    PubMed

    Liu, Tianyun; Ish-Shalom, Shirbi; Torng, Wen; Lafita, Aleix; Bock, Christian; Mort, Matthew; Cooper, David N; Bliven, Spencer; Capitani, Guido; Mooney, Sean D; Altman, Russ B

    2018-03-01

    Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and Ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template. © 2017 The Authors Proteins: Structure, Function and Bioinformatics Published by Wiley Periodicals, Inc.

  14. Assessing the performance of MM/PBSA and MM/GBSA methods. 8. Predicting binding free energies and poses of protein-RNA complexes.

    PubMed

    Chen, Fu; Sun, Huiyong; Wang, Junmei; Zhu, Feng; Liu, Hui; Wang, Zhe; Lei, Tailong; Li, Youyong; Hou, Tingjun

    2018-06-21

    Molecular docking provides a computationally efficient way to predict the atomic structural details of protein-RNA interactions (PRI), but accurate prediction of the three-dimensional structures and binding affinities for PRI is still notoriously difficult, partly due to the unreliability of the existing scoring functions for PRI. MM/PBSA and MM/GBSA are more theoretically rigorous than most scoring functions for protein-RNA docking, but their prediction performance for protein-RNA systems remains unclear. Here, we systemically evaluated the capability of MM/PBSA and MM/GBSA to predict the binding affinities and recognize the near-native binding structures for protein-RNA systems with different solvent models and interior dielectric constants (ϵ in ). For predicting the binding affinities, the predictions given by MM/GBSA based on the minimized structures in explicit solvent and the GBGBn1 model with ϵ in = 2 yielded the highest correlation with the experimental data. Moreover, the MM/GBSA calculations based on the minimized structures in implicit solvent and the GBGBn1 model distinguished the near-native binding structures within the top 10 decoys for 118 out of the 149 protein-RNA systems (79.2%). This performance is better than all docking scoring functions studied here. Therefore, the MM/GBSA rescoring is an efficient way to improve the prediction capability of scoring functions for protein-RNA systems. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  15. Molecular dynamics simulations of the Nip7 proteins from the marine deep- and shallow-water Pyrococcus species.

    PubMed

    Medvedev, Kirill E; Alemasov, Nikolay A; Vorobjev, Yuri N; Boldyreva, Elena V; Kolchanov, Nikolay A; Afonnikov, Dmitry A

    2014-10-15

    The identification of the mechanisms of adaptation of protein structures to extreme environmental conditions is a challenging task of structural biology. We performed molecular dynamics (MD) simulations of the Nip7 protein involved in RNA processing from the shallow-water (P. furiosus) and the deep-water (P. abyssi) marine hyperthermophylic archaea at different temperatures (300 and 373 K) and pressures (0.1, 50 and 100 MPa). The aim was to disclose similarities and differences between the deep- and shallow-sea protein models at different temperatures and pressures. The current results demonstrate that the 3D models of the two proteins at all the examined values of pressures and temperatures are compact, stable and similar to the known crystal structure of the P. abyssi Nip7. The structural deviations and fluctuations in the polypeptide chain during the MD simulations were the most pronounced in the loop regions, their magnitude being larger for the C-terminal domain in both proteins. A number of highly mobile segments the protein globule presumably involved in protein-protein interactions were identified. Regions of the polypeptide chain with significant difference in conformational dynamics between the deep- and shallow-water proteins were identified. The results of our analysis demonstrated that in the examined ranges of temperatures and pressures, increase in temperature has a stronger effect on change in the dynamic properties of the protein globule than the increase in pressure. The conformational changes of both the deep- and shallow-sea protein models under increasing temperature and pressure are non-uniform. Our current results indicate that amino acid substitutions between shallow- and deep-water proteins only slightly affect overall stability of two proteins. Rather, they may affect the interactions of the Nip7 protein with its protein or RNA partners.

  16. Adjusting protein graphs based on graph entropy.

    PubMed

    Peng, Sheng-Lung; Tsay, Yu-Wei

    2014-01-01

    Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.

  17. Adjusting protein graphs based on graph entropy

    PubMed Central

    2014-01-01

    Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling, instead of looking a minimum superimposition for pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity become one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid. PMID:25474347

  18. Structure-based Markov random field model for representing evolutionary constraints on functional sites.

    PubMed

    Jeong, Chan-Seok; Kim, Dongsup

    2016-02-24

    Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.

  19. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for chagas' disease treatment

    PubMed Central

    2010-01-01

    Background Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens. Results We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite. Conclusions In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase (EC 1.3.1.34) and triacylglycerol lipase (EC 3.1.1.3), classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets. PMID:21034488

  20. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for chagas' disease treatment.

    PubMed

    Capriles, Priscila V S Z; Guimarães, Ana C R; Otto, Thomas D; Miranda, Antonio B; Dardenne, Laurent E; Degrave, Wim M

    2010-10-29

    Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens. We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite. In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase (EC 1.3.1.34) and triacylglycerol lipase (EC 3.1.1.3), classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets.

  1. Complete protein-protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling

    NASA Astrophysics Data System (ADS)

    Plattner, Nuria; Doerr, Stefan; de Fabritiis, Gianni; Noé, Frank

    2017-10-01

    Protein-protein association is fundamental to many life processes. However, a microscopic model describing the structures and kinetics during association and dissociation is lacking on account of the long lifetimes of associated states, which have prevented efficient sampling by direct molecular dynamics (MD) simulations. Here we demonstrate protein-protein association and dissociation in atomistic resolution for the ribonuclease barnase and its inhibitor barstar by combining adaptive high-throughput MD simulations and hidden Markov modelling. The model reveals experimentally consistent intermediate structures, energetics and kinetics on timescales from microseconds to hours. A variety of flexibly attached intermediates and misbound states funnel down to a transition state and a native basin consisting of the loosely bound near-native state and the tightly bound crystallographic state. These results offer a deeper level of insight into macromolecular recognition and our approach opens the door for understanding and manipulating a wide range of macromolecular association processes.

  2. Peppytides: Interactive Models of Polypeptide Chains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zuckermann, Ron; Chakraborty, Promita; Derisi, Joe

    2014-01-21

    Peppytides are scaled, 3D-printed models of polypeptide chains that can be folded into accurate protein structures. Designed and created by Berkeley Lab Researcher, Promita Chakraborty, and Berkeley Lab Senior Scientist, Dr. Ron Zuckermann, Peppytides are accurate physical models of polypeptide chains that anyone can interact with and fold intro various protein structures - proving to be a great educational tool, resulting in a deeper understanding of these fascinating structures and how they function. Build your own Peppytide model and learn about how nature's machines fold into their intricate architectures!

  3. Peppytides: Interactive Models of Polypeptide Chains

    ScienceCinema

    Zuckermann, Ron; Chakraborty, Promita; Derisi, Joe

    2018-06-08

    Peppytides are scaled, 3D-printed models of polypeptide chains that can be folded into accurate protein structures. Designed and created by Berkeley Lab Researcher, Promita Chakraborty, and Berkeley Lab Senior Scientist, Dr. Ron Zuckermann, Peppytides are accurate physical models of polypeptide chains that anyone can interact with and fold intro various protein structures - proving to be a great educational tool, resulting in a deeper understanding of these fascinating structures and how they function. Build your own Peppytide model and learn about how nature's machines fold into their intricate architectures!

  4. The origin of consistent protein structure refinement from structural averaging.

    PubMed

    Park, Hahnbeom; DiMaio, Frank; Baker, David

    2015-06-02

    Recent studies have shown that explicit solvent molecular dynamics (MD) simulation followed by structural averaging can consistently improve protein structure models. We find that improvement upon averaging is not limited to explicit water MD simulation, as consistent improvements are also observed for more efficient implicit solvent MD or Monte Carlo minimization simulations. To determine the origin of these improvements, we examine the changes in model accuracy brought about by averaging at the individual residue level. We find that the improvement in model quality from averaging results from the superposition of two effects: a dampening of deviations from the correct structure in the least well modeled regions, and a reinforcement of consistent movements towards the correct structure in better modeled regions. These observations are consistent with an energy landscape model in which the magnitude of the energy gradient toward the native structure decreases with increasing distance from the native state. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Predicting side-chain conformations of methionine using a hard-sphere model with stereochemical constraints

    NASA Astrophysics Data System (ADS)

    Virrueta, A.; Gaines, J.; O'Hern, C. S.; Regan, L.

    2015-03-01

    Current research in the O'Hern and Regan laboratories focuses on the development of hard-sphere models with stereochemical constraints for protein structure prediction as an alternative to molecular dynamics methods that utilize knowledge-based corrections in their force-fields. Beginning with simple hydrophobic dipeptides like valine, leucine, and isoleucine, we have shown that our model is able to reproduce the side-chain dihedral angle distributions derived from sets of high-resolution protein crystal structures. However, methionine remains an exception - our model yields a chi-3 side-chain dihedral angle distribution that is relatively uniform from 60 to 300 degrees, while the observed distribution displays peaks at 60, 180, and 300 degrees. Our goal is to resolve this discrepancy by considering clashes with neighboring residues, and averaging the reduced distribution of allowable methionine structures taken from a set of crystallized proteins. We will also re-evaluate the electron density maps from which these protein structures are derived to ensure that the methionines and their local environments are correctly modeled. This work will ultimately serve as a tool for computing side-chain entropy and protein stability. A. V. is supported by an NSF Graduate Research Fellowship and a Ford Foundation Fellowship. J. G. is supported by NIH training Grant NIH-5T15LM007056-28.

  6. Automatic Prediction of Protein 3D Structures by Probabilistic Multi-template Homology Modeling.

    PubMed

    Meier, Armin; Söding, Johannes

    2015-10-01

    Homology modeling predicts the 3D structure of a query protein based on the sequence alignment with one or more template proteins of known structure. Its great importance for biological research is owed to its speed, simplicity, reliability and wide applicability, covering more than half of the residues in protein sequence space. Although multiple templates have been shown to generally increase model quality over single templates, the information from multiple templates has so far been combined using empirically motivated, heuristic approaches. We present here a rigorous statistical framework for multi-template homology modeling. First, we find that the query proteins' atomic distance restraints can be accurately described by two-component Gaussian mixtures. This insight allowed us to apply the standard laws of probability theory to combine restraints from multiple templates. Second, we derive theoretically optimal weights to correct for the redundancy among related templates. Third, a heuristic template selection strategy is proposed. We improve the average GDT-ha model quality score by 11% over single template modeling and by 6.5% over a conventional multi-template approach on a set of 1000 query proteins. Robustness with respect to wrong constraints is likewise improved. We have integrated our multi-template modeling approach with the popular MODELLER homology modeling software in our free HHpred server http://toolkit.tuebingen.mpg.de/hhpred and also offer open source software for running MODELLER with the new restraints at https://bitbucket.org/soedinglab/hh-suite.

  7. Homology-based Modeling of Rhodopsin-like Family Members in the Inactive State: Structural Analysis and Deduction of Tips for Modeling and Optimization.

    PubMed

    Pappalardo, Matteo; Rayan, Mahmoud; Abu-Lafi, Saleh; Leonardi, Martha E; Milardi, Danilo; Guccione, Salvatore; Rayan, Anwar

    2017-08-01

    Modeling G-Protein Coupled Receptors (GPCRs) is an emergent field of research, since utility of high-quality models in receptor structure-based strategies might facilitate the discovery of interesting drug candidates. The findings from a quantitative analysis of eighteen resolved structures of rhodopsin family "A" receptors crystallized with antagonists and 153 pairs of structures are described. A strategy termed endeca-amino acids fragmentation was used to analyze the structures models aiming to detect the relationship between sequence identity and Root Mean Square Deviation (RMSD) at each trans-membrane-domain. Moreover, we have applied the leave-one-out strategy to study the shiftiness likelihood of the helices. The type of correlation between sequence identity and RMSD was studied using the aforementioned set receptors as representatives of membrane proteins and 98 serine proteases with 4753 pairs of structures as representatives of globular proteins. Data analysis using fragmentation strategy revealed that there is some extent of correlation between sequence identity and global RMSD of 11AA width windows. However, spatial conservation is not always close to the endoplasmic side as was reported before. A comparative study with globular proteins shows that GPCRs have higher standard deviation and higher slope in the graph with correlation between sequence identity and RMSD. The extracted information disclosed in this paper could be incorporated in the modeling protocols while using technique for model optimization and refinement. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Protein purification and crystallization artifacts: The tale usually not told.

    PubMed

    Niedzialkowska, Ewa; Gasiorowska, Olga; Handing, Katarzyna B; Majorek, Karolina A; Porebski, Przemyslaw J; Shabalin, Ivan G; Zasadzinska, Ewelina; Cymborowski, Marcin; Minor, Wladek

    2016-03-01

    The misidentification of a protein sample, or contamination of a sample with the wrong protein, may be a potential reason for the non-reproducibility of experiments. This problem may occur in the process of heterologous overexpression and purification of recombinant proteins, as well as purification of proteins from natural sources. If the contaminated or misidentified sample is used for crystallization, in many cases the problem may not be detected until structures are determined. In the case of functional studies, the problem may not be detected for years. Here several procedures that can be successfully used for the identification of crystallized protein contaminants, including: (i) a lattice parameter search against known structures, (ii) sequence or fold identification from partially built models, and (iii) molecular replacement with common contaminants as search templates have been presented. A list of common contaminant structures to be used as alternative search models was provided. These methods were used to identify four cases of purification and crystallization artifacts. This report provides troubleshooting pointers for researchers facing difficulties in phasing or model building. © 2016 The Protein Society.

  9. Finding the target sites of RNA-binding proteins

    PubMed Central

    Li, Xiao; Kazan, Hilal; Lipshitz, Howard D; Morris, Quaid D

    2014-01-01

    RNA–protein interactions differ from DNA–protein interactions because of the central role of RNA secondary structure. Some RNA-binding domains (RBDs) recognize their target sites mainly by their shape and geometry and others are sequence-specific but are sensitive to secondary structure context. A number of small- and large-scale experimental approaches have been developed to measure RNAs associated in vitro and in vivo with RNA-binding proteins (RBPs). Generalizing outside of the experimental conditions tested by these assays requires computational motif finding. Often RBP motif finding is done by adapting DNA motif finding methods; but modeling secondary structure context leads to better recovery of RBP-binding preferences. Genome-wide assessment of mRNA secondary structure has recently become possible, but these data must be combined with computational predictions of secondary structure before they add value in predicting in vivo binding. There are two main approaches to incorporating structural information into motif models: supplementing primary sequence motif models with preferred secondary structure contexts (e.g., MEMERIS and RNAcontext) and directly modeling secondary structure recognized by the RBP using stochastic context-free grammars (e.g., CMfinder and RNApromo). The former better reconstruct known binding preferences for sequence-specific RBPs but are not suitable for modeling RBPs that recognize shape and geometry of RNAs. Future work in RBP motif finding should incorporate interactions between multiple RBDs and multiple RBPs in binding to RNA. WIREs RNA 2014, 5:111–130. doi: 10.1002/wrna.1201 PMID:24217996

  10. Structure and non-structure of centrosomal proteins.

    PubMed

    Dos Santos, Helena G; Abia, David; Janowski, Robert; Mortuza, Gulnahar; Bertero, Michela G; Boutin, Maïlys; Guarín, Nayibe; Méndez-Giraldez, Raúl; Nuñez, Alfonso; Pedrero, Juan G; Redondo, Pilar; Sanz, María; Speroni, Silvia; Teichert, Florian; Bruix, Marta; Carazo, José M; Gonzalez, Cayetano; Reina, José; Valpuesta, José M; Vernos, Isabelle; Zabala, Juan C; Montoya, Guillermo; Coll, Miquel; Bastolla, Ugo; Serrano, Luis

    2013-01-01

    Here we perform a large-scale study of the structural properties and the expression of proteins that constitute the human Centrosome. Centrosomal proteins tend to be larger than generic human proteins (control set), since their genes contain in average more exons (20.3 versus 14.6). They are rich in predicted disordered regions, which cover 57% of their length, compared to 39% in the general human proteome. They also contain several regions that are dually predicted to be disordered and coiled-coil at the same time: 55 proteins (15%) contain disordered and coiled-coil fragments that cover more than 20% of their length. Helices prevail over strands in regions homologous to known structures (47% predicted helical residues against 17% predicted as strands), and even more in the whole centrosomal proteome (52% against 7%), while for control human proteins 34.5% of the residues are predicted as helical and 12.8% are predicted as strands. This difference is mainly due to residues predicted as disordered and helical (30% in centrosomal and 9.4% in control proteins), which may correspond to alpha-helix forming molecular recognition features (α-MoRFs). We performed expression assays for 120 full-length centrosomal proteins and 72 domain constructs that we have predicted to be globular. These full-length proteins are often insoluble: Only 39 out of 120 expressed proteins (32%) and 19 out of 72 domains (26%) were soluble. We built or retrieved structural models for 277 out of 361 human proteins whose centrosomal localization has been experimentally verified. We could not find any suitable structural template with more than 20% sequence identity for 84 centrosomal proteins (23%), for which around 74% of the residues are predicted to be disordered or coiled-coils. The three-dimensional models that we built are available at http://ub.cbm.uam.es/centrosome/models/index.php.

  11. Unifying the microscopic picture of His-containing turns: from gas phase model peptides to crystallized proteins.

    PubMed

    Sohn, Woon Yong; Habka, Sana; Gloaguen, Eric; Mons, Michel

    2017-07-14

    The presence in crystallized proteins of a local anchoring between the side chain of a His residue, located in the central position of a γ- or β-turn, and its local main chain environment, was assessed by the comparison of protein structures with relevant isolated model peptides. Gas phase laser spectroscopy, combined with relevant quantum chemistry methods, was used to characterize the γ- and β-turn structures in these model peptides. A conformer-selective NH stretch infrared study provided evidence for the formation in vacuo of two types of short-range H-bonded motifs, labelled ε-6 δ and δ- δ 7/π H , bridging the His side chain (in its gauche+ rotamer) to the neighbouring NH(i) and CO(i) sites of the backbone; each side chain-backbone motif was found to be specific of the tautomer (ε or δ) adopted by the His side chain in its neutral form. A close comparison between β- and γ-turns, selected from the Protein Data Bank, and the gas phase models demonstrated that a significant proportion of the gauche+ His rotamer distribution of proteins was well described by the corresponding gas phase H-bonded structures. This is consistent with the persistence of local 6 δ and δ 7/π H intramolecular interactions in proteins, emphasizing the relevance of gas phase data to secondary structures that are poorly accessible to solvents, e.g., in the case of a specific compact topology (Xxx-His β-turns). Deviations from the gas phase structures were also observed, mainly in His-Xxx β-turns, and assigned to solvent accessible turn structures. They were well accounted for by theoretical models of microhydrated turns, in which a few solvent molecules take over the gas phase motifs, constituting a water-mediated local anchoring of the His side chain to the backbone. Finally, the present gas phase benchmark models also pinpointed weaknesses in the protein structure determination by X-ray diffraction analysis; in particular, besides the lack of tautomer information, inaccuracies in the description of imidazole ring flip rotamerism were identified.

  12. Rule-based modeling and simulations of the inner kinetochore structure.

    PubMed

    Tschernyschkow, Sergej; Herda, Sabine; Gruenert, Gerd; Döring, Volker; Görlich, Dennis; Hofmeister, Antje; Hoischen, Christian; Dittrich, Peter; Diekmann, Stephan; Ibrahim, Bashar

    2013-09-01

    Combinatorial complexity is a central problem when modeling biochemical reaction networks, since the association of a few components can give rise to a large variation of protein complexes. Available classical modeling approaches are often insufficient for the analysis of very large and complex networks in detail. Recently, we developed a new rule-based modeling approach that facilitates the analysis of spatial and combinatorially complex problems. Here, we explore for the first time how this approach can be applied to a specific biological system, the human kinetochore, which is a multi-protein complex involving over 100 proteins. Applying our freely available SRSim software to a large data set on kinetochore proteins in human cells, we construct a spatial rule-based simulation model of the human inner kinetochore. The model generates an estimation of the probability distribution of the inner kinetochore 3D architecture and we show how to analyze this distribution using information theory. In our model, the formation of a bridge between CenpA and an H3 containing nucleosome only occurs efficiently for higher protein concentration realized during S-phase but may be not in G1. Above a certain nucleosome distance the protein bridge barely formed pointing towards the importance of chromatin structure for kinetochore complex formation. We define a metric for the distance between structures that allow us to identify structural clusters. Using this modeling technique, we explore different hypothetical chromatin layouts. Applying a rule-based network analysis to the spatial kinetochore complex geometry allowed us to integrate experimental data on kinetochore proteins, suggesting a 3D model of the human inner kinetochore architecture that is governed by a combinatorial algebraic reaction network. This reaction network can serve as bridge between multiple scales of modeling. Our approach can be applied to other systems beyond kinetochores. Copyright © 2013 Elsevier Ltd. All rights reserved.

  13. Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins.

    PubMed

    Perez, Romel B; Tischer, Alexander; Auton, Matthew; Whitten, Steven T

    2014-12-01

    Molecular transduction of biological signals is understood primarily in terms of the cooperative structural transitions of protein macromolecules, providing a mechanism through which discrete local structure perturbations affect global macromolecular properties. The recognition that proteins lacking tertiary stability, commonly referred to as intrinsically disordered proteins (IDPs), mediate key signaling pathways suggests that protein structures without cooperative intramolecular interactions may also have the ability to couple local and global structure changes. Presented here are results from experiments that measured and tested the ability of disordered proteins to couple local changes in structure to global changes in structure. Using the intrinsically disordered N-terminal region of the p53 protein as an experimental model, a set of proline (PRO) and alanine (ALA) to glycine (GLY) substitution variants were designed to modulate backbone conformational propensities without introducing non-native intramolecular interactions. The hydrodynamic radius (R(h)) was used to monitor changes in global structure. Circular dichroism spectroscopy showed that the GLY substitutions decreased polyproline II (PP(II)) propensities relative to the wild type, as expected, and fluorescence methods indicated that substitution-induced changes in R(h) were not associated with folding. The experiments showed that changes in local PP(II) structure cause changes in R(h) that are variable and that depend on the intrinsic chain propensities of PRO and ALA residues, demonstrating a mechanism for coupling local and global structure changes. Molecular simulations that model our results were used to extend the analysis to other proteins and illustrate the generality of the observed PRO and alanine effects on the structures of IDPs. © 2014 Wiley Periodicals, Inc.

  14. Modeling helical proteins using residual dipolar couplings, sparse long-range distance constraints and a simple residue-based force field

    PubMed Central

    Eggimann, Becky L.; Vostrikov, Vitaly V.; Veglia, Gianluigi; Siepmann, J. Ilja

    2013-01-01

    We present a fast and simple protocol to obtain moderate-resolution backbone structures of helical proteins. This approach utilizes a combination of sparse backbone NMR data (residual dipolar couplings and paramagnetic relaxation enhancements) or EPR data with a residue-based force field and Monte Carlo/simulated annealing protocol to explore the folding energy landscape of helical proteins. By using only backbone NMR data, which are relatively easy to collect and analyze, and strategically placed spin relaxation probes, we show that it is possible to obtain protein structures with correct helical topology and backbone RMS deviations well below 4 Å. This approach offers promising alternatives for the structural determination of proteins in which nuclear Overha-user effect data are difficult or impossible to assign and produces initial models that will speed up the high-resolution structure determination by NMR spectroscopy. PMID:24639619

  15. Reduced Point Charge Models of Proteins: Effect of Protein-Water Interactions in Molecular Dynamics Simulations of Ubiquitin Systems.

    PubMed

    Leherte, Laurence; Vercauteren, Daniel P

    2017-10-26

    We investigate the influence of various solvent models on the structural stability and protein-water interface of three ubiquitin complexes (PDB access codes: 1Q0W , 2MBB , 2G3Q ) modeled using the Amber99sb force field (FF) and two different point charge distributions. A previously developed reduced point charge model (RPCM), wherein each amino acid residue is described by a limited number of point charges, is tested and compared to its all-atom (AA) version. The complexes are solvated in TIP4P-Ew or TIP3P type water molecules, involving either the scaling of the Lennard-Jones protein-O water interaction parameters, or the coarse-grain (CG) SIRAH water description. The best agreements between the RPCM and AA models were obtained for structural, protein-water, and ligand-ubiquitin properties when using the TIP4P-Ew water FF with a scaling factor γ of 0.7. At the RPCM level, a decrease in γ, or the inclusion of SIRAH particles, allows weakening of the protein-water interactions. It results in a slight collapse of the protein structure and a less compact hydration shell and, thus, in a decrease in the number of protein-water and water-water H-bonds. The dynamics of the surface protein atoms and of the water shell molecules are also slightly refrained, which allow the generation of stable RPCM trajectories.

  16. Structure prediction of the second extracellular loop in G-protein-coupled receptors.

    PubMed

    Kmiecik, Sebastian; Jamroz, Michal; Kolinski, Michal

    2014-06-03

    G-protein-coupled receptors (GPCRs) play key roles in living organisms. Therefore, it is important to determine their functional structures. The second extracellular loop (ECL2) is a functionally important region of GPCRs, which poses significant challenge for computational structure prediction methods. In this work, we evaluated CABS, a well-established protein modeling tool for predicting ECL2 structure in 13 GPCRs. The ECL2s (with between 13 and 34 residues) are predicted in an environment of other extracellular loops being fully flexible and the transmembrane domain fixed in its x-ray conformation. The modeling procedure used theoretical predictions of ECL2 secondary structure and experimental constraints on disulfide bridges. Our approach yielded ensembles of low-energy conformers and the most populated conformers that contained models close to the available x-ray structures. The level of similarity between the predicted models and x-ray structures is comparable to that of other state-of-the-art computational methods. Our results extend other studies by including newly crystallized GPCRs. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.

  18. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  19. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.

  20. Structural elucidation of the interaction between neurodegenerative disease-related tau protein with model lipid membranes

    NASA Astrophysics Data System (ADS)

    Jones, Emmalee M.

    A protein's sequence of amino acids determines how it folds. That folded structure is linked to protein function, and misfolding to dysfunction. Protein misfolding and aggregation into beta-sheet rich fibrillar aggregates is connected with over 20 neurodegenerative diseases, including Alzheimer's disease (AD). AD is characterized in part by misfolding, aggregation and deposition of the microtubule associated tau protein into neurofibrillary tangles (NFTs). However, two questions remain: What is tau's fibrillization mechanism, and what is tau's cytotoxicity mechanism? Tau is prone to heterogeneous interactions, including with lipid membranes. Lipids have been found in NFTs, anionic lipid vesicles induced aggregation of the microtubule binding domain of tau, and other protein aggregates induced ion permeability in cells. This evidence prompted our investigation of tau's interaction with model lipid membranes to elucidate the structural perturbations those interactions induced in tau protein and in the membrane. We show that although tau is highly charged and soluble, it is highly surface active and preferentially interacts with anionic membranes. To resolve molecular-scale structural details of tau and model membranes, we utilized X-ray and neutron scattering techniques. X-ray reflectivity indicated tau aggregated at air/water and anionic lipid membrane interfaces and penetrated into membranes. More significantly, membrane interfaces induced tau protein to partially adopt a more compact conformation with density similar to folded protein and ordered structure characteristic of beta-sheet formation. This suggests possible membrane-based mechanisms of tau aggregation. Membrane morphological changes were seen using fluorescence microscopy, and X-ray scattering techniques showed tau completely disrupts anionic membranes, suggesting an aggregate-based cytotoxicity mechanism. Further investigation of protein constructs and a "hyperphosphorylation" disease mimic helped clarify the role of the microtubule binding domain in anionic lipid affinity and demonstrated even "hyperphosphorylation" did not prevent interaction with anionic membranes. Additional studies investigated more complex membrane models to increase physiological relevance. These insights revealed structural changes in tau protein and lipid membranes after interaction. We observed tau's affinity for interfaces, and aggregation and compaction once tau partitions to interfaces. We observed the beginnings of beta-sheet formation in tau at anionic lipid membranes. We also examined disruption to the membrane on a molecular scale.

  1. GALT protein database: querying structural and functional features of GALT enzyme.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2014-09-01

    Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.

  2. The protein structure prediction problem could be solved using the current PDB library

    PubMed Central

    Zhang, Yang; Skolnick, Jeffrey

    2005-01-01

    For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find similar folds to native with an average rms deviation (RMSD) from native of 2.5 Å with ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The tasser algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guide of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (except for 2/1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results are highly suggestive that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774

  3. Generation of wavy structure on lipid membrane by peripheral proteins: a linear elastic analysis.

    PubMed

    Mahata, Paritosh; Das, Sovan Lal

    2017-05-01

    We carry out a linear elastic analysis to study wavy structure generation on lipid membrane by peripheral membrane proteins. We model the lipid membrane as linearly elastic and anisotropic material. The hydrophobic insertion by proteins into the lipid membrane has been idealized as penetration of rigid rod-like inclusions into the membrane and the electrostatic interaction between protein and membrane has been modeled by a distributed surface traction acting on the membrane surface. With the proposed model we study curvature generation by several binding domains of peripheral membrane proteins containing BAR domains and amphipathic alpha-helices. It is observed that electrostatic interaction is essential for curvature generation by the BAR domains. © 2017 Federation of European Biochemical Societies.

  4. Toward a blueprint for UDP-glucose pyrophosphorylase structure/function properties: homology-modeling analyses.

    PubMed

    Geisler, Matt; Wilczynska, Malgorzata; Karpinski, Stanislaw; Kleczkowski, Leszek A

    2004-11-01

    UDP-glucose pyrophosphorylase (UGPase) is an important enzyme of synthesis of sucrose, cellulose, and several other polysaccharides in all plants. The protein is evolutionarily conserved among eukaryotes, but has little relation, aside from its catalytic reaction, to UGPases of prokaryotic origin. Using protein homology modeling strategy, 3D structures for barley, poplar, and Arabidopsis UGPases have been derived, based on recently published crystal structure of human UDP-N-acetylglucosamine pyrophosphorylase. The derived 3D structures correspond to a bowl-shaped protein with the active site at a central groove, and a C-terminal domain that includes a loop (I-loop) possibly involved in dimerization. Data on a plethora of earlier described UGPase mutants from a variety of eukaryotic organisms have been revisited, and we have, in most cases, verified the role of each mutation in enzyme catalysis/regulation/structural integrity. We have also found that one of two alternatively spliced forms of poplar UGPase has a very short I-loop, suggesting differences in oligomerization ability of the two isozymes. The derivation of the structural model for plant UGPase should serve as a useful blueprint for further function/structure studies on this protein.

  5. Bio-mimicking galactose oxidase and hemocyanin, two dioxygen-processing copper proteins.

    PubMed

    Gamez, Patrick; Koval, Iryna A; Reedijk, Jan

    2004-12-21

    The modelling of the active sites of metalloproteins is one of the most challenging tasks in bio-inorganic chemistry. Copper proteins form part of this stimulating field of research as copper enzymes are mainly involved in oxidation bio-reactions. Thus, the understanding of the structure-function relationship of their active sites will allow the design of effective and environmental friendly oxidation catalysts. This perspective illustrates some outstanding structural and functional synthetic models of the active site of copper proteins, with special attention given to models of galactose oxidase and hemocyanin.

  6. 3Drefine: an interactive web server for efficient protein structure refinement

    PubMed Central

    Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. PMID:27131371

  7. What are the structural features that drive partitioning of proteins in aqueous two-phase systems?

    PubMed

    Wu, Zhonghua; Hu, Gang; Wang, Kui; Zaslavsky, Boris Yu; Kurgan, Lukasz; Uversky, Vladimir N

    2017-01-01

    Protein partitioning in aqueous two-phase systems (ATPSs) represents a convenient, inexpensive, and easy to scale-up protein separation technique. Since partition behavior of a protein dramatically depends on an ATPS composition, it would be highly beneficial to have reliable means for (even qualitative) prediction of partitioning of a target protein under different conditions. Our aim was to understand which structural features of proteins contribute to partitioning of a query protein in a given ATPS. We undertook a systematic empirical analysis of relations between 57 numerical structural descriptors derived from the corresponding amino acid sequences and crystal structures of 10 well-characterized proteins and the partition behavior of these proteins in 29 different ATPSs. This analysis revealed that just a few structural characteristics of proteins can accurately determine behavior of these proteins in a given ATPS. However, partition behavior of proteins in different ATPSs relies on different structural features. In other words, we could not find a unique set of protein structural features derived from their crystal structures that could be used for the description of the protein partition behavior of all proteins in all ATPSs analyzed in this study. We likely need to gain better insight into relationships between protein-solvent interactions and protein structure peculiarities, in particular given limitations of the used here crystal structures, to be able to construct a model that accurately predicts protein partition behavior across all ATPSs. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

    PubMed Central

    Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin

    2015-01-01

    Abstract Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435

  9. Design of structurally distinct proteins using strategies inspired by evolution

    DOE PAGES

    Jacobs, T. M.; Williams, B.; Williams, T.; ...

    2016-05-06

    Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. In this paper, we describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. Finally, thismore » method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.« less

  10. Protein Modelling: What Happened to the “Protein Structure Gap”?

    PubMed Central

    Schwede, Torsten

    2013-01-01

    Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long standing vision in structural biology as it holds the promise to bypass part of the laborious process of experimental structure solution. Over the last two decades, a paradigm shift has occurred: starting from a situation where the “structure knowledge gap” between the huge number of protein sequences and small number of known structures has hampered the widespread use of structure-based approaches in life science research, today some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Template based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. With the scientific focus of interest moving towards larger macromolecular complexes and dynamic networks of interactions, the integration of computational modeling methods with low-resolution experimental techniques allows studying large and complex molecular machines. Computational modeling and prediction techniques are still facing a number of challenges which hamper the more widespread use by the non-expert scientist. For example, it is often difficult to convey the underlying assumptions of a computational technique, as well as the expected accuracy and structural variability of a specific model. However, these aspects are crucial to understand the limitations of a model, and to decide which interpretations and conclusions can be supported. PMID:24010712

  11. Evolutionary Dynamics on Protein Bi-stability Landscapes can Potentially Resolve Adaptive Conflicts

    PubMed Central

    Sikosek, Tobias; Bornberg-Bauer, Erich; Chan, Hue Sun

    2012-01-01

    Experimental studies have shown that some proteins exist in two alternative native-state conformations. It has been proposed that such bi-stable proteins can potentially function as evolutionary bridges at the interface between two neutral networks of protein sequences that fold uniquely into the two different native conformations. Under adaptive conflict scenarios, bi-stable proteins may be of particular advantage if they simultaneously provide two beneficial biological functions. However, computational models that simulate protein structure evolution do not yet recognize the importance of bi-stability. Here we use a biophysical model to analyze sequence space to identify bi-stable or multi-stable proteins with two or more equally stable native-state structures. The inclusion of such proteins enhances phenotype connectivity between neutral networks in sequence space. Consideration of the sequence space neighborhood of bridge proteins revealed that bi-stability decreases gradually with each mutation that takes the sequence further away from an exactly bi-stable protein. With relaxed selection pressures, we found that bi-stable proteins in our model are highly successful under simulated adaptive conflict. Inspired by these model predictions, we developed a method to identify real proteins in the PDB with bridge-like properties, and have verified a clear bi-stability gradient for a series of mutants studied by Alexander et al. (Proc Nat Acad Sci USA 2009, 106:21149–21154) that connect two sequences that fold uniquely into two different native structures via a bridge-like intermediate mutant sequence. Based on these findings, new testable predictions for future studies on protein bi-stability and evolution are discussed. PMID:23028272

  12. PRISM-EM: template interface-based modelling of multi-protein complexes guided by cryo-electron microscopy density maps.

    PubMed

    Kuzu, Guray; Keskin, Ozlem; Nussinov, Ruth; Gursoy, Attila

    2016-10-01

    The structures of protein assemblies are important for elucidating cellular processes at the molecular level. Three-dimensional electron microscopy (3DEM) is a powerful method to identify the structures of assemblies, especially those that are challenging to study by crystallography. Here, a new approach, PRISM-EM, is reported to computationally generate plausible structural models using a procedure that combines crystallographic structures and density maps obtained from 3DEM. The predictions are validated against seven available structurally different crystallographic complexes. The models display mean deviations in the backbone of <5 Å. PRISM-EM was further tested on different benchmark sets; the accuracy was evaluated with respect to the structure of the complex, and the correlation with EM density maps and interface predictions were evaluated and compared with those obtained using other methods. PRISM-EM was then used to predict the structure of the ternary complex of the HIV-1 envelope glycoprotein trimer, the ligand CD4 and the neutralizing protein m36.

  13. Prediction of homoprotein and heteroprotein complexes by protein docking and template‐based modeling: A CASP‐CAPRI experiment

    PubMed Central

    Velankar, Sameer; Kryshtafovych, Andriy; Huang, Shen‐You; Schneidman‐Duhovny, Dina; Sali, Andrej; Segura, Joan; Fernandez‐Fuentes, Narcis; Viswanath, Shruthi; Elber, Ron; Grudinin, Sergei; Popov, Petr; Neveu, Emilie; Lee, Hasup; Baek, Minkyung; Park, Sangwoo; Heo, Lim; Rie Lee, Gyu; Seok, Chaok; Qin, Sanbo; Zhou, Huan‐Xiang; Ritchie, David W.; Maigret, Bernard; Devignes, Marie‐Dominique; Ghoorah, Anisah; Torchala, Mieczyslaw; Chaleil, Raphaël A.G.; Bates, Paul A.; Ben‐Zeev, Efrat; Eisenstein, Miriam; Negi, Surendra S.; Weng, Zhiping; Vreven, Thom; Pierce, Brian G.; Borrman, Tyler M.; Yu, Jinchao; Ochsenbein, Françoise; Guerois, Raphaël; Vangone, Anna; Rodrigues, João P.G.L.M.; van Zundert, Gydo; Nellen, Mehdi; Xue, Li; Karaca, Ezgi; Melquiond, Adrien S.J.; Visscher, Koen; Kastritis, Panagiotis L.; Bonvin, Alexandre M.J.J.; Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Li, Jilong; Ma, Zhiwei; Cheng, Jianlin; Zou, Xiaoqin; Shen, Yang; Peterson, Lenna X.; Kim, Hyung‐Rae; Roy, Amit; Han, Xusi; Esquivel‐Rodriguez, Juan; Kihara, Daisuke; Yu, Xiaofeng; Bruce, Neil J.; Fuller, Jonathan C.; Wade, Rebecca C.; Anishchenko, Ivan; Kundrotas, Petras J.; Vakser, Ilya A.; Imai, Kenichiro; Yamada, Kazunori; Oda, Toshiyuki; Nakamura, Tsukasa; Tomii, Kentaro; Pallara, Chiara; Romero‐Durana, Miguel; Jiménez‐García, Brian; Moal, Iain H.; Férnandez‐Recio, Juan; Joung, Jong Young; Kim, Jong Yun; Joo, Keehyoung; Lee, Jooyoung; Kozakov, Dima; Vajda, Sandor; Mottarella, Scott; Hall, David R.; Beglov, Dmitri; Mamonov, Artem; Xia, Bing; Bohnuud, Tanggis; Del Carpio, Carlos A.; Ichiishi, Eichiro; Marze, Nicholas; Kuroda, Daisuke; Roy Burman, Shourya S.; Gray, Jeffrey J.; Chermak, Edrisse; Cavallo, Luigi; Oliva, Romina; Tovchigrechko, Andrey

    2016-01-01

    ABSTRACT We present the results for CAPRI Round 30, the first joint CASP‐CAPRI experiment, which brought together experts from the protein structure prediction and protein–protein docking communities. The Round comprised 25 targets from amongst those submitted for the CASP11 prediction experiment of 2014. The targets included mostly homodimers, a few homotetramers, and two heterodimers, and comprised protein chains that could readily be modeled using templates from the Protein Data Bank. On average 24 CAPRI groups and 7 CASP groups submitted docking predictions for each target, and 12 CAPRI groups per target participated in the CAPRI scoring experiment. In total more than 9500 models were assessed against the 3D structures of the corresponding target complexes. Results show that the prediction of homodimer assemblies by homology modeling techniques and docking calculations is quite successful for targets featuring large enough subunit interfaces to represent stable associations. Targets with ambiguous or inaccurate oligomeric state assignments, often featuring crystal contact‐sized interfaces, represented a confounding factor. For those, a much poorer prediction performance was achieved, while nonetheless often providing helpful clues on the correct oligomeric state of the protein. The prediction performance was very poor for genuine tetrameric targets, where the inaccuracy of the homology‐built subunit models and the smaller pair‐wise interfaces severely limited the ability to derive the correct assembly mode. Our analysis also shows that docking procedures tend to perform better than standard homology modeling techniques and that highly accurate models of the protein components are not always required to identify their association modes with acceptable accuracy. Proteins 2016; 84(Suppl 1):323–348. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:27122118

  14. Assessing the Chemical Accuracy of Protein Structures via Peptide Acidity

    PubMed Central

    Anderson, Janet S.; Hernández, Griselda; LeMaster, David M.

    2012-01-01

    Although the protein native state is a Boltzmann conformational ensemble, practical applications often require a representative model from the most populated region of that distribution. The acidity of the backbone amides, as reflected in hydrogen exchange rates, is exquisitely sensitive to the surrounding charge and dielectric volume distribution. For each of four proteins, three independently determined X-ray structures of differing crystallographic resolution were used to predict exchange for the static solvent-exposed amide hydrogens. The average correlation coefficients range from 0.74 for ubiquitin to 0.93 for Pyrococcus furiosus rubredoxin, reflecting the larger range of experimental exchange rates exhibited by the latter protein. The exchange prediction errors modestly correlate with the crystallographic resolution. MODELLER 9v6-derived homology models at ~60% sequence identity (36% identity for chymotrypsin inhibitor CI2) yielded correlation coefficients that are ~0.1 smaller than for the cognate X-ray structures. The most recently deposited NOE-based ubiquitin structure and the original NMR structure of CI2 fail to provide statistically significant predictions of hydrogen exchange. However, the more recent RECOORD refinement study of CI2 yielded predictions comparable to the X-ray and homology model-based analyses. PMID:23182463

  15. PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

    DTIC Science & Technology

    2009-07-01

    Evanseck JD, et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102...dimensional (3-D) protein structures are critical for the understanding of molecular mechanisms of living systems. Traditionally, X-ray crystallography...disordered proteins are often responsible for molecular recognition, molecular assembly, protein modifica- tion, and entropic chain activities in organisms [26

  16. Intrinsically disordered proteins--relation to general model expressing the active role of the water environment.

    PubMed

    Kalinowska, Barbara; Banach, Mateusz; Konieczny, Leszek; Marchewka, Damian; Roterman, Irena

    2014-01-01

    This work discusses the role of unstructured polypeptide chain fragments in shaping the protein's hydrophobic core. Based on the "fuzzy oil drop" model, which assumes an idealized distribution of hydrophobicity density described by the 3D Gaussian, we can determine which fragments make up the core and pinpoint residues whose location conflicts with theoretical predictions. We show that the structural influence of the water environment determines the positions of disordered fragments, leading to the formation of a hydrophobic core overlaid by a hydrophilic mantle. This phenomenon is further described by studying selected proteins which are known to be unstable and contain intrinsically disordered fragments. Their properties are established quantitatively, explaining the causative relation between the protein's structure and function and facilitating further comparative analyses of various structural models. © 2014 Elsevier Inc. All rights reserved.

  17. Gi- and Gs-coupled GPCRs show different modes of G-protein binding.

    PubMed

    Van Eps, Ned; Altenbach, Christian; Caro, Lydia N; Latorraca, Naomi R; Hollingsworth, Scott A; Dror, Ron O; Ernst, Oliver P; Hubbell, Wayne L

    2018-03-06

    More than two decades ago, the activation mechanism for the membrane-bound photoreceptor and prototypical G protein-coupled receptor (GPCR) rhodopsin was uncovered. Upon light-induced changes in ligand-receptor interaction, movement of specific transmembrane helices within the receptor opens a crevice at the cytoplasmic surface, allowing for coupling of heterotrimeric guanine nucleotide-binding proteins (G proteins). The general features of this activation mechanism are conserved across the GPCR superfamily. Nevertheless, GPCRs have selectivity for distinct G-protein family members, but the mechanism of selectivity remains elusive. Structures of GPCRs in complex with the stimulatory G protein, G s , and an accessory nanobody to stabilize the complex have been reported, providing information on the intermolecular interactions. However, to reveal the structural selectivity filters, it will be necessary to determine GPCR-G protein structures involving other G-protein subtypes. In addition, it is important to obtain structures in the absence of a nanobody that may influence the structure. Here, we present a model for a rhodopsin-G protein complex derived from intermolecular distance constraints between the activated receptor and the inhibitory G protein, G i , using electron paramagnetic resonance spectroscopy and spin-labeling methodologies. Molecular dynamics simulations demonstrated the overall stability of the modeled complex. In the rhodopsin-G i complex, G i engages rhodopsin in a manner distinct from previous GPCR-G s structures, providing insight into specificity determinants. Copyright © 2018 the Author(s). Published by PNAS.

  18. Investigating the Structural Compaction of Biomolecules Upon Transition to the Gas-Phase Using ESI-TWIMS-MS.

    PubMed

    Devine, Paul W A; Fisher, Henry C; Calabrese, Antonio N; Whelan, Fiona; Higazi, Daniel R; Potts, Jennifer R; Lowe, David C; Radford, Sheena E; Ashcroft, Alison E

    2017-09-01

    Collision cross-section (CCS) measurements obtained from ion mobility spectrometry-mass spectrometry (IMS-MS) analyses often provide useful information concerning a protein's size and shape and can be complemented by modeling procedures. However, there have been some concerns about the extent to which certain proteins maintain a native-like conformation during the gas-phase analysis, especially proteins with dynamic or extended regions. Here we have measured the CCSs of a range of biomolecules including non-globular proteins and RNAs of different sequence, size, and stability. Using traveling wave IMS-MS, we show that for the proteins studied, the measured CCS deviates significantly from predicted CCS values based upon currently available structures. The results presented indicate that these proteins collapse to different extents varying on their elongated structures upon transition into the gas-phase. Comparing two RNAs of similar mass but different solution structures, we show that these biomolecules may also be susceptible to gas-phase compaction. Together, the results suggest that caution is needed when predicting structural models based on CCS data for RNAs as well as proteins with non-globular folds. Graphical Abstract ᅟ.

  19. FINDSITE-metal: Integrating evolutionary information and machine learning for structure-based metal binding site prediction at the proteome level

    PubMed Central

    Brylinski, Michal; Skolnick, Jeffrey

    2010-01-01

    The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this paper, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal binding sites are detected with the best predicted binding site at rank 1 and within the top 2 ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium and magnesium ions, the binding metal can be predicted with high, typically 70-90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/. PMID:21287609

  20. Catalytic site identification—a web server to identify catalytic site structural matches throughout PDB

    PubMed Central

    Kirshner, Daniel A.; Nilmeier, Jerome P.; Lightstone, Felice C.

    2013-01-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov. PMID:23680785

  1. Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB.

    PubMed

    Kirshner, Daniel A; Nilmeier, Jerome P; Lightstone, Felice C

    2013-07-01

    The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.

  2. Modelling protein functional domains in signal transduction using Maude

    NASA Technical Reports Server (NTRS)

    Sriram, M. G.

    2003-01-01

    Modelling of protein-protein interactions in signal transduction is receiving increased attention in computational biology. This paper describes recent research in the application of Maude, a symbolic language founded on rewriting logic, to the modelling of functional domains within signalling proteins. Protein functional domains (PFDs) are a critical focus of modern signal transduction research. In general, Maude models can simulate biological signalling networks and produce specific testable hypotheses at various levels of abstraction. Developing symbolic models of signalling proteins containing functional domains is important because of the potential to generate analyses of complex signalling networks based on structure-function relationships.

  3. Mimicking the action of folding chaperones by Hamiltonian replica-exchange molecular dynamics simulations: application in the refinement of de novo models.

    PubMed

    Fan, Hao; Periole, Xavier; Mark, Alan E

    2012-07-01

    The efficiency of using a variant of Hamiltonian replica-exchange molecular dynamics (Chaperone H-replica-exchange molecular dynamics [CH-REMD]) for the refinement of protein structural models generated de novo is investigated. In CH-REMD, the interaction between the protein and its environment, specifically, the electrostatic interaction between the protein and the solvating water, is varied leading to cycles of partial unfolding and refolding mimicking some aspects of folding chaperones. In 10 of the 15 cases examined, the CH-REMD approach sampled structures in which the root-mean-square deviation (RMSD) of secondary structure elements (SSE-RMSD) with respect to the experimental structure was more than 1.0 Å lower than the initial de novo model. In 14 of the 15 cases, the improvement was more than 0.5 Å. The ability of three different statistical potentials to identify near-native conformations was also examined. Little correlation between the SSE-RMSD of the sampled structures with respect to the experimental structure and any of the scoring functions tested was found. The most effective scoring function tested was the DFIRE potential. Using the DFIRE potential, the SSE-RMSD of the best scoring structures was on average 0.3 Å lower than the initial model. Overall the work demonstrates that targeted enhanced-sampling techniques such as CH-REMD can lead to the systematic refinement of protein structural models generated de novo but that improved potentials for the identification of near-native structures are still needed. Copyright © 2012 Wiley Periodicals, Inc.

  4. GPCR-I-TASSER: A hybrid approach to G protein-coupled receptor structure modeling and the application to the human genome

    PubMed Central

    Zhang, Jian; Yang, Jianyi; Jang, Richard; Zhang, Yang

    2015-01-01

    SUMMARY Experimental structure determination remains very difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs where the ab initio TM-helix assembly procedure constructed the correct fold for 20 cases. When combined with weak-homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average C-alpha RMSD 2.4 Å in the TM-regions. The new hybrid protocol was applied to model all 1026 GPCRs in the human genome, where 923 have a high confidence score that are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins. PMID:26190572

  5. Proton channel models

    PubMed Central

    Pupo, Amaury; Baez-Nieto, David; Martínez, Agustín; Latorre, Ramón; González, Carlos

    2014-01-01

    Voltage-gated proton channels are integral membrane proteins with the capacity to permeate elementary particles in a voltage and pH dependent manner. These proteins have been found in several species and are involved in various physiological processes. Although their primary topology is known, lack of details regarding their structures in the open conformation has limited analyses toward a deeper understanding of the molecular determinants of their function and regulation. Consequently, the function-structure relationships have been inferred based on homology models. In the present work, we review the existing proton channel models, their assumptions, predictions and the experimental facts that support them. Modeling proton channels is not a trivial task due to the lack of a close homolog template. Hence, there are important differences between published models. This work attempts to critically review existing proton channel models toward the aim of contributing to a better understanding of the structural features of these proteins. PMID:24755912

  6. Structural Biology of Tumor Necrosis Factor Demonstrated for Undergraduates Instruction by Computer Simulation

    ERIC Educational Resources Information Center

    Roy, Urmi

    2016-01-01

    This work presents a three-dimensional (3D) modeling exercise for undergraduate students in chemistry and health sciences disciplines, focusing on a protein-group linked to immune system regulation. Specifically, the exercise involves molecular modeling and structural analysis of tumor necrosis factor (TNF) proteins, both wild type and mutant. The…

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Myeongsang; Baek, Inchul; Choi, Hyunsung

    Pathological amyloid proteins have been implicated in neuro-degenerative diseases, specifically Alzheimer's, Parkinson's, Lewy-body diseases and prion related diseases. In prion related diseases, functional tau proteins can be transformed into pathological agents by environmental factors, including oxidative stress, inflammation, Aβ-mediated toxicity and covalent modification. These pathological agents are stable under physiological conditions and are not easily degraded. This un-degradable characteristic of tau proteins enables their utilization as functional materials to capturing the carbon dioxides. For the proper utilization of amyloid proteins as functional materials efficiently, a basic study regarding their structural characteristic is necessary. Here, we investigated the basic tau proteinmore » structure of wild-type (WT) and tau proteins with lysine residues mutation at glutamic residue (Q2K) on tau protein at atomistic scale. We also reported the size effect of both the WT and Q2K structures, which allowed us to identify the stability of those amyloid structures. - Highlights: • Lysine mutation effect alters the structure conformation and characteristic of tau. • Over the 15 layers both WT and Q2K models, both tau proteins undergo fractions. • Lysine mutation causes the increment of non-bonded energy and solvent accessible surface area. • Structural instability of Q2K model was proved by the number of hydrogen bonds analysis.« less

  8. Unconstrained Structure Formation in Coarse-Grained Protein Simulations

    NASA Astrophysics Data System (ADS)

    Bereau, Tristan

    The ability of proteins to fold into well-defined structures forms the basis of a wide variety of biochemical functions in and out of the cell membrane. Many of these processes, however, operate at time- and length-scales that are currently unattainable by all-atom computer simulations. To cope with this difficulty, increasingly more accurate and sophisticated coarse-grained models are currently being developed. In the present thesis, we introduce a solvent-free coarse-grained model for proteins. Proteins are modeled by four beads per amino acid, providing enough backbone resolution to allow for accurate sampling of local conformations. It relies on simple interactions that emphasize structure, such as hydrogen bonds and hydrophobicity. Realistic alpha/beta content is achieved by including an effective nearest-neighbor dipolar interaction. Parameters are tuned to reproduce both local conformations and tertiary structures. By studying both helical and extended conformations we make sure the force field is not biased towards any particular secondary structure. Without any further adjustments or bias a realistic oligopeptide aggregation scenario is observed. The model is subsequently applied to various biophysical problems: (i) kinetics of folding of two model peptides, (ii) large-scale amyloid-beta oligomerization, and (iii) protein folding cooperativity. The last topic---defined by the nature of the finite-size thermodynamic transition exhibited upon folding---was investigated from a microcanonical perspective: the accurate evaluation of the density of states can unambiguously characterize the nature of the transition, unlike its corresponding canonical analysis. Extending the results of lattice simulations and theoretical models, we find that it is the interplay between secondary structure and the loss of non-native tertiary contacts which determines the nature of the transition. Finally, we combine the peptide model with a high-resolution, solvent-free, lipid model. The lipid force field was systematically tuned to reproduce the structural and mechanical properties of phosphatidylcholine bilayers. The two models were cross-parametrized against atomistic potential of mean force curves for the insertion of single amino acid side chains into a bilayer. Coarse-grained transmembrane protein simulations were then compared with experiments and atomistic simulations to validate the force field. The transferability of the two models across amino acid sequences and lipid species permits the investigation of a wide variety of scenarios, while the absence of explicit solvent allows for studies of large-scale phenomena.

  9. ModeRNA: a tool for comparative modeling of RNA 3D structure

    PubMed Central

    Rother, Magdalena; Rother, Kristian; Puton, Tomasz; Bujnicki, Janusz M.

    2011-01-01

    RNA is a large group of functionally important biomacromolecules. In striking analogy to proteins, the function of RNA depends on its structure and dynamics, which in turn is encoded in the linear sequence. However, while there are numerous methods for computational prediction of protein three-dimensional (3D) structure from sequence, with comparative modeling being the most reliable approach, there are very few such methods for RNA. Here, we present ModeRNA, a software tool for comparative modeling of RNA 3D structures. As an input, ModeRNA requires a 3D structure of a template RNA molecule, and a sequence alignment between the target to be modeled and the template. It must be emphasized that a good alignment is required for successful modeling, and for large and complex RNA molecules the development of a good alignment usually requires manual adjustments of the input data based on previous expertise of the respective RNA family. ModeRNA can model post-transcriptional modifications, a functionally important feature analogous to post-translational modifications in proteins. ModeRNA can also model DNA structures or use them as templates. It is equipped with many functions for merging fragments of different nucleic acid structures into a single model and analyzing their geometry. Windows and UNIX implementations of ModeRNA with comprehensive documentation and a tutorial are freely available. PMID:21300639

  10. Protein folding optimization based on 3D off-lattice model via an improved artificial bee colony algorithm.

    PubMed

    Li, Bai; Lin, Mu; Liu, Qiao; Li, Ya; Zhou, Changjun

    2015-10-01

    Protein folding is a fundamental topic in molecular biology. Conventional experimental techniques for protein structure identification or protein folding recognition require strict laboratory requirements and heavy operating burdens, which have largely limited their applications. Alternatively, computer-aided techniques have been developed to optimize protein structures or to predict the protein folding process. In this paper, we utilize a 3D off-lattice model to describe the original protein folding scheme as a simplified energy-optimal numerical problem, where all types of amino acid residues are binarized into hydrophobic and hydrophilic ones. We apply a balance-evolution artificial bee colony (BE-ABC) algorithm as the minimization solver, which is featured by the adaptive adjustment of search intensity to cater for the varying needs during the entire optimization process. In this work, we establish a benchmark case set with 13 real protein sequences from the Protein Data Bank database and evaluate the convergence performance of BE-ABC algorithm through strict comparisons with several state-of-the-art ABC variants in short-term numerical experiments. Besides that, our obtained best-so-far protein structures are compared to the ones in comprehensive previous literature. This study also provides preliminary insights into how artificial intelligence techniques can be applied to reveal the dynamics of protein folding. Graphical Abstract Protein folding optimization using 3D off-lattice model and advanced optimization techniques.

  11. Investigating energy-based pool structure selection in the structure ensemble modeling with experimental distance constraints: The example from a multidomain protein Pub1.

    PubMed

    Zhu, Guanhua; Liu, Wei; Bao, Chenglong; Tong, Dudu; Ji, Hui; Shen, Zuowei; Yang, Daiwen; Lu, Lanyuan

    2018-05-01

    The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble-size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding on the ensemble results from different sources of pool candidates. © 2018 Wiley Periodicals, Inc.

  12. Predicting turns in proteins with a unified model.

    PubMed

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.

  13. Predicting Turns in Proteins with a Unified Model

    PubMed Central

    Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan

    2012-01-01

    Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872

  14. Eukaryotic major facilitator superfamily transporter modeling based on the prokaryotic GlpT crystal structure.

    PubMed

    Lemieux, M Joanne

    2007-01-01

    The major facilitator superfamily (MFS) of transporters represents the largest family of secondary active transporters and has a diverse range of substrates. With structural information for four MFS transporters, we can see a strong structural commonality suggesting, as predicted, a common architecture for MFS transporters. The rate for crystal structure determination of MFS transporters is slow, making modeling of both prokaryotic and eukaryotic transporters more enticing. In this review, models of eukaryotic transporters Glut1, G6PT, OCT1, OCT2 and Pho84, based on the crystal structures of the prokaryotic GlpT, based on the crystal structure of LacY are discussed. The techniques used to generate the different models are compared. In addition, the validity of these models and the strategy of using prokaryotic crystal structures to model eukaryotic proteins are discussed. For comparison, E. coli GlpT was modeled based on the E. coli LacY structure and compared to the crystal structure of GlpT demonstrating that experimental evidence is essential for accurate modeling of membrane proteins.

  15. Roles of β-Turns in Protein Folding: From Peptide Models to Protein Engineering

    PubMed Central

    Marcelino, Anna Marie C.; Gierasch, Lila M.

    2010-01-01

    Reverse turns are a major class of protein secondary structure; they represent sites of chain reversal and thus sites where the globular character of a protein is created. It has been speculated for many years that turns may nucleate the formation of structure in protein folding, as their propensity to occur will favor the approximation of their flanking regions and their general tendency to be hydrophilic will favor their disposition at the solvent-accessible surface. Reverse turns are local features, and it is therefore not surprising that their structural properties have been extensively studied using peptide models. In this article, we review research on peptide models of turns to test the hypothesis that the propensities of turns to form in short peptides will relate to the roles of corresponding sequences in protein folding. Turns with significant stability as isolated entities should actively promote the folding of a protein, and by contrast, turn sequences that merely allow the chain to adopt conformations required for chain reversal are predicted to be passive in the folding mechanism. We discuss results of protein engineering studies of the roles of turn residues in folding mechanisms. Factors that correlate with the importance of turns in folding indeed include their intrinsic stability, as well as their topological context and their participation in hydrophobic networks within the protein’s structure. PMID:18275088

  16. Protein–Mineral Interactions: Molecular Dynamics Simulations Capture Importance of Variations in Mineral Surface Composition and Structure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersen, Amity; Reardon, Patrick N.; Chacon, Stephany S.

    Molecular dynamics simulations, conventional and metadynamics, were performed to determine the interaction of model protein Gb1 over kaolinite (001), Na+-montmorillonite (001), Ca2+-montmorillonite (001), goethite (100), and Na+-birnessite (001) mineral surfaces. Gb1, a small (56 residue) protein with a well-characterized solution-state nuclear magnetic resonance (NMR) structure and having α-helix, four-fold β-sheet, and hydrophobic core features, is used as a model protein to study protein soil mineral interactions and gain insights on structural changes and potential degradation of protein. From our simulations, we observe little change to the hydrated Gb1 structure over the kaolinite, montmorillonite, and goethite surfaces relative to its solvatedmore » structure without these mineral surfaces present. Over the Na+-birnessite basal surface, however, the Gb1 structure is highly disturbed as a result of interaction with this birnessite surface. Unraveling of the Gb1 β-sheet at specific turns and a partial unraveling of the α-helix is observed over birnessite, which suggests specific vulnerable residue sites for oxidation or hydrolysis possibly leading to fragmentation.« less

  17. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  18. Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study.

    PubMed

    Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

    2006-02-28

    Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.

  19. Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study

    PubMed Central

    Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

    2006-01-01

    Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978

  20. Challenges in structural approaches to cell modeling.

    PubMed

    Im, Wonpil; Liang, Jie; Olson, Arthur; Zhou, Huan-Xiang; Vajda, Sandor; Vakser, Ilya A

    2016-07-31

    Computational modeling is essential for structural characterization of biomolecular mechanisms across the broad spectrum of scales. Adequate understanding of biomolecular mechanisms inherently involves our ability to model them. Structural modeling of individual biomolecules and their interactions has been rapidly progressing. However, in terms of the broader picture, the focus is shifting toward larger systems, up to the level of a cell. Such modeling involves a more dynamic and realistic representation of the interactomes in vivo, in a crowded cellular environment, as well as membranes and membrane proteins, and other cellular components. Structural modeling of a cell complements computational approaches to cellular mechanisms based on differential equations, graph models, and other techniques to model biological networks, imaging data, etc. Structural modeling along with other computational and experimental approaches will provide a fundamental understanding of life at the molecular level and lead to important applications to biology and medicine. A cross section of diverse approaches presented in this review illustrates the developing shift from the structural modeling of individual molecules to that of cell biology. Studies in several related areas are covered: biological networks; automated construction of three-dimensional cell models using experimental data; modeling of protein complexes; prediction of non-specific and transient protein interactions; thermodynamic and kinetic effects of crowding; cellular membrane modeling; and modeling of chromosomes. The review presents an expert opinion on the current state-of-the-art in these various aspects of structural modeling in cellular biology, and the prospects of future developments in this emerging field. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds.

    PubMed

    Simkovic, Felix; Thomas, Jens M H; Keegan, Ronan M; Winn, Martyn D; Mayans, Olga; Rigden, Daniel J

    2016-07-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions ('decoys'), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue-residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing.

  2. Residue contacts predicted by evolutionary covariance extend the application of ab initio molecular replacement to larger and more challenging protein folds

    PubMed Central

    Simkovic, Felix; Thomas, Jens M. H.; Keegan, Ronan M.; Winn, Martyn D.; Mayans, Olga; Rigden, Daniel J.

    2016-01-01

    For many protein families, the deluge of new sequence information together with new statistical protocols now allow the accurate prediction of contacting residues from sequence information alone. This offers the possibility of more accurate ab initio (non-homology-based) structure prediction. Such models can be used in structure solution by molecular replacement (MR) where the target fold is novel or is only distantly related to known structures. Here, AMPLE, an MR pipeline that assembles search-model ensembles from ab initio structure predictions (‘decoys’), is employed to assess the value of contact-assisted ab initio models to the crystallographer. It is demonstrated that evolutionary covariance-derived residue–residue contact predictions improve the quality of ab initio models and, consequently, the success rate of MR using search models derived from them. For targets containing β-structure, decoy quality and MR performance were further improved by the use of a β-strand contact-filtering protocol. Such contact-guided decoys achieved 14 structure solutions from 21 attempted protein targets, compared with nine for simple Rosetta decoys. Previously encountered limitations were superseded in two key respects. Firstly, much larger targets of up to 221 residues in length were solved, which is far larger than the previously benchmarked threshold of 120 residues. Secondly, contact-guided decoys significantly improved success with β-sheet-rich proteins. Overall, the improved performance of contact-guided decoys suggests that MR is now applicable to a significantly wider range of protein targets than were previously tractable, and points to a direct benefit to structural biology from the recent remarkable advances in sequencing. PMID:27437113

  3. Interactive comparison and remediation of collections of macromolecular structures.

    PubMed

    Moriarty, Nigel W; Liebschner, Dorothee; Klei, Herbert E; Echols, Nathaniel; Afonine, Pavel V; Headd, Jeffrey J; Poon, Billy K; Adams, Paul D

    2018-01-01

    Often similar structures need to be compared to reveal local differences throughout the entire model or between related copies within the model. Therefore, a program to compare multiple structures and enable correction any differences not supported by the density map was written within the Phenix framework (Adams et al., Acta Cryst 2010; D66:213-221). This program, called Structure Comparison, can also be used for structures with multiple copies of the same protein chain in the asymmetric unit, that is, as a result of non-crystallographic symmetry (NCS). Structure Comparison was designed to interface with Coot(Emsley et al., Acta Cryst 2010; D66:486-501) and PyMOL(DeLano, PyMOL 0.99; 2002) to facilitate comparison of large numbers of related structures. Structure Comparison analyzes collections of protein structures using several metrics, such as the rotamer conformation of equivalent residues, displays the results in tabular form and allows superimposed protein chains and density maps to be quickly inspected and edited (via the tools in Coot) for consistency, completeness and correctness. © 2017 The Protein Society.

  4. Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome

    NASA Astrophysics Data System (ADS)

    Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.

    2018-03-01

    The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.

  5. Characterization of the low-temperature properties of a simplified protein model

    NASA Astrophysics Data System (ADS)

    Hagmann, Johannes-Geert; Nakagawa, Naoko; Peyrard, Michel

    2014-01-01

    Prompted by results that showed that a simple protein model, the frustrated Gō model, appears to exhibit a transition reminiscent of the protein dynamical transition, we examine the validity of this model to describe the low-temperature properties of proteins. First, we examine equilibrium fluctuations. We calculate its incoherent neutron-scattering structure factor and show that it can be well described by a theory using the one-phonon approximation. By performing an inherent structure analysis, we assess the transitions among energy states at low temperatures. Then, we examine nonequilibrium fluctuations after a sudden cooling of the protein. We investigate the violation of the fluctuation-dissipation theorem in order to analyze the protein glass transition. We find that the effective temperature of the quenched protein deviates from the temperature of the thermostat, however it relaxes towards the actual temperature with an Arrhenius behavior as the waiting time increases. These results of the equilibrium and nonequilibrium studies converge to the conclusion that the apparent dynamical transition of this coarse-grained model cannot be attributed to a glassy behavior.

  6. Stability and the Evolvability of Function in a Model Protein

    PubMed Central

    Bloom, Jesse D.; Wilke, Claus O.; Arnold, Frances H.; Adami, Christoph

    2004-01-01

    Functional proteins must fold with some minimal stability to a structure that can perform a biochemical task. Here we use a simple model to investigate the relationship between the stability requirement and the capacity of a protein to evolve the function of binding to a ligand. Although our model contains no built-in tradeoff between stability and function, proteins evolved function more efficiently when the stability requirement was relaxed. Proteins with both high stability and high function evolved more efficiently when the stability requirement was gradually increased than when there was constant selection for high stability. These results show that in our model, the evolution of function is enhanced by allowing proteins to explore sequences corresponding to marginally stable structures, and that it is easier to improve stability while maintaining high function than to improve function while maintaining high stability. Our model also demonstrates that even in the absence of a fundamental biophysical tradeoff between stability and function, the speed with which function can evolve is limited by the stability requirement imposed on the protein. PMID:15111394

  7. Systems biology of the structural proteome.

    PubMed

    Brunk, Elizabeth; Mih, Nathan; Monk, Jonathan; Zhang, Zhen; O'Brien, Edward J; Bliven, Spencer E; Chen, Ke; Chang, Roger L; Bourne, Philip E; Palsson, Bernhard O

    2016-03-11

    The success of genome-scale models (GEMs) can be attributed to the high-quality, bottom-up reconstructions of metabolic, protein synthesis, and transcriptional regulatory networks on an organism-specific basis. Such reconstructions are biochemically, genetically, and genomically structured knowledge bases that can be converted into a mathematical format to enable a myriad of computational biological studies. In recent years, genome-scale reconstructions have been extended to include protein structural information, which has opened up new vistas in systems biology research and empowered applications in structural systems biology and systems pharmacology. Here, we present the generation, application, and dissemination of genome-scale models with protein structures (GEM-PRO) for Escherichia coli and Thermotoga maritima. We show the utility of integrating molecular scale analyses with systems biology approaches by discussing several comparative analyses on the temperature dependence of growth, the distribution of protein fold families, substrate specificity, and characteristic features of whole cell proteomes. Finally, to aid in the grand challenge of big data to knowledge, we provide several explicit tutorials of how protein-related information can be linked to genome-scale models in a public GitHub repository ( https://github.com/SBRG/GEMPro/tree/master/GEMPro_recon/). Translating genome-scale, protein-related information to structured data in the format of a GEM provides a direct mapping of gene to gene-product to protein structure to biochemical reaction to network states to phenotypic function. Integration of molecular-level details of individual proteins, such as their physical, chemical, and structural properties, further expands the description of biochemical network-level properties, and can ultimately influence how to model and predict whole cell phenotypes as well as perform comparative systems biology approaches to study differences between organisms. GEM-PRO offers insight into the physical embodiment of an organism's genotype, and its use in this comparative framework enables exploration of adaptive strategies for these organisms, opening the door to many new lines of research. With these provided tools, tutorials, and background, the reader will be in a position to run GEM-PRO for their own purposes.

  8. Domain analyses of Usher syndrome causing Clarin-1 and GPR98 protein models.

    PubMed

    Khan, Sehrish Haider; Javed, Muhammad Rizwan; Qasim, Muhammad; Shahzadi, Samar; Jalil, Asma; Rehman, Shahid Ur

    2014-01-01

    Usher syndrome is an autosomal recessive disorder that causes hearing loss, Retinitis Pigmentosa (RP) and vestibular dysfunction. It is clinically and genetically heterogeneous disorder which is clinically divided into three types i.e. type I, type II and type III. To date, there are about twelve loci and ten identified genes which are associated with Usher syndrome. A mutation in any of these genes e.g. CDH23, CLRN1, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A and DFNB31 can result in Usher syndrome or non-syndromic deafness. These genes provide instructions for making proteins that play important roles in normal hearing, balance and vision. Studies have shown that protein structures of only seven genes have been determined experimentally and there are still three genes whose structures are unavailable. These genes are Clarin-1, GPR98 and Usherin. In the absence of an experimentally determined structure, homology modeling and threading often provide a useful 3D model of a protein. Therefore in the current study Clarin-1 and GPR98 proteins have been analyzed for signal peptide, domains and motifs. Clarin-1 protein was found to be without any signal peptide and consists of prokar lipoprotein domain. Clarin-1 is classified within claudin 2 super family and consists of twelve motifs. Whereas, GPR98 has a 29 amino acids long signal peptide and classified within GPCR family 2 having Concanavalin A-like lectin/glucanase superfamily. It was found to be consists of GPS and G protein receptor F2 domains and twenty nine motifs. Their 3D structures have been predicted using I-TASSER server. The model of Clarin-1 showed only α-helix but no beta sheets while model of GPR98 showed both α-helix and β sheets. The predicted structures were then evaluated and validated by MolProbity and Ramachandran plot. The evaluation of the predicted structures showed 78.9% residues of Clarin-1 and 78.9% residues of GPR98 within favored regions. The findings of present study has resulted in the three dimensional structure prediction and conserved domain analysis which will be quite beneficial in better understanding of molecular components, protein-protein interaction, clinical heterogeneity and pathophysiology of Usher syndrome.

  9. Domain analyses of Usher syndrome causing Clarin-1 and GPR98 protein models

    PubMed Central

    Khan, Sehrish Haider; Javed, Muhammad Rizwan; Qasim, Muhammad; Shahzadi, Samar; Jalil, Asma; Rehman, Shahid ur

    2014-01-01

    Usher syndrome is an autosomal recessive disorder that causes hearing loss, Retinitis Pigmentosa (RP) and vestibular dysfunction. It is clinically and genetically heterogeneous disorder which is clinically divided into three types i.e. type I, type II and type III. To date, there are about twelve loci and ten identified genes which are associated with Usher syndrome. A mutation in any of these genes e.g. CDH23, CLRN1, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A and DFNB31 can result in Usher syndrome or non-syndromic deafness. These genes provide instructions for making proteins that play important roles in normal hearing, balance and vision. Studies have shown that protein structures of only seven genes have been determined experimentally and there are still three genes whose structures are unavailable. These genes are Clarin-1, GPR98 and Usherin. In the absence of an experimentally determined structure, homology modeling and threading often provide a useful 3D model of a protein. Therefore in the current study Clarin-1 and GPR98 proteins have been analyzed for signal peptide, domains and motifs. Clarin-1 protein was found to be without any signal peptide and consists of prokar lipoprotein domain. Clarin-1 is classified within claudin 2 super family and consists of twelve motifs. Whereas, GPR98 has a 29 amino acids long signal peptide and classified within GPCR family 2 having Concanavalin A-like lectin/glucanase superfamily. It was found to be consists of GPS and G protein receptor F2 domains and twenty nine motifs. Their 3D structures have been predicted using I-TASSER server. The model of Clarin-1 showed only α-helix but no beta sheets while model of GPR98 showed both α-helix and β sheets. The predicted structures were then evaluated and validated by MolProbity and Ramachandran plot. The evaluation of the predicted structures showed 78.9% residues of Clarin-1 and 78.9% residues of GPR98 within favored regions. The findings of present study has resulted in the three dimensional structure prediction and conserved domain analysis which will be quite beneficial in better understanding of molecular components, protein-protein interaction, clinical heterogeneity and pathophysiology of Usher syndrome. PMID:25258483

  10. Multiscale Persistent Functions for Biomolecular Structure Characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xia, Kelin; Li, Zhiming; Mu, Lin

    Here in this paper, we introduce multiscale persistent functions for biomolecular structure characterization. The essential idea is to combine our multiscale rigidity functions (MRFs) with persistent homology analysis, so as to construct a series of multiscale persistent functions, particularly multiscale persistent entropies, for structure characterization. To clarify the fundamental idea of our method, the multiscale persistent entropy (MPE) model is discussed in great detail. Mathematically, unlike the previous persistent entropy (Chintakunta et al. in Pattern Recognit 48(2):391–401, 2015; Merelli et al. in Entropy 17(10):6872–6892, 2015; Rucco et al. in: Proceedings of ECCS 2014, Springer, pp 117–128, 2016), a special resolutionmore » parameter is incorporated into our model. Various scales can be achieved by tuning its value. Physically, our MPE can be used in conformational entropy evaluation. More specifically, it is found that our method incorporates in it a natural classification scheme. This is achieved through a density filtration of an MRF built from angular distributions. To further validate our model, a systematical comparison with the traditional entropy evaluation model is done. Additionally, it is found that our model is able to preserve the intrinsic topological features of biomolecular data much better than traditional approaches, particularly for resolutions in the intermediate range. Moreover, by comparing with traditional entropies from various grid sizes, bond angle-based methods and a persistent homology-based support vector machine method (Cang et al. in Mol Based Math Biol 3:140–162, 2015), we find that our MPE method gives the best results in terms of average true positive rate in a classic protein structure classification test. More interestingly, all-alpha and all-beta protein classes can be clearly separated from each other with zero error only in our model. Finally, a special protein structure index (PSI) is proposed, for the first time, to describe the “regularity” of protein structures. Basically, a protein structure is deemed as regular if it has a consistent and orderly configuration. Our PSI model is tested on a database of 110 proteins; we find that structures with larger portions of loops and intrinsically disorder regions are always associated with larger PSI, meaning an irregular configuration, while proteins with larger portions of secondary structures, i.e., alpha-helix or beta-sheet, have smaller PSI. Essentially, PSI can be used to describe the “regularity” information in any systems.« less

  11. Memprot: a program to model the detergent corona around a membrane protein based on SEC–SAXS data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pérez, Javier, E-mail: javier.perez@synchrotron-soleil.fr; Koutsioubas, Alexandros; Synchrotron SOLEIL, L’Orme des Merisiers, BP 48, Saint-Aubin, 91192 Gif-sur-Yvette

    Systematic SAXS simulations have been analysed over a wide range of parameters in order to better understand the detergent corona around a membrane protein. The application of small-angle X-ray scattering (SAXS) to structural investigations of transmembrane proteins in detergent solution has been hampered by two main inherent hurdles. On the one hand, the formation of a detergent corona around the hydrophobic region of the protein strongly modifies the scattering curve of the protein. On the other hand, free micelles of detergent without a precisely known concentration coexist with the protein–detergent complex in solution, therefore adding an uncontrolled signal. To gainmore » robust structural information on such systems from SAXS data, in previous work, advantage was taken of the online combination of size-exclusion chromatography (SEC) and SAXS, and the detergent corona around aquaporin-0, a membrane protein of known structure, could be modelled. A precise geometrical model of the corona, shaped as an elliptical torus, was determined. Here, in order to better understand the correlations between the corona model parameters and to discuss the uniqueness of the model, this work was revisited by analyzing systematic SAXS simulations over a wide range of parameters of the torus.« less

  12. Probing the structure of Leishmania major DHFR TS and structure based virtual screening of peptide library for the identification of anti-leishmanial leads.

    PubMed

    Rajasekaran, Rajalakshmi; Chen, Yi-Ping Phoebe

    2012-09-01

    Leishmaniasis, a multi-faceted ethereal disease is considered to be one of the World's major communicable diseases that demands exhaustive research and control measures. The substantial data on these protozoan parasites has not been utilized completely to develop potential therapeutic strategies against Leishmaniasis. Dihydrofolate reductase thymidylate synthase (DHFR-TS) plays a major role in the infective state of the parasite and hence the DHFR-TS based drugs remains of much interest to researchers working on Leishmaniasis. Although, crystal structures of DHFR-TS from different species including Plasmodium falciparum and Trypanosoma cruzi are available, the experimentally determined structure of the Leishmania major DHFR-TS has not yet been reported in the Protein Data Bank. A high quality three dimensional structure of L.major DHFR-TS has been modeled through the homology modeling approach. Carefully refined and the energy minimized structure of the modeled protein was validated using a number of structure validation programs to confirm its structure quality. The modeled protein structure was used in the process of structure based virtual screening to figure out a potential lead structure against DHFR TS. The lead molecule identified has a binding affinity of 0.51 nM and clearly follows drug like properties.

  13. Visualizing ligand molecules in Twilight electron density.

    PubMed

    Weichenberger, Christian X; Pozharski, Edwin; Rupp, Bernhard

    2013-02-01

    Three-dimensional models of protein structures determined by X-ray crystallography are based on the interpretation of experimentally derived electron-density maps. The real-space correlation coefficient (RSCC) provides an easily comprehensible, objective measure of the residue-based fit of atom coordinates to electron density. Among protein structure models, protein-ligand complexes are of special interest, given their contribution to understanding the molecular underpinnings of biological activity and to drug design. For consumers of such models, it is not trivial to determine the degree to which ligand-structure modelling is biased by subjective electron-density interpretation. A standalone script, Twilight, is presented for the analysis, visualization and annotation of a pre-filtered set of 2815 protein-ligand complexes deposited with the PDB as of 15 January 2012 with ligand RSCC values that are below a threshold of 0.6. It also provides simplified access to the visualization of any protein-ligand complex available from the PDB and annotated by the Uppsala Electron Density Server. The script runs on various platforms and is available for download at http://www.ruppweb.org/twilight/.

  14. Structural and functional features of lysine acetylation of plant and animal tubulins.

    PubMed

    Rayevsky, Alexey V; Sharifi, Mohsen; Samofalova, Dariya A; Karpov, Pavel A; Blume, Yaroslav B

    2017-10-10

    The study of the genome and the proteome of different species and representatives of distinct kingdoms, especially detection of proteome via wide-scaled analyses has various challenges and pitfalls. Attempts to combine all available information together and isolate some common features for determination of the pathway and their mechanism of action generally have a highly complicated nature. However, microtubule (MT) monomers are highly conserved protein structures, and microtubules are structurally conserved from Homo sapiens to Arabidopsis thaliana. The interaction of MT elements with microtubule-associated proteins and post-translational modifiers is fully dependent on protein interfaces, and almost all MT modifications are well described except acetylation. Crystallography and interactome data using different approaches were combined to identify conserved proteins important in acetylation of microtubules. Application of computational methods and comparative analysis of binding modes generated a robust predictive model of acetylation of the ϵ-amino group of Lys40 in α-tubulins. In turn, the model discarded some probable mechanisms of interaction between elements of interest. Reconstruction of unresolved protein structures was carried out with modeling by homology to the existing crystal structure (PDBID: 1Z2B) from B. taurus using Swiss-model server, followed by a molecular dynamics simulation. Docking of the human tubulin fragment with Lys40 into the active site of α-tubulin acetyltransferase, reproduces the binding mode of peptidomimetic from X-ray structure (PDBID: 4PK3). © 2017 International Federation for Cell Biology.

  15. Solving protein structures using short-distance cross-linking constraints as a guide for discrete molecular dynamics simulations

    PubMed Central

    Brodie, Nicholas I.; Popov, Konstantin I.; Petrotchenko, Evgeniy V.; Dokholyan, Nikolay V.; Borchers, Christoph H.

    2017-01-01

    We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein—models for α helix–rich and β sheet–rich proteins, respectively—and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures. PMID:28695211

  16. Structural pierce into molecular mechanism underlying Clostridium perfringens Epsilon toxin function.

    PubMed

    Khalili, Saeed; Jahangiri, Abolfazl; Hashemi, Zahra Sadat; Khalesi, Bahman; Mard-Soltani, Maysam; Amani, Jafar

    2017-03-01

    Epsilon toxin of the Clostridium perfringens garnered a lot of attention due to its potential for toxicity in humans, extreme potency for cytotoxicity in mice and lack of any approved therapeutics prescribed for human. However, the intricacies of the Epsilon toxin action mechanism are yet to be understood. In this regard, various in silico tools have been exploited to model and refine the 3D structure of the toxin and its two receptors. The receptor proteins were embedded into designed lipid membranes within an aqueous and ionized environment. Thereafter, the modeled structures subjected to series of consecutive molecular dynamics runs to achieve the most natural like coordination for each model. Ultimately, protein-protein interaction analyses were performed to understand the probable action mechanism. The obtained results successfully confirmed the accuracy of employed methods to achieve high quality models for the toxin and its receptors within their lipid bilayers. Molecular dynamics analyses lead the structures to a more native like coordination. Moreover, the results of previous empirical studies were confirmed, while new insights for action mechanisms including the detailed roles of Hepatitis A virus cellular receptor 1 (HAVCR1) and Myelin and lymphocyte protein (MAL) proteins were achieved. In light of previous and our observations, we suggested novel models which elucidated the existing interplay between potential players of Epsilon toxin action mechanism with detailed structural evidences. These models would pave the way to have more robust understanding of the Epsilon toxin biology, more precise vaccine construction and more successful drug (inhibitor) design. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. A Novel Domain Assembly Routine for Creating Full-Length Models of Membrane Proteins from Known Domain Structures.

    PubMed

    Koehler Leman, Julia; Bonneau, Richard

    2018-04-03

    Membrane proteins composed of soluble and membrane domains are often studied one domain at a time. However, to understand the biological function of entire protein systems and their interactions with each other and drugs, knowledge of full-length structures or models is required. Although few computational methods exist that could potentially be used to model full-length constructs of membrane proteins, none of these methods are perfectly suited for the problem at hand. Existing methods require an interface or knowledge of the relative orientations of the domains or are not designed for domain assembly, and none of them are developed for membrane proteins. Here we describe the first domain assembly protocol specifically designed for membrane proteins that assembles intra- and extracellular soluble domains and the transmembrane domain into models of the full-length membrane protein. Our protocol does not require an interface between the domains and samples possible domain orientations based on backbone dihedrals in the flexible linker regions, created via fragment insertion, while keeping the transmembrane domain fixed in the membrane. For five examples tested, our method mp_domain_assembly, implemented in RosettaMP, samples domain orientations close to the known structure and is best used in conjunction with experimental data to reduce the conformational search space.

  18. Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.

    PubMed

    Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras

    2014-03-11

    The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

  19. Toward structure prediction of cyclic peptides.

    PubMed

    Yu, Hongtao; Lin, Yu-Shan

    2015-02-14

    Cyclic peptides are a promising class of molecules that can be used to target specific protein-protein interactions. A computational method to accurately predict their structures would substantially advance the development of cyclic peptides as modulators of protein-protein interactions. Here, we develop a computational method that integrates bias-exchange metadynamics simulations, a Boltzmann reweighting scheme, dihedral principal component analysis and a modified density peak-based cluster analysis to provide a converged structural description for cyclic peptides. Using this method, we evaluate the performance of a number of popular protein force fields on a model cyclic peptide. All the tested force fields seem to over-stabilize the α-helix and PPII/β regions in the Ramachandran plot, commonly populated by linear peptides and proteins. Our findings suggest that re-parameterization of a force field that well describes the full Ramachandran plot is necessary to accurately model cyclic peptides.

  20. Energetically Unfavorable Amide Conformations for N6-Acetyllysine Side Chains in Refined Protein Structures

    PubMed Central

    Genshaft, Alexander; Moser, Joe-Ann S.; D'Antonio, Edward L.; Bowman, Christine M.; Christianson, David W.

    2013-01-01

    The reversible acetylation of lysine to form N6-acetyllysine in the regulation of protein function is a hallmark of epigenetics. Acetylation of the positively charged amino group of the lysine side chain generates a neutral N-alkylacetamide moiety that serves as a molecular “switch” for the modulation of protein function and protein-protein interactions. We now report the analysis of 381 N6-acetyllysine side chain amide conformations as found in 79 protein crystal structures and 11 protein NMR structures deposited in the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics. We find that only 74.3% of N6-acetyllysine residues in protein crystal structures and 46.5% in protein NMR structures contain amide groups with energetically preferred trans or generously trans conformations. Surprisingly, 17.6% of N6-acetyllysine residues in protein crystal structures and 5.3% in protein NMR structures contain amide groups with energetically unfavorable cis or generously cis conformations. Even more surprisingly, 8.1% of N6-acetyllysine residues in protein crystal structures and 48.2% in NMR structures contain amide groups with energetically prohibitive twisted conformations that approach the transition state structure for cis-trans isomerization. In contrast, 109 unique N-alkylacetamide groups contained in 84 highly-accurate small molecule crystal structures retrieved from the Cambridge Structural Database exclusively adopt energetically preferred trans conformations. Therefore, we conclude that cis and twisted N6-acetyllysine amides in protein structures deposited in the PDB are erroneously modeled due to their energetically unfavorable or prohibitive conformations. PMID:23401043

  1. Distance restraints from crosslinking mass spectrometry: Mining a molecular dynamics simulation database to evaluate lysine–lysine distances

    PubMed Central

    Merkley, Eric D; Rysavy, Steven; Kahraman, Abdullah; Hafen, Ryan P; Daggett, Valerie; Adkins, Joshua N

    2014-01-01

    Integrative structural biology attempts to model the structures of protein complexes that are challenging or intractable by classical structural methods (due to size, dynamics, or heterogeneity) by combining computational structural modeling with data from experimental methods. One such experimental method is chemical crosslinking mass spectrometry (XL-MS), in which protein complexes are crosslinked and characterized using liquid chromatography-mass spectrometry to pinpoint specific amino acid residues in close structural proximity. The commonly used lysine-reactive N-hydroxysuccinimide ester reagents disuccinimidylsuberate (DSS) and bis(sulfosuccinimidyl)suberate (BS3) have a linker arm that is 11.4 Å long when fully extended, allowing Cα (alpha carbon of protein backbone) atoms of crosslinked lysine residues to be up to ∼24 Å apart. However, XL-MS studies on proteins of known structure frequently report crosslinks that exceed this distance. Typically, a tolerance of ∼3 Å is added to the theoretical maximum to account for this observation, with limited justification for the chosen value. We used the Dynameomics database, a repository of high-quality molecular dynamics simulations of 807 proteins representative of diverse protein folds, to investigate the relationship between lysine–lysine distances in experimental starting structures and in simulation ensembles. We conclude that for DSS/BS3, a distance constraint of 26–30 Å between Cα atoms is appropriate. This analysis provides a theoretical basis for the widespread practice of adding a tolerance to the crosslinker length when comparing XL-MS results to structures or in modeling. We also discuss the comparison of XL-MS results to MD simulations and known structures as a means to test and validate experimental XL-MS methods. PMID:24639379

  2. Using a non-invasive technique in nutrition: synchrotron radiation infrared microspectroscopy spectroscopic characterization of oil seeds treated with different processing conditions on molecular spectral factors influencing nutrient delivery.

    PubMed

    Zhang, Xuewei; Yu, Peiqiang

    2014-07-02

    Non-invasive techniques are a key to study nutrition and structure interaction. Fourier transform infrared microspectroscopy coupled with a synchrotron radiation source (SR-IMS) is a rapid, non-invasive, and non-destructive bioanalytical technique. To understand internal structure changes in relation to nutrient availability in oil seed processing is vital to find optimal processing conditions. The objective of this study was to use a synchrotron-based bioanalytical technique SR-IMS as a non-invasive and non-destructive tool to study the effects of heat-processing methods and oil seed canola type on modeled protein structure based on spectral data within intact tissue that were randomly selected and quantify the relationship between the modeled protein structure and protein nutrient supply to ruminants. The results showed that the moisture heat-related processing significantly changed (p<0.05) modeled protein structures compared to the raw canola (control) and those processing by dry heating. The moisture heating increased (p<0.05) spectral intensities of amide I, amide II, α-helices, and β-sheets but decreased (p<0.05) the ratio of modeled α-helices to β-sheet spectral intensity. There was no difference (p>0.05) in the protein spectral profile between the raw and dry-heated canola tissue and between yellow- and brown-type canola tissue. The results indicated that different heat processing methods have different impacts on the protein inherent structure. The protein intrinsic structure in canola seed tissue was more sensitive and more response to the moisture heating in comparison to the dry heating. These changes are expected to be related to the nutritive value. However, the current study is based on limited samples, and more large-scale studies are needed to confirm our findings.

  3. Topological properties of complex networks in protein structures

    NASA Astrophysics Data System (ADS)

    Kim, Kyungsik; Jung, Jae-Won; Min, Seungsik

    2014-03-01

    We study topological properties of networks in structural classification of proteins. We model the native-state protein structure as a network made of its constituent amino-acids and their interactions. We treat four structural classes of proteins composed predominantly of α helices and β sheets and consider several proteins from each of these classes whose sizes range from amino acids of the Protein Data Bank. Particularly, we simulate and analyze the network metrics such as the mean degree, the probability distribution of degree, the clustering coefficient, the characteristic path length, the local efficiency, and the cost. This work was supported by the KMAR and DP under Grant WISE project (153-3100-3133-302-350).

  4. A Parametric Rosetta Energy Function Analysis with LK Peptides on SAM Surfaces.

    PubMed

    Lubin, Joseph H; Pacella, Michael S; Gray, Jeffrey J

    2018-05-08

    Although structures have been determined for many soluble proteins and an increasing number of membrane proteins, experimental structure determination methods are limited for complexes of proteins and solid surfaces. An economical alternative or complement to experimental structure determination is molecular simulation. Rosetta is one software suite that models protein-surface interactions, but Rosetta is normally benchmarked on soluble proteins. For surface interactions, the validity of the energy function is uncertain because it is a combination of independent parameters from energy functions developed separately for solution proteins and mineral surfaces. Here, we assess the performance of the RosettaSurface algorithm and test the accuracy of its energy function by modeling the adsorption of leucine/lysine (LK)-repeat peptides on methyl- and carboxy-terminated self-assembled monolayers (SAMs). We investigated how RosettaSurface predictions for this system compare with the experimental results, which showed that on both surfaces, LK-α peptides folded into helices and LK-β peptides held extended structures. Utilizing this model system, we performed a parametric analysis of Rosetta's Talaris energy function and determined that adjusting solvation parameters offered improved predictive accuracy. Simultaneously increasing lysine carbon hydrophilicity and the hydrophobicity of the surface methyl head groups yielded computational predictions most closely matching the experimental results. De novo models still should be interpreted skeptically unless bolstered in an integrative approach with experimental data.

  5. Optimization of the GBMV2 implicit solvent force field for accurate simulation of protein conformational equilibria.

    PubMed

    Lee, Kuo Hao; Chen, Jianhan

    2017-06-15

    Accurate treatment of solvent environment is critical for reliable simulations of protein conformational equilibria. Implicit treatment of solvation, such as using the generalized Born (GB) class of models arguably provides an optimal balance between computational efficiency and physical accuracy. Yet, GB models are frequently plagued by a tendency to generate overly compact structures. The physical origins of this drawback are relatively well understood, and the key to a balanced implicit solvent protein force field is careful optimization of physical parameters to achieve a sufficient level of cancellation of errors. The latter has been hampered by the difficulty of generating converged conformational ensembles of non-trivial model proteins using the popular replica exchange sampling technique. Here, we leverage improved sampling efficiency of a newly developed multi-scale enhanced sampling technique to re-optimize the generalized-Born with molecular volume (GBMV2) implicit solvent model with the CHARMM36 protein force field. Recursive optimization of key GBMV2 parameters (such as input radii) and protein torsion profiles (via the CMAP torsion cross terms) has led to a more balanced GBMV2 protein force field that recapitulates the structures and stabilities of both helical and β-hairpin model peptides. Importantly, this force field appears to be free of the over-compaction bias, and can generate structural ensembles of several intrinsically disordered proteins of various lengths that seem highly consistent with available experimental data. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  6. Structurally detailed coarse-grained model for Sec-facilitated co-translational protein translocation and membrane integration

    PubMed Central

    Miller, Thomas F.

    2017-01-01

    We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943

  7. Mechanical Modeling and Computer Simulation of Protein Folding

    ERIC Educational Resources Information Center

    Prigozhin, Maxim B.; Scott, Gregory E.; Denos, Sharlene

    2014-01-01

    In this activity, science education and modern technology are bridged to teach students at the high school and undergraduate levels about protein folding and to strengthen their model building skills. Students are guided from a textbook picture of a protein as a rigid crystal structure to a more realistic view: proteins are highly dynamic…

  8. Lead discovery and in silico 3D structure modeling of tumorigenic FAM72A (p17).

    PubMed

    Pramanik, Subrata; Kutzner, Arne; Heese, Klaus

    2015-01-01

    FAM72A (p17) is a novel neuronal protein that has been linked to tumorigenic effects in non-neuronal tissue. Using state of the art in silico physicochemical analyses (e.g., I-TASSER, RaptorX, and Modeller), we determined the three-dimensional (3D) protein structure of FAM72A and further identified potential ligand-protein interactions. Our data indicate a Zn(2+)/Fe(3+)-containing 3D protein structure, based on a 3GA3_A model template, which potentially interacts with the organic molecule RSM ((2s)-2-(acetylamino)-N-methyl-4-[(R)-methylsulfinyl] butanamide). The discovery of RSM may serve as potential lead for further anti-FAM72A drug screening tests in the pharmaceutical industry because interference with FAM72A's activities via RSM-related molecules might be a novel option to influence the tumor suppressor protein p53 signaling pathways for the treatment of various types of cancers.

  9. Evolution of sparsity and modularity in a model of protein allostery

    NASA Astrophysics Data System (ADS)

    Hemery, Mathieu; Rivoire, Olivier

    2015-04-01

    The sequence of a protein is not only constrained by its physical and biochemical properties under current selection, but also by features of its past evolutionary history. Understanding the extent and the form that these evolutionary constraints may take is important to interpret the information in protein sequences. To study this problem, we introduce a simple but physical model of protein evolution where selection targets allostery, the functional coupling of distal sites on protein surfaces. This model shows how the geometrical organization of couplings between amino acids within a protein structure can depend crucially on its evolutionary history. In particular, two scenarios are found to generate a spatial concentration of functional constraints: high mutation rates and fluctuating selective pressures. This second scenario offers a plausible explanation for the high tolerance of natural proteins to mutations and for the spatial organization of their least tolerant amino acids, as revealed by sequence analysis and mutagenesis experiments. It also implies a faculty to adapt to new selective pressures that is consistent with observations. The model illustrates how several independent functional modules may emerge within the same protein structure, depending on the nature of past environmental fluctuations. Our model thus relates the evolutionary history of proteins to the geometry of their functional constraints, with implications for decoding and engineering protein sequences.

  10. Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wall, Michael E.

    Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structuremore » to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.« less

  11. Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering

    DOE PAGES

    Wall, Michael E.

    2018-01-25

    Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structuremore » to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.« less

  12. Evaluation of variability in high-resolution protein structures by global distance scoring.

    PubMed

    Anzai, Risa; Asami, Yoshiki; Inoue, Waka; Ueno, Hina; Yamada, Koya; Okada, Tetsuji

    2018-01-01

    Systematic analysis of the statistical and dynamical properties of proteins is critical to understanding cellular events. Extraction of biologically relevant information from a set of high-resolution structures is important because it can provide mechanistic details behind the functional properties of protein families, enabling rational comparison between families. Most of the current structural comparisons are pairwise-based, which hampers the global analysis of increasing contents in the Protein Data Bank. Additionally, pairing of protein structures introduces uncertainty with respect to reproducibility because it frequently accompanies other settings for superimposition. This study introduces intramolecular distance scoring for the global analysis of proteins, for each of which at least several high-resolution structures are available. As a pilot study, we have tested 300 human proteins and showed that the method is comprehensively used to overview advances in each protein and protein family at the atomic level. This method, together with the interpretation of the model calculations, provide new criteria for understanding specific structural variation in a protein, enabling global comparison of the variability in proteins from different species.

  13. Computational modeling of membrane proteins

    PubMed Central

    Leman, Julia Koehler; Ulmschneider, Martin B.; Gray, Jeffrey J.

    2014-01-01

    The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the protein databank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, only highlighting how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs as well as β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase of MP structures has (1) facilitated comparative modeling due to availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefitted from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade. PMID:25355688

  14. Where to attach dye molecules to a protein: lessons from the computer program WHAT IF

    NASA Astrophysics Data System (ADS)

    Altenberg-Greulich, B.; Vriend, G.

    2001-10-01

    Genomic and proteomic projects are producing a flood of data that all require interpretation which often is best performed based on a three dimensional structure of the molecule(s) involved. These structures can be determined experimentally, or modelled by homology. Because of the complexity of the questions and the heterogeneity of the data, the software used for modelling proteins must become even more versatile. We describe several case studies in which the questions asked, the data, and the requirements on the software all are very different. It is shown how structural knowledge about a protein helps to determine the best place to bind a fluorescent dye. Such dyes are needed to determine protein-protein, protein-DNA interactions or intrinsic fluorescence microscopy. Further, using dyes you can trace molecules in the cell and thus get a handle on subcellular localisation. The first example (OCT-1) involves the search for free amino groups in a protein-DNA complex. The second example (BPTI) is a case, in which the amino acid distribution shows that amino groups are spread all over the structure, so that the natural structure has to be modified to get an answer. The third example (HFE) involves a model built by homology. In this case the amino group distribution can also be predicted. All these studies were performed using the WHAT IF software package. This package is available including source code, documentation, etc. See http://www.cmbi.kun.nl/whatif/

  15. Impact of computational structure-based methods on drug discovery.

    PubMed

    Reynolds, Charles H

    2014-01-01

    Structure-based drug design has become an indispensible tool in drug discovery. The emergence of structure-based design is due to gains in structural biology that have provided exponential growth in the number of protein crystal structures, new computational algorithms and approaches for modeling protein-ligand interactions, and the tremendous growth of raw computer power in the last 30 years. Computer modeling and simulation have made major contributions to the discovery of many groundbreaking drugs in recent years. Examples are presented that highlight the evolution of computational structure-based design methodology, and the impact of that methodology on drug discovery.

  16. MD simulations of papillomavirus DNA-E2 protein complexes hints at a protein structural code for DNA deformation.

    PubMed

    Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A

    2008-08-01

    The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only addressable to the DNA molecule flexibility but it is finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.

  17. Computational design of water-soluble α-helical barrels.

    PubMed

    Thomson, Andrew R; Wood, Christopher W; Burton, Antony J; Bartlett, Gail J; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N

    2014-10-24

    The design of protein sequences that fold into prescribed de novo structures is challenging. General solutions to this problem require geometric descriptions of protein folds and methods to fit sequences to these. The α-helical coiled coils present a promising class of protein for this and offer considerable scope for exploring hitherto unseen structures. For α-helical barrels, which have more than four helices and accessible central channels, many of the possible structures remain unobserved. Here, we combine geometrical considerations, knowledge-based scoring, and atomistic modeling to facilitate the design of new channel-containing α-helical barrels. X-ray crystal structures of the resulting designs match predicted in silico models. Furthermore, the observed channels are chemically defined and have diameters related to oligomer state, which present routes to design protein function. Copyright © 2014, American Association for the Advancement of Science.

  18. Sequence and structure insights of kazal type thrombin inhibitor protein: Studied with phylogeny, homology modeling and dynamic MM/GBSA studies.

    PubMed

    Jadhav, Aparna; Dash, RadhaCharan; Hirwani, Raj; Abdin, Malik

    2018-03-01

    Despite the wide medical importance of serine protease inhibitors, many of kazal type proteins are still to be explored. These thrombin inhibiting proteins are found in the digestive system of hematophagous organisms mainly Arthropods. We studied one of such protein i.e. Kazal type-1 protein from sand-fly Phlebotomus papatasi as its structure and interaction with thrombin is unclear. Initially, Dipetalin a kazal-follistasin domain protein was run through PSI-BLAST to retrieve related sequences. Using this set of sequence a phylogenetic tree was constructed, which identified a distantly related kazal type-1 protein. A three-dimensional structure was predicted for this protein and was aligned with Rhodniin for further evaluation. To have a comparative understanding of it's binding at the thrombin active site, the aligned kazal model-thrombin and rhodniin-thrombin complexes were subjected to molecular dynamics simulations. Dynamics analysis with reference to main chain RMSD, H-chain residue RMSF and total energy showed rhodniin-thrombin complex as a more stable system. Further, the MM/GBSA method was applied that calculated the binding free energy (ΔG binding ) for rhodniin and kazal model as -220.32kcal/Mol and -90.70kcal/Mol, respectively. Thus, it shows that kazal model has weaker bonding with thrombin, unlike rhodniin. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. PDBsum new things.

    PubMed

    Laskowski, Roman A

    2009-01-01

    PDBsum (http://www.ebi.ac.uk/pdbsum) provides summary information about each experimentally determined structural model in the Protein Data Bank (PDB). Here we describe some of its most recent features, including figures from the structure's key reference, citation data, Pfam domain diagrams, topology diagrams and protein-protein interactions. Furthermore, it now accepts users' own PDB format files and generates a private set of analyses for each uploaded structure.

  20. Principles of assembly reveal a periodic table of protein complexes.

    PubMed

    Ahnert, Sebastian E; Marsh, Joseph A; Hernández, Helena; Robinson, Carol V; Teichmann, Sarah A

    2015-12-11

    Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering. Copyright © 2015, American Association for the Advancement of Science.

  1. Predicting the Accuracy of Protein–Ligand Docking on Homology Models

    PubMed Central

    BORDOGNA, ANNALISA; PANDINI, ALESSANDRO; BONATI, LAURA

    2011-01-01

    Ligand–protein docking is increasingly used in Drug Discovery. The initial limitations imposed by a reduced availability of target protein structures have been overcome by the use of theoretical models, especially those derived by homology modeling techniques. While this greatly extended the use of docking simulations, it also introduced the need for general and robust criteria to estimate the reliability of docking results given the model quality. To this end, a large-scale experiment was performed on a diverse set including experimental structures and homology models for a group of representative ligand–protein complexes. A wide spectrum of model quality was sampled using templates at different evolutionary distances and different strategies for target–template alignment and modeling. The obtained models were scored by a selection of the most used model quality indices. The binding geometries were generated using AutoDock, one of the most common docking programs. An important result of this study is that indeed quantitative and robust correlations exist between the accuracy of docking results and the model quality, especially in the binding site. Moreover, state-of-the-art indices for model quality assessment are already an effective tool for an a priori prediction of the accuracy of docking experiments in the context of groups of proteins with conserved structural characteristics. PMID:20607693

  2. Large-scale modelling of the divergent spectrin repeats in nesprins: giant modular proteins.

    PubMed

    Autore, Flavia; Pfuhl, Mark; Quan, Xueping; Williams, Aisling; Roberts, Roland G; Shanahan, Catherine M; Fraternali, Franca

    2013-01-01

    Nesprin-1 and nesprin-2 are nuclear envelope (NE) proteins characterized by a common structure of an SR (spectrin repeat) rod domain and a C-terminal transmembrane KASH [Klarsicht-ANC-Syne-homology] domain and display N-terminal actin-binding CH (calponin homology) domains. Mutations in these proteins have been described in Emery-Dreifuss muscular dystrophy and attributed to disruptions of interactions at the NE with nesprins binding partners, lamin A/C and emerin. Evolutionary analysis of the rod domains of the nesprins has shown that they are almost entirely composed of unbroken SR-like structures. We present a bioinformatical approach to accurate definition of the boundaries of each SR by comparison with canonical SR structures, allowing for a large-scale homology modelling of the 74 nesprin-1 and 56 nesprin-2 SRs. The exposed and evolutionary conserved residues identify important pbs for protein-protein interactions that can guide tailored binding experiments. Most importantly, the bioinformatics analyses and the 3D models have been central to the design of selected constructs for protein expression. 1D NMR and CD spectra have been performed of the expressed SRs, showing a folded, stable, high content α-helical structure, typical of SRs. Molecular Dynamics simulations have been performed to study the structural and elastic properties of consecutive SRs, revealing insights in the mechanical properties adopted by these modules in the cell.

  3. Synergistic interactions of lipids and myelin basic protein

    NASA Astrophysics Data System (ADS)

    Hu, Yufang; Doudevski, Ivo; Wood, Denise; Moscarello, Mario; Husted, Cynthia; Genain, Claude; Zasadzinski, Joseph A.; Israelachvili, Jacob

    2004-09-01

    This report describes force measurements and atomic force microscope imaging of lipid-protein interactions that determine the structure of a model membrane system that closely mimics the myelin sheath. Our results suggest that noncovalent, mainly electrostatic and hydrophobic, interactions are responsible for the multilamellar structure and stability of myelin. We find that myelin basic protein acts as a lipid coupler between two apposed bilayers and as a lipid "hole-filler," effectively preventing defect holes from developing. From our protein-mediated-adhesion and force-distance measurements, we develop a simple quantitative model that gives a reasonably accurate picture of the molecular mechanism and adhesion of bilayer-bridging proteins by means of noncovalent interactions. The results and model indicate that optimum myelin adhesion and stability depend on the difference between, rather than the product of, the opposite charges on the lipid bilayers and myelin basic protein, as well as on the repulsive forces associated with membrane fluidity, and that small changes in any of these parameters away from the synergistically optimum values can lead to large changes in the adhesion or even its total elimination. Our results also show that the often-asked question of which membrane species, the lipids or the proteins, are the "important ones" may be misplaced. Both components work synergistically to provide the adhesion and overall structure. A better appreciation of the mechanism of this synergy may allow for a better understanding of stacked and especially myelin membrane structures and may lead to better treatments for demyelinating diseases such as multiple sclerosis. lipid-protein interactions | myelin membrane structure | membrane adhesion | membrane regeneration/healing | demyelinating diseases

  4. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  5. Packing of sidechains in low-resolution models for proteins.

    PubMed

    Keskin, O; Bahar, I

    1998-01-01

    Atomic level rotamer libraries for sidechains in proteins have been proposed by several groups. Conformations of side groups in coarse-grained models, on the other hand, have not yet been analyzed, although low resolution approaches are the only efficient way to explore global structural features. A residue-specific backbone-dependent library for sidechain isomers, compatible with a coarse-grained model, is proposed. The isomeric states are utilized in packing sidechains of known backbone structures. Sidechain positions are predicted with a root-mean-square deviation (r.m.s.d.) of 2.40 A with respect to crystal structure for 50 test proteins. The rmsd for core residues is 1.60 A and decreases to 1.35 A when conformational correlations and directional effects in inter-residue couplings are considered. An automated method for assigning sidechain positions in coarse-grained model proteins is proposed and made available on the internet; the method accounts satisfactorily for sidechain packing, particularly in the core.

  6. Mechanochemical models of processive molecular motors

    NASA Astrophysics Data System (ADS)

    Lan, Ganhui; Sun, Sean X.

    2012-05-01

    Motor proteins are the molecular engines powering the living cell. These nanometre-sized molecules convert chemical energy, both enthalpic and entropic, into useful mechanical work. High resolution single molecule experiments can now observe motor protein movement with increasing precision. The emerging data must be combined with structural and kinetic measurements to develop a quantitative mechanism. This article describes a modelling framework where quantitative understanding of motor behaviour can be developed based on the protein structure. The framework is applied to myosin motors, with emphasis on how synchrony between motor domains give rise to processive unidirectional movement. The modelling approach shows that the elasticity of protein domains are important in regulating motor function. Simple models of protein domain elasticity are presented. The framework can be generalized to other motor systems, or an ensemble of motors such as muscle contraction. Indeed, for hundreds of myosins, our framework can be reduced to the Huxely-Simmons description of muscle movement in the mean-field limit.

  7. Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.

    PubMed

    Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G

    2016-01-01

    While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged, however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes. © 2015 The Protein Society.

  8. GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions.

    PubMed

    Ko, Junsu; Park, Hahnbeom; Seok, Chaok

    2012-08-10

    Protein structures can be reliably predicted by template-based modeling (TBM) when experimental structures of homologous proteins are available. However, it is challenging to obtain structures more accurate than the single best templates by either combining information from multiple templates or by modeling regions that vary among templates or are not covered by any templates. We introduce GalaxyTBM, a new TBM method in which the more reliable core region is modeled first from multiple templates and less reliable, variable local regions, such as loops or termini, are then detected and re-modeled by an ab initio method. This TBM method is based on "Seok-server," which was tested in CASP9 and assessed to be amongst the top TBM servers. The accuracy of the initial core modeling is enhanced by focusing on more conserved regions in the multiple-template selection and multiple sequence alignment stages. Additional improvement is achieved by ab initio modeling of up to 3 unreliable local regions in the fixed framework of the core structure. Overall, GalaxyTBM reproduced the performance of Seok-server, with GalaxyTBM and Seok-server resulting in average GDT-TS of 68.1 and 68.4, respectively, when tested on 68 single-domain CASP9 TBM targets. For application to multi-domain proteins, GalaxyTBM must be combined with domain-splitting methods. Application of GalaxyTBM to CASP9 targets demonstrates that accurate protein structure prediction is possible by use of a multiple-template-based approach, and ab initio modeling of variable regions can further enhance the model quality.

  9. Hydrophobic potential of mean force as a solvation function for protein structure prediction.

    PubMed

    Lin, Matthew S; Fawzi, Nicolas Lux; Head-Gordon, Teresa

    2007-06-01

    We have developed a solvation function that combines a Generalized Born model for polarization of protein charge by the high dielectric solvent, with a hydrophobic potential of mean force (HPMF) as a model for hydrophobic interaction, to aid in the discrimination of native structures from other misfolded states in protein structure prediction. We find that our energy function outperforms other reported scoring functions in terms of correct native ranking for 91% of proteins and low Z scores for a variety of decoy sets, including the challenging Rosetta decoys. This work shows that the stabilizing effect of hydrophobic exposure to aqueous solvent that defines the HPMF hydration physics is an apparent improvement over solvent-accessible surface area models that penalize hydrophobic exposure. Decoys generated by thermal sampling around the native-state basin reveal a potentially important role for side-chain entropy in the future development of even more accurate free energy surfaces.

  10. SIRAH: a structurally unbiased coarse-grained force field for proteins with aqueous solvation and long-range electrostatics.

    PubMed

    Darré, Leonardo; Machado, Matías Rodrigo; Brandner, Astrid Febe; González, Humberto Carlos; Ferreira, Sebastián; Pantano, Sergio

    2015-02-10

    Modeling of macromolecular structures and interactions represents an important challenge for computational biology, involving different time and length scales. However, this task can be facilitated through the use of coarse-grained (CG) models, which reduce the number of degrees of freedom and allow efficient exploration of complex conformational spaces. This article presents a new CG protein model named SIRAH, developed to work with explicit solvent and to capture sequence, temperature, and ionic strength effects in a topologically unbiased manner. SIRAH is implemented in GROMACS, and interactions are calculated using a standard pairwise Hamiltonian for classical molecular dynamics simulations. We present a set of simulations that test the capability of SIRAH to produce a qualitatively correct solvation on different amino acids, hydrophilic/hydrophobic interactions, and long-range electrostatic recognition leading to spontaneous association of unstructured peptides and stable structures of single polypeptides and protein-protein complexes.

  11. Conservation of protein structure over four billion years.

    PubMed

    Ingles-Prieto, Alvaro; Ibarra-Molero, Beatriz; Delgado-Delgado, Asuncion; Perez-Jimenez, Raul; Fernandez, Julio M; Gaucher, Eric A; Sanchez-Ruiz, Jose M; Gavira, Jose A

    2013-09-03

    Little is known about the evolution of protein structures and the degree of protein structure conservation over planetary time scales. Here, we report the X-ray crystal structures of seven laboratory resurrections of Precambrian thioredoxins dating up to approximately four billion years ago. Despite considerable sequence differences compared with extant enzymes, the ancestral proteins display the canonical thioredoxin fold, whereas only small structural changes have occurred over four billion years. This remarkable degree of structure conservation since a time near the last common ancestor of life supports a punctuated-equilibrium model of structure evolution in which the generation of new folds occurs over comparatively short periods and is followed by long periods of structural stasis. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Efficient molecular mechanics simulations of the folding, orientation, and assembly of peptides in lipid bilayers using an implicit atomic solvation model

    NASA Astrophysics Data System (ADS)

    Bordner, Andrew J.; Zorman, Barry; Abagyan, Ruben

    2011-10-01

    Membrane proteins comprise a significant fraction of the proteomes of sequenced organisms and are the targets of approximately half of marketed drugs. However, in spite of their prevalence and biomedical importance, relatively few experimental structures are available due to technical challenges. Computational simulations can potentially address this deficit by providing structural models of membrane proteins. Solvation within the spatially heterogeneous membrane/solvent environment provides a major component of the energetics driving protein folding and association within the membrane. We have developed an implicit solvation model for membranes that is both computationally efficient and accurate enough to enable molecular mechanics predictions for the folding and association of peptides within the membrane. We derived the new atomic solvation model parameters using an unbiased fitting procedure to experimental data and have applied it to diverse problems in order to test its accuracy and to gain insight into membrane protein folding. First, we predicted the positions and orientations of peptides and complexes within the lipid bilayer and compared the simulation results with solid-state NMR structures. Additionally, we performed folding simulations for a series of host-guest peptides with varying propensities to form alpha helices in a hydrophobic environment and compared the structures with experimental measurements. We were also able to successfully predict the structures of amphipathic peptides as well as the structures for dimeric complexes of short hexapeptides that have experimentally characterized propensities to form beta sheets within the membrane. Finally, we compared calculated relative transfer energies with data from experiments measuring the effects of mutations on the free energies of translocon-mediated insertion of proteins into lipid bilayers and of combined folding and membrane insertion of a beta barrel protein.

  13. Amyloidogenesis of Natively Unfolded Proteins

    PubMed Central

    Uversky, Vladimir N.

    2009-01-01

    Aggregation and subsequent development of protein deposition diseases originate from conformational changes in corresponding amyloidogenic proteins. The accumulated data support the model where protein fibrillogenesis proceeds via the formation of a relatively unfolded amyloidogenic conformation, which shares many structural properties with the pre-molten globule state, a partially folded intermediate first found during the equilibrium and kinetic (un)folding studies of several globular proteins and later described as one of the structural forms of natively unfolded proteins. The flexibility of this structural form is essential for the conformational rearrangements driving the formation of the core cross-beta structure of the amyloid fibril. Obviously, molecular mechanisms describing amyloidogenesis of ordered and natively unfolded proteins are different. For ordered protein to fibrillate, its unique and rigid structure has to be destabilized and partially unfolded. On the other hand, fibrillogenesis of a natively unfolded protein involves the formation of partially folded conformation; i.e., partial folding rather than unfolding. In this review recent findings are surveyed to illustrate some unique features of the natively unfolded proteins amyloidogenesis. PMID:18537543

  14. Combined Monte Carlo/torsion-angle molecular dynamics for ensemble modeling of proteins, nucleic acids and carbohydrates.

    PubMed

    Zhang, Weihong; Howell, Steven C; Wright, David W; Heindel, Andrew; Qiu, Xiangyun; Chen, Jianhan; Curtis, Joseph E

    2017-05-01

    We describe a general method to use Monte Carlo simulation followed by torsion-angle molecular dynamics simulations to create ensembles of structures to model a wide variety of soft-matter biological systems. Our particular emphasis is focused on modeling low-resolution small-angle scattering and reflectivity structural data. We provide examples of this method applied to HIV-1 Gag protein and derived fragment proteins, TraI protein, linear B-DNA, a nucleosome core particle, and a glycosylated monoclonal antibody. This procedure will enable a large community of researchers to model low-resolution experimental data with greater accuracy by using robust physics based simulation and sampling methods which are a significant improvement over traditional methods used to interpret such data. Published by Elsevier Inc.

  15. Twilight reloaded: the peptide experience

    PubMed Central

    Weichenberger, Christian X.; Pozharski, Edwin; Rupp, Bernhard

    2017-01-01

    The de facto commoditization of biomolecular crystallography as a result of almost disruptive instrumentation automation and continuing improvement of software allows any sensibly trained structural biologist to conduct crystallo­graphic studies of biomolecules with reasonably valid outcomes: that is, models based on properly interpreted electron density. Robust validation has led to major mistakes in the protein part of structure models becoming rare, but some depositions of protein–peptide complex structure models, which generally carry significant interest to the scientific community, still contain erroneous models of the bound peptide ligand. Here, the protein small-molecule ligand validation tool Twilight is updated to include peptide ligands. (i) The primary technical reasons and potential human factors leading to problems in ligand structure models are presented; (ii) a new method used to score peptide-ligand models is presented; (iii) a few instructive and specific examples, including an electron-density-based analysis of peptide-ligand structures that do not contain any ligands, are discussed in detail; (iv) means to avoid such mistakes and the implications for database integrity are discussed and (v) some suggestions as to how journal editors could help to expunge errors from the Protein Data Bank are provided. PMID:28291756

  16. Quantification of the transferability of a designed protein specificity switch reveals extensive epistasis in molecular recognition

    DOE PAGES

    Melero, Cristina; Ollikainen, Noah; Harwood, Ian; ...

    2014-10-13

    Re-engineering protein–protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of “second-site suppressors,” where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein–protein interfaces. To extend this approach, it would be advantageous to be able to “transplant” existing engineered and experimentally validated specificity changes to other homologous protein–protein complexes. Here, we test thismore » strategy by designing a pair of mutations that modulates peptide recognition specificity in the Syntrophin PDZ domain, confirming the designed interaction biochemically and structurally, and then transplanting the mutations into the context of five related PDZ domain–peptide complexes. We find a wide range of energetic effects of identical mutations in structurally similar positions, revealing a dramatic context dependence (epistasis) of designed mutations in homologous protein–protein interactions. To better understand the structural basis of this context dependence, we apply a structure-based computational model that recapitulates these energetic effects and we use this model to make and validate forward predictions. The context dependence of these mutations is captured by computational predictions, our results both highlight the considerable difficulties in designing protein–protein interactions and provide challenging benchmark cases for the development of improved protein modeling and design methods that accurately account for the context.« less

  17. Quantification of the transferability of a designed protein specificity switch reveals extensive epistasis in molecular recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melero, Cristina; Ollikainen, Noah; Harwood, Ian

    Re-engineering protein–protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of “second-site suppressors,” where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein–protein interfaces. To extend this approach, it would be advantageous to be able to “transplant” existing engineered and experimentally validated specificity changes to other homologous protein–protein complexes. Here, we test thismore » strategy by designing a pair of mutations that modulates peptide recognition specificity in the Syntrophin PDZ domain, confirming the designed interaction biochemically and structurally, and then transplanting the mutations into the context of five related PDZ domain–peptide complexes. We find a wide range of energetic effects of identical mutations in structurally similar positions, revealing a dramatic context dependence (epistasis) of designed mutations in homologous protein–protein interactions. To better understand the structural basis of this context dependence, we apply a structure-based computational model that recapitulates these energetic effects and we use this model to make and validate forward predictions. The context dependence of these mutations is captured by computational predictions, our results both highlight the considerable difficulties in designing protein–protein interactions and provide challenging benchmark cases for the development of improved protein modeling and design methods that accurately account for the context.« less

  18. Adaptation of model proteins from cold to hot environments involves continuous and small adjustments of average parameters related to amino acid composition.

    PubMed

    De Vendittis, Emmanuele; Castellano, Immacolata; Cotugno, Roberta; Ruocco, Maria Rosaria; Raimo, Gennaro; Masullo, Mariorosario

    2008-01-07

    The growth temperature adaptation of six model proteins has been studied in 42 microorganisms belonging to eubacterial and archaeal kingdoms, covering optimum growth temperatures from 7 to 103 degrees C. The selected proteins include three elongation factors involved in translation, the enzymes glyceraldehyde-3-phosphate dehydrogenase and superoxide dismutase, the cell division protein FtsZ. The common strategy of protein adaptation from cold to hot environments implies the occurrence of small changes in the amino acid composition, without altering the overall structure of the macromolecule. These continuous adjustments were investigated through parameters related to the amino acid composition of each protein. The average value per residue of mass, volume and accessible surface area allowed an evaluation of the usage of bulky residues, whereas the average hydrophobicity reflected that of hydrophobic residues. The specific proportion of bulky and hydrophobic residues in each protein almost linearly increased with the temperature of the host microorganism. This finding agrees with the structural and functional properties exhibited by proteins in differently adapted sources, thus explaining the great compactness or the high flexibility exhibited by (hyper)thermophilic or psychrophilic proteins, respectively. Indeed, heat-adapted proteins incline toward the usage of heavier-size and more hydrophobic residues with respect to mesophiles, whereas the cold-adapted macromolecules show the opposite behavior with a certain preference for smaller-size and less hydrophobic residues. An investigation on the different increase of bulky residues along with the growth temperature observed in the six model proteins suggests the relevance of the possible different role and/or structure organization played by protein domains. The significance of the linear correlations between growth temperature and parameters related to the amino acid composition improved when the analysis was collectively carried out on all model proteins.

  19. Generation of 3D templates of active sites of proteins with rigid prosthetic groups.

    PubMed

    Nebel, Jean-Christophe

    2006-05-15

    With the increasing availability of protein structures, the generation of biologically meaningful 3D patterns from the simultaneous alignment of several protein structures is an exciting prospect: active sites could be better understood, protein functions and protein 3D structures could be predicted more accurately. Although patterns can already be generated at the fold and topological levels, no system produces high-resolution 3D patterns including atom and cavity positions. To address this challenge, our research focuses on generating patterns from proteins with rigid prosthetic groups. Since these groups are key elements of protein active sites, the generated 3D patterns are expected to be biologically meaningful. In this paper, we present a new approach which allows the generation of 3D patterns from proteins with rigid prosthetic groups. Using 237 protein chains representing proteins containing porphyrin rings, our method was validated by comparing 3D templates generated from homologues with the 3D structure of the proteins they model. Atom positions were predicted reliably: 93% of them had an accuracy of 1.00 A or less. Moreover, similar results were obtained regarding chemical group and cavity positions. Results also suggested our system could contribute to the validation of 3D protein models. Finally, a 3D template was generated for the active site of human cytochrome P450 CYP17, the 3D structure of which is unknown. Its analysis showed that it is biologically meaningful: our method detected the main patterns of the cytochrome P450 superfamily and the motifs linked to catalytic reactions. The 3D template also suggested the position of a residue, which could be involved in a hydrogen bond with CYP17 substrates and the shape and location of a cavity. Comparisons with independently generated 3D models comforted these hypotheses. Alignment software (Nestor3D) is available at http://www.kingston.ac.uk/~ku33185/Nestor3D.html

  20. A Coarse-Grained Protein Model in a Water-like Solvent

    NASA Astrophysics Data System (ADS)

    Sharma, Sumit; Kumar, Sanat K.; Buldyrev, Sergey V.; Debenedetti, Pablo G.; Rossky, Peter J.; Stanley, H. Eugene

    2013-05-01

    Simulations employing an explicit atom description of proteins in solvent can be computationally expensive. On the other hand, coarse-grained protein models in implicit solvent miss essential features of the hydrophobic effect, especially its temperature dependence, and have limited ability to capture the kinetics of protein folding. We propose a free space two-letter protein (``H-P'') model in a simple, but qualitatively accurate description for water, the Jagla model, which coarse-grains water into an isotropically interacting sphere. Using Monte Carlo simulations, we design protein-like sequences that can undergo a collapse, exposing the ``Jagla-philic'' monomers to the solvent, while maintaining a ``hydrophobic'' core. This protein-like model manifests heat and cold denaturation in a manner that is reminiscent of proteins. While this protein-like model lacks the details that would introduce secondary structure formation, we believe that these ideas represent a first step in developing a useful, but computationally expedient, means of modeling proteins.

  1. GPCR-ModSim: A comprehensive web based solution for modeling G-protein coupled receptors

    PubMed Central

    Esguerra, Mauricio; Siretskiy, Alexey; Bello, Xabier; Sallander, Jessica; Gutiérrez-de-Terán, Hugo

    2016-01-01

    GPCR-ModSim (http://open.gpcr-modsim.org) is a centralized and easy to use service dedicated to the structural modeling of G-protein Coupled Receptors (GPCRs). 3D molecular models can be generated from amino acid sequence by homology-modeling techniques, considering different receptor conformations. GPCR-ModSim includes a membrane insertion and molecular dynamics (MD) equilibration protocol, which can be used to refine the generated model or any GPCR structure uploaded to the server, including if desired non-protein elements such as orthosteric or allosteric ligands, structural waters or ions. We herein revise the main characteristics of GPCR-ModSim and present new functionalities. The templates used for homology modeling have been updated considering the latest structural data, with separate profile structural alignments built for inactive, partially-active and active groups of templates. We have also added the possibility to perform multiple-template homology modeling in a unique and flexible way. Finally, our new MD protocol considers a series of distance restraints derived from a recently identified conserved network of helical contacts, allowing for a smoother refinement of the generated models which is particularly advised when there is low homology to the available templates. GPCR- ModSim has been tested on the GPCR Dock 2013 competition with satisfactory results. PMID:27166369

  2. MSX-3D: a tool to validate 3D protein models using mass spectrometry.

    PubMed

    Heymann, Michaël; Paramelle, David; Subra, Gilles; Forest, Eric; Martinez, Jean; Geourjon, Christophe; Deléage, Gilbert

    2008-12-01

    The technique of chemical cross-linking followed by mass spectrometry has proven to bring valuable information about the protein structure and interactions between proteic subunits. It is an effective and efficient way to experimentally investigate some aspects of a protein structure when NMR and X-ray crystallography data are lacking. We introduce MSX-3D, a tool specifically geared to validate protein models using mass spectrometry. In addition to classical peptides identifications, it allows an interactive 3D visualization of the distance constraints derived from a cross-linking experiment. Freely available at http://proteomics-pbil.ibcp.fr

  3. The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods.

    PubMed

    Gabanyi, Margaret J; Adams, Paul D; Arnold, Konstantin; Bordoli, Lorenza; Carter, Lester G; Flippen-Andersen, Judith; Gifford, Lida; Haas, Juergen; Kouranov, Andrei; McLaughlin, William A; Micallef, David I; Minor, Wladek; Shah, Raship; Schwede, Torsten; Tao, Yi-Ping; Westbrook, John D; Zimmerman, Matthew; Berman, Helen M

    2011-07-01

    The Protein Structure Initiative's Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org ) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI's high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.

  4. Computer Modeling of the Structure and Spectra of Fluorescent Proteins

    PubMed Central

    Grigorenko, B.L.; Savitsky, A.P.

    2009-01-01

    Fluorescent proteins from the family of green fluorescent proteins are intensively used as biomarkers in living systems. The chromophore group based on the hydroxybenzylidene-imidazoline molecule, which is formed in nature from three amino-acid residues inside the protein globule and well shielded from external media, is responsible for light absorption and fluorescence. Along with the intense experimental studies of the properties of fluorescent proteins and their chromophores by biochemical, X-ray, and spectroscopic tools, in recent years, computer modeling has been used to characterize their properties and spectra. We present in this review the most interesting results of the molecular modeling of the structural parameters and optical and vibrational spectra of the chromophorecontaining domains of fluorescent proteins by methods of quantum chemistry, molecular dynamics, and combined quantum-mechanical-molecular-mechanical approaches. The main emphasis is on the correlation of theoretical and experimental data and on the predictive power of modeling, which may be useful for creating new, efficient biomarkers. PMID:22649601

  5. Structural insights into SAM domain-mediated tankyrase oligomerization.

    PubMed

    DaRosa, Paul A; Ovchinnikov, Sergey; Xu, Wenqing; Klevit, Rachel E

    2016-09-01

    Tankyrase 1 (TNKS1; a.k.a. ARTD5) and tankyrase 2 (TNKS2; a.k.a ARTD6) are highly homologous poly(ADP-ribose) polymerases (PARPs) that function in a wide variety of cellular processes including Wnt signaling, Src signaling, Akt signaling, Glut4 vesicle translocation, telomere length regulation, and centriole and spindle pole maturation. Tankyrase proteins include a sterile alpha motif (SAM) domain that undergoes oligomerization in vitro and in vivo. However, the SAM domains of TNKS1 and TNKS2 have not been structurally characterized and the mode of oligomerization is not yet defined. Here we model the SAM domain-mediated oligomerization of tankyrase. The structural model, supported by mutagenesis and NMR analysis, demonstrates a helical, homotypic head-to-tail polymer that facilitates TNKS self-association. Furthermore, we show that TNKS1 and TNKS2 can form (TNKS1 SAM-TNKS2 SAM) hetero-oligomeric structures mediated by their SAM domains. Though wild-type tankyrase proteins have very low solubility, model-based mutations of the SAM oligomerization interface residues allowed us to obtain soluble TNKS proteins. These structural insights will be invaluable for the functional and biophysical characterization of TNKS1/2, including the role of TNKS oligomerization in protein poly(ADP-ribosyl)ation (PARylation) and PARylation-dependent ubiquitylation. © 2016 The Protein Society.

  6. Structure prediction and binding sites analysis of curcin protein of Jatropha curcas using computational approaches.

    PubMed

    Srivastava, Mugdha; Gupta, Shishir K; Abhilash, P C; Singh, Nandita

    2012-07-01

    Ribosome inactivating proteins (RIPs) are defense proteins in a number of higher-plant species that are directly targeted toward herbivores. Jatropha curcas is one of the biodiesel plants having RIPs. The Jatropha seed meal, after extraction of oil, is rich in curcin, a highly toxic RIP similar to ricin, which makes it unsuitable for animal feed. Although the toxicity of curcin is well documented in the literature, the detailed toxic properties and the 3D structure of curcin has not been determined by X-ray crystallography, NMR spectroscopy or any in silico techniques to date. In this pursuit, the structure of curcin was modeled by a composite approach of 3D structure prediction using threading and ab initio modeling. Assessment of model quality was assessed by methods which include Ramachandran plot analysis and Qmean score estimation. Further, we applied the protein-ligand docking approach to identify the r-RNA binding residue of curcin. The present work provides the first structural insight into the binding mode of r-RNA adenine to the curcin protein and forms the basis for designing future inhibitors of curcin. Cloning of a future peptide inhibitor within J. curcas can produce non-toxic varieties of J. curcas, which would make the seed-cake suitable as animal feed without curcin detoxification.

  7. HDOCK: a web server for protein-protein and protein-DNA/RNA docking based on a hybrid strategy.

    PubMed

    Yan, Yumeng; Zhang, Di; Zhou, Pei; Li, Botong; Huang, Sheng-You

    2017-07-03

    Protein-protein and protein-DNA/RNA interactions play a fundamental role in a variety of biological processes. Determining the complex structures of these interactions is valuable, in which molecular docking has played an important role. To automatically make use of the binding information from the PDB in docking, here we have presented HDOCK, a novel web server of our hybrid docking algorithm of template-based modeling and free docking, in which cases with misleading templates can be rescued by the free docking protocol. The server supports protein-protein and protein-DNA/RNA docking and accepts both sequence and structure inputs for proteins. The docking process is fast and consumes about 10-20 min for a docking run. Tested on the cases with weakly homologous complexes of <30% sequence identity from five docking benchmarks, the HDOCK pipeline tied with template-based modeling on the protein-protein and protein-DNA benchmarks and performed better than template-based modeling on the three protein-RNA benchmarks when the top 10 predictions were considered. The performance of HDOCK became better when more predictions were considered. Combining the results of HDOCK and template-based modeling by ranking first of the template-based model further improved the predictive power of the server. The HDOCK web server is available at http://hdock.phys.hust.edu.cn/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Large-scale structure prediction by improved contact predictions and model quality assessment.

    PubMed

    Michel, Mirco; Menéndez Hurtado, David; Uziela, Karolis; Elofsson, Arne

    2017-07-15

    Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that reliably can be identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before. Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net/ . All programs used here are freely available. arne@bioinfo.se. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  9. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

    PubMed Central

    Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa

    2017-01-01

    Abstract RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. PMID:28977546

  10. Protein and Peptide Gas-phase Structure Investigation Using Collision Cross Section Measurements and Hydrogen Deuterium Exchange

    NASA Astrophysics Data System (ADS)

    Khakinejad, Mahdiar

    Protein and peptide gas-phase structure analysis provides the opportunity to study these species outside of their explicit environment where the interaction network with surrounding molecules makes the analysis difficult [1]. Although gas-phase structure analysis offers a unique opportunity to study the intrinsic behavior of these biomolecules [2-4], proteins and peptides exhibit very low vapor pressures [2]. Peptide and protein ions can be rendered in the gas-phase using electrospray ionization (ESI) [5]. There is a growing body of literature that shows proteins and peptides can maintain solution structures during the process of ESI and these structures can persist for a few hundred milliseconds [6-9]. Techniques for monitoring gas-phase protein and peptide ion structures are categorized as physical probes and chemical probes. Collision cross section (CCS) measurement, being a physical probe, is a powerful method to investigate gas-phase structure size [3, 7, 10-15]; however, CCS values alone do not establish a one to one relation with structure(i.e., the CCS value is an orientationally averaged value [15-18]. Here we propose the utility of gas-phase hydrogen deuterium exchange (HDX) as a second criterion of structure elucidation. The proposed approach incudes extensive MD simulations to sample biomolecular ion conformation space with the production of numerous, random in-silico structures. Subsequently a CCS can be calculated for these structures and theoretical CCS values are compared with experimental values to produce a pool of candidate structures. Utilizing a chemical reaction model based on the gas-phase HDX mechanism, the HDX kinetics behavior of these candidate structures are predicted and compared to experimental results to nominate the best in-silico structures which match (chemically and physically) with experimental observations. For the predictive approach to succeed, an extensive technique and method development is essential. To combine CCS measurements and gas-phase HDX studies at the amino acid residue level, for the first time a drift tube is connected to a linear ion trap (LIT) with electron transfer dissociation (ETD) capability[19, 20]. In this manner CCS and per-residue deuterium uptake measurements for a model peptide carried out successfully[19]. In this study, the gas-phase conformations of electrosprayed ions of the model peptide KKDDDDIIKIIK have been examined. Using ion structures obtained from molecular dynamics (MD) simulation and considering charge-site/exchange-site density the level of the maximum total deuterium uptake for the gas-phase ions is explained. Also a new hydrogen accessibility scoring (HAS) model that includes two distance calculations (charge site to carbonyl group and carbonyl group to exchange site) is applied to the in-silico structures to describe the expected HDX behavior for these structures. Further investigation to improve the accuracy of the model is accomplished by a "per-residue" HDX kinetics study of the model peptide [21]. In this study, the ion residence time and the deuterium uptake of each residue is measured at different partial pressures of D2O. Subsequently the contribution each residue to the overall HDX rate of the intact peptide ion is calculated. These rate contributions of the residues exhibit a better fit to HAS than their maximum deuterium uptake. Proteins and peptides with very frequent acidic residue in their sequence provide very poor signal levels when employing positive polarity ESI. Also, the comparison of protonated and deprotonated ions of these biomolecules offers the potential to provide a better structural characterization [22]. Per-residue deuterium uptake values resulting from collision-induced dissociation (CID) of the model peptide KKDDDDIIKIIK were used to investigated the degree of hydrogen deuterium scrambling for deprotonated ions [23]. Remarkably, limited isotopic scrambling was observed in this study of this small model peptide. This data and the per-residue deuterium uptake of the triply-protonated model peptide Acetyl-PAAAAKAAAAKAAAAKAAAAK are exploited to propose a lemma to allocate protonation and deprotonation sites for peptide ions in the gas-phase. Insulin ions, as a small protein model system, are examined to investigate the relation of the maximum deuterium uptake value for each insulin chain to the exposed surface area of each insulin subunit [22]. The results show that the methodology can be applied on the protein complexes to provide information about the exposed surface area of each subunit.

  11. 3Drefine: an interactive web server for efficient protein structure refinement.

    PubMed

    Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

    2016-07-08

    3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Microgravity

    NASA Image and Video Library

    2001-06-06

    X-rays diffracted from a well-ordered protein crystal create sharp patterns of scattered light on film. A computer can use these patterns to generate a model of a protein molecule. To analyze the selected crystal, an X-ray crystallographer shines X-rays through the crystal. Unlike a single dental X-ray, which produces a shadow image of a tooth, these X-rays have to be taken many times from different angles to produce a pattern from the scattered light, a map of the intensity of the X-rays after they diffract through the crystal. The X-rays bounce off the electron clouds that form the outer structure of each atom. A flawed crystal will yield a blurry pattern; a well-ordered protein crystal yields a series of sharp diffraction patterns. From these patterns, researchers build an electron density map. With powerful computers and a lot of calculations, scientists can use the electron density patterns to determine the structure of the protein and make a computer-generated model of the structure. The models let researchers improve their understanding of how the protein functions. They also allow scientists to look for receptor sites and active areas that control a protein's function and role in the progress of diseases. From there, pharmaceutical researchers can design molecules that fit the active site, much like a key and lock, so that the protein is locked without affecting the rest of the body. This is called structure-based drug design.

  13. NOXclass: prediction of protein-protein interaction types.

    PubMed

    Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas

    2006-01-19

    Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.

  14. MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens.

    PubMed

    González-Díaz, Humberto; Muíño, Laura; Anadón, Ana M; Romaris, Fernanda; Prado-Prado, Francisco J; Munteanu, Cristian R; Dorado, Julián; Sierra, Alejandro Pazos; Mezo, Mercedes; González-Warleta, Marta; Gárate, Teresa; Ubeira, Florencio M

    2011-06-01

    Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance. On the other hand, many 3D structures in Protein Data Bank (PDB) remain without function annotation. We need theoretical models to quickly predict biologically relevant Parasite Self Proteins (PSP), which are expressed differentially in a given parasite and are dissimilar to proteins expressed in other parasites and have a high probability to become new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15 341 training and validation cases. The model combines protein residue networks, Markov Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network calculated with the MARCH-INSIDE software. We implemented this model in a new web-server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . This server is easy to use by non-experts in Bioinformatics who can carry out automatic online upload and prediction with 3D structures deposited at PDB (mode 1). We can also study outcomes of Peptide Mass Fingerprinting (PMFs) and MS/MS for query proteins with unknown 3D structures (mode 2). We illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present on 10 Anisakis simplex allergens (Ani s 1 to Ani s 10). In doing so, we combined electrophoresis (1DE), MALDI-TOF Mass Spectroscopy, and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to generate 3D structures and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSP proteins in 16 additional species including parasite hosts, fungi pathogens, disease transmission vectors, and biotechnologically relevant organisms.

  15. Rotavirus architecture at subnanometer resolution.

    PubMed

    Li, Zongli; Baker, Matthew L; Jiang, Wen; Estes, Mary K; Prasad, B V Venkataram

    2009-02-01

    Rotavirus, a nonturreted member of the Reoviridae, is the causative agent of severe infantile diarrhea. The double-stranded RNA genome encodes six structural proteins that make up the triple-layer particle. X-ray crystallography has elucidated the structure of one of these capsid proteins, VP6, and two domains from VP4, the spike protein. Complementing this work, electron cryomicroscopy (cryoEM) has provided relatively low-resolution structures for the triple-layer capsid in several biochemical states. However, a complete, high-resolution structural model of rotavirus remains unresolved. Combining new structural analysis techniques with the subnanometer-resolution cryoEM structure of rotavirus, we now provide a more detailed structural model for the major capsid proteins and their interactions within the triple-layer particle. Through a series of intersubunit interactions, the spike protein (VP4) adopts a dimeric appearance above the capsid surface, while forming a trimeric base anchored inside one of the three types of aqueous channels between VP7 and VP6 capsid layers. While the trimeric base suggests the presence of three VP4 molecules in one spike, only hints of the third molecule are observed above the capsid surface. Beyond their interactions with VP4, the interactions between VP6 and VP7 subunits could also be readily identified. In the innermost T=1 layer composed of VP2, visualization of the secondary structure elements allowed us to identify the polypeptide fold for VP2 and examine the complex network of interactions between this layer and the T=13 VP6 layer. This integrated structural approach has resulted in a relatively high-resolution structural model for the complete, infectious structure of rotavirus, as well as revealing the subtle nuances required for maintaining interactions in such a large macromolecular assembly.

  16. Insight into the Intermolecular Recognition Mechanism between Keap1 and IKKβ Combining Homology Modelling, Protein-Protein Docking, Molecular Dynamics Simulations and Virtual Alanine Mutation

    PubMed Central

    Jiang, Zheng-Yu; Chu, Hong-Xi; Xi, Mei-Yang; Yang, Ting-Ting; Jia, Jian-Min; Huang, Jing-Jie; Guo, Xiao-Ke; Zhang, Xiao-Jin; You, Qi-Dong; Sun, Hao-Peng

    2013-01-01

    Degradation of certain proteins through the ubiquitin-proteasome pathway is a common strategy taken by the key modulators responsible for stress responses. Kelch-like ECH-associated protein-1(Keap1), a substrate adaptor component of the Cullin3 (Cul3)-based ubiquitin E3 ligase complex, mediates the ubiquitination of two key modulators, NF-E2-related factor 2 (Nrf2) and IκB kinase β (IKKβ), which are involved in the redox control of gene transcription. However, compared to the Keap1-Nrf2 protein-protein interaction (PPI), the intermolecular recognition mechanism of Keap1 and IKKβ has been poorly investigated. In order to explore the binding pattern between Keap1 and IKKβ, the PPI model of Keap1 and IKKβ was investigated. The structure of human IKKβ was constructed by means of the homology modeling method and using reported crystal structure of Xenopus laevis IKKβ as the template. A protein-protein docking method was applied to develop the Keap1-IKKβ complex model. After the refinement and visual analysis of docked proteins, the chosen pose was further optimized through molecular dynamics simulations. The resulting structure was utilized to conduct the virtual alanine mutation for the exploration of hot-spots significant for the intermolecular interaction. Overall, our results provided structural insights into the PPI model of Keap1-IKKβ and suggest that the substrate specificity of Keap1 depend on the interaction with the key tyrosines, namely Tyr525, Tyr574 and Tyr334. The study presented in the current project may be useful to design molecules that selectively modulate Keap1. The selective recognition mechanism of Keap1 with IKKβ or Nrf2 will be helpful to further know the crosstalk between NF-κB and Nrf2 signaling. PMID:24066166

  17. Insight into the intermolecular recognition mechanism between Keap1 and IKKβ combining homology modelling, protein-protein docking, molecular dynamics simulations and virtual alanine mutation.

    PubMed

    Jiang, Zheng-Yu; Chu, Hong-Xi; Xi, Mei-Yang; Yang, Ting-Ting; Jia, Jian-Min; Huang, Jing-Jie; Guo, Xiao-Ke; Zhang, Xiao-Jin; You, Qi-Dong; Sun, Hao-Peng

    2013-01-01

    Degradation of certain proteins through the ubiquitin-proteasome pathway is a common strategy taken by the key modulators responsible for stress responses. Kelch-like ECH-associated protein-1(Keap1), a substrate adaptor component of the Cullin3 (Cul3)-based ubiquitin E3 ligase complex, mediates the ubiquitination of two key modulators, NF-E2-related factor 2 (Nrf2) and IκB kinase β (IKKβ), which are involved in the redox control of gene transcription. However, compared to the Keap1-Nrf2 protein-protein interaction (PPI), the intermolecular recognition mechanism of Keap1 and IKKβ has been poorly investigated. In order to explore the binding pattern between Keap1 and IKKβ, the PPI model of Keap1 and IKKβ was investigated. The structure of human IKKβ was constructed by means of the homology modeling method and using reported crystal structure of Xenopus laevis IKKβ as the template. A protein-protein docking method was applied to develop the Keap1-IKKβ complex model. After the refinement and visual analysis of docked proteins, the chosen pose was further optimized through molecular dynamics simulations. The resulting structure was utilized to conduct the virtual alanine mutation for the exploration of hot-spots significant for the intermolecular interaction. Overall, our results provided structural insights into the PPI model of Keap1-IKKβ and suggest that the substrate specificity of Keap1 depend on the interaction with the key tyrosines, namely Tyr525, Tyr574 and Tyr334. The study presented in the current project may be useful to design molecules that selectively modulate Keap1. The selective recognition mechanism of Keap1 with IKKβ or Nrf2 will be helpful to further know the crosstalk between NF-κB and Nrf2 signaling.

  18. The role of internal duplication in the evolution of multi-domain proteins.

    PubMed

    Nacher, J C; Hayashida, M; Akutsu, T

    2010-08-01

    Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.

  19. Molecular dynamics simulations of the Nip7 proteins from the marine deep- and shallow-water Pyrococcus species

    PubMed Central

    2014-01-01

    Background The identification of the mechanisms of adaptation of protein structures to extreme environmental conditions is a challenging task of structural biology. We performed molecular dynamics (MD) simulations of the Nip7 protein involved in RNA processing from the shallow-water (P. furiosus) and the deep-water (P. abyssi) marine hyperthermophylic archaea at different temperatures (300 and 373 K) and pressures (0.1, 50 and 100 MPa). The aim was to disclose similarities and differences between the deep- and shallow-sea protein models at different temperatures and pressures. Results The current results demonstrate that the 3D models of the two proteins at all the examined values of pressures and temperatures are compact, stable and similar to the known crystal structure of the P. abyssi Nip7. The structural deviations and fluctuations in the polypeptide chain during the MD simulations were the most pronounced in the loop regions, their magnitude being larger for the C-terminal domain in both proteins. A number of highly mobile segments the protein globule presumably involved in protein-protein interactions were identified. Regions of the polypeptide chain with significant difference in conformational dynamics between the deep- and shallow-water proteins were identified. Conclusions The results of our analysis demonstrated that in the examined ranges of temperatures and pressures, increase in temperature has a stronger effect on change in the dynamic properties of the protein globule than the increase in pressure. The conformational changes of both the deep- and shallow-sea protein models under increasing temperature and pressure are non-uniform. Our current results indicate that amino acid substitutions between shallow- and deep-water proteins only slightly affect overall stability of two proteins. Rather, they may affect the interactions of the Nip7 protein with its protein or RNA partners. PMID:25315147

  20. Protein 3D Structure Computed from Evolutionary Sequence Variation

    PubMed Central

    Sheridan, Robert; Hopf, Thomas A.; Pagnani, Andrea; Zecchina, Riccardo; Sander, Chris

    2011-01-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues., including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes. PMID:22163331

Top