Worldwide Protein Data Bank validation information: usage and trends.
Smart, Oliver S; Horský, Vladimír; Gore, Swanand; Svobodová Vařeková, Radka; Bendová, Veronika; Kleywegt, Gerard J; Velankar, Sameer
2018-03-01
Realising the importance of assessing the quality of the biomolecular structures deposited in the Protein Data Bank (PDB), the Worldwide Protein Data Bank (wwPDB) partners established Validation Task Forces to obtain advice on the methods and standards to be used to validate structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and three-dimensional electron cryo-microscopy. The resulting wwPDB validation pipeline is an integral part of the wwPDB OneDep deposition, biocuration and validation system. The wwPDB Validation Service webserver (https://validate.wwpdb.org) can be used to perform checks prior to deposition. Here, it is shown how validation metrics can be combined to produce an overall score that allows the ranking of macromolecular structures and domains in search results. The ValTrendsDB database provides users with a convenient way to access and analyse validation information and other properties of X-ray crystal structures in the PDB, including investigating trends in and correlations between different structure properties and validation metrics.
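The idea of combining validation metrics into one ranking score can be illustrated with a minimal sketch. The abstract does not give the formula, so the equal-weight average, the metric names and the percentile values below are all assumptions made for illustration, not the wwPDB method.

```python
from statistics import mean


def overall_score(percentiles):
    """Equal-weight mean of percentile ranks (0-100, higher is better).

    Assumption: the actual weighting used by wwPDB is not stated in the
    abstract; a plain average is used here purely as an illustration.
    """
    if not percentiles:
        raise ValueError("at least one metric is required")
    return mean(percentiles.values())


def rank_entries(entries):
    """Return entry IDs sorted from best to worst overall score."""
    return sorted(entries, key=lambda eid: overall_score(entries[eid]), reverse=True)


# Hypothetical entries and percentile ranks, not real PDB data.
entries = {
    "entry_a": {"rfree": 70.0, "clashscore": 90.0, "ramachandran": 80.0},
    "entry_b": {"rfree": 40.0, "clashscore": 55.0, "ramachandran": 60.0},
    "entry_c": {"rfree": 95.0, "clashscore": 85.0, "ramachandran": 90.0},
}

ranking = rank_entries(entries)
```

Working with percentile ranks rather than raw values puts metrics with different units on a common scale before they are averaged, which is one plausible reason to combine them this way.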
Worldwide Protein Data Bank validation information: usage and trends
Horský, Vladimír; Gore, Swanand; Svobodová Vařeková, Radka; Bendová, Veronika
2018-01-01
Realising the importance of assessing the quality of the biomolecular structures deposited in the Protein Data Bank (PDB), the Worldwide Protein Data Bank (wwPDB) partners established Validation Task Forces to obtain advice on the methods and standards to be used to validate structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and three-dimensional electron cryo-microscopy. The resulting wwPDB validation pipeline is an integral part of the wwPDB OneDep deposition, biocuration and validation system. The wwPDB Validation Service webserver (https://validate.wwpdb.org) can be used to perform checks prior to deposition. Here, it is shown how validation metrics can be combined to produce an overall score that allows the ranking of macromolecular structures and domains in search results. The ValTrendsDB database provides users with a convenient way to access and analyse validation information and other properties of X-ray crystal structures in the PDB, including investigating trends in and correlations between different structure properties and validation metrics. PMID:29533231
Implementing an X-ray validation pipeline for the Protein Data Bank
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gore, Swanand; Velankar, Sameer; Kleywegt, Gerard J., E-mail: gerard@ebi.ac.uk
2012-04-01
The implementation of a validation pipeline, based on community recommendations, for future depositions of X-ray crystal structures in the Protein Data Bank is described. There is an increasing realisation that the quality of the biomacromolecular structures deposited in the Protein Data Bank (PDB) archive needs to be assessed critically using established and powerful validation methods. The Worldwide Protein Data Bank (wwPDB) organization has convened several Validation Task Forces (VTFs) to advise on the methods and standards that should be used to validate all of the entries already in the PDB as well as all structures that will be deposited in the future. The recommendations of the X-ray VTF are currently being implemented in a software pipeline. Here, ongoing work on this pipeline is briefly described as well as ways in which validation-related information could be presented to users of structural data.
A new test set for validating predictions of protein-ligand interaction.
Nissink, J Willem M; Murray, Chris; Hartshorn, Mike; Verdonk, Marcel L; Cole, Jason C; Taylor, Robin
2002-12-01
We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk). Copyright 2002 Wiley-Liss, Inc.
Models of protein-ligand crystal structures: trust, but verify.
Deller, Marc C; Rupp, Bernhard
2015-09-01
X-ray crystallography provides the most accurate models of protein-ligand structures. These models serve as the foundation of many computational methods including structure prediction, molecular modelling, and structure-based drug design. The success of these computational methods ultimately depends on the quality of the underlying protein-ligand models. X-ray crystallography offers the unparalleled advantage of a clear mathematical formalism relating the experimental data to the protein-ligand model. In the case of X-ray crystallography, the primary experimental evidence is the electron density of the molecules forming the crystal. The first step in the generation of an accurate and precise crystallographic model is the interpretation of the electron density of the crystal, typically carried out by construction of an atomic model. The atomic model must then be validated for fit to the experimental electron density and also for agreement with prior expectations of stereochemistry. Stringent validation of protein-ligand models has become possible as a result of the mandatory deposition of primary diffraction data, and many computational tools are now available to aid in the validation process. Validation of protein-ligand complexes has revealed some instances of overenthusiastic interpretation of ligand density. Fundamental concepts and metrics of protein-ligand quality validation are discussed and we highlight software tools to assist in this process. It is essential that end users select high quality protein-ligand models for their computational and biological studies, and we provide an overview of how this can be achieved.
ProTSAV: A protein tertiary structure analysis and validation server.
Singh, Ankita; Kaushik, Rahul; Mishra, Avinash; Shanker, Asheesh; Jayaram, B
2016-01-01
Quality assessment of predicted model structures of proteins is as important as the protein tertiary structure prediction. A highly efficient quality assessment of predicted model structures directs further research on function. Here we present a new server ProTSAV, capable of evaluating predicted model structures based on some popular online servers and standalone tools. ProTSAV furnishes the user with a single quality score in case of individual protein structure along with a graphical representation and ranking in case of multiple protein structure assessment. The server is validated on ~64,446 protein structures including experimental structures from RCSB and predicted model structures for CASP targets and from public decoy sets. ProTSAV succeeds in predicting quality of protein structures with a specificity of 100% and a sensitivity of 98% on experimentally solved structures and achieves a specificity of 88% and a sensitivity of 91% on predicted protein structures of CASP11 targets under 2 Å. The server overcomes the limitations of any single server/method and is seen to be robust in helping in quality assessment. ProTSAV is freely available at http://www.scfbio-iitd.res.in/software/proteomics/protsav.jsp. Copyright © 2015 Elsevier B.V. All rights reserved.
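For readers less familiar with the two figures quoted above, the standard definitions of sensitivity and specificity can be written out as a short sketch. The function names and the confusion-matrix counts are hypothetical and chosen only to reproduce round numbers; they are not taken from the ProTSAV benchmark.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that were classified as positive."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that were classified as negative."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases")
    return true_neg / total


# Hypothetical counts for illustration only.
sens = sensitivity(true_pos=98, false_neg=2)
spec = specificity(true_neg=50, false_pos=0)
```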
Validation of Structures in the Protein Data Bank.
Gore, Swanand; Sanz García, Eduardo; Hendrickx, Pieter M S; Gutmanas, Aleksandras; Westbrook, John D; Yang, Huanwang; Feng, Zukang; Baskaran, Kumaran; Berrisford, John M; Hudson, Brian P; Ikegawa, Yasuyo; Kobayashi, Naohiro; Lawson, Catherine L; Mading, Steve; Mak, Lora; Mukhopadhyay, Abhik; Oldfield, Thomas J; Patwardhan, Ardan; Peisach, Ezra; Sahni, Gaurav; Sekharan, Monica R; Sen, Sanchayita; Shao, Chenghua; Smart, Oliver S; Ulrich, Eldon L; Yamashita, Reiko; Quesada, Martha; Young, Jasmine Y; Nakamura, Haruki; Markley, John L; Berman, Helen M; Burley, Stephen K; Velankar, Sameer; Kleywegt, Gerard J
2017-12-05
The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Rational design of alpha-helical tandem repeat proteins with closed architectures
Doyle, Lindsey; Hallinan, Jazmine; Bolduc, Jill; Parmeggiani, Fabio; Baker, David; Stoddard, Barry L.; Bradley, Philip
2015-01-01
Tandem repeat proteins, which are formed by repetition of modular units of protein sequence and structure, play important biological roles as macromolecular binding and scaffolding domains, enzymes, and building blocks for the assembly of fibrous materials. The modular nature of repeat proteins enables the rapid construction and diversification of extended binding surfaces by duplication and recombination of simple building blocks. The overall architecture of tandem repeat protein structures – which is dictated by the internal geometry and local packing of the repeat building blocks – is highly diverse, ranging from extended, super-helical folds that bind peptide, DNA, and RNA partners, to closed and compact conformations with internal cavities suitable for small molecule binding and catalysis. Here we report the development and validation of computational methods for de novo design of tandem repeat protein architectures driven purely by geometric criteria defining the inter-repeat geometry, without reference to the sequences and structures of existing repeat protein families. We have applied these methods to design a series of closed alpha-solenoid repeat structures (alpha-toroids) in which the inter-repeat packing geometry is constrained so as to juxtapose the N- and C-termini; several of these designed structures have been validated by X-ray crystallography. Unlike previous approaches to tandem repeat protein engineering, our design procedure does not rely on template sequence or structural information taken from natural repeat proteins and hence can produce structures unlike those seen in nature. As an example, we have successfully designed and validated closed alpha-solenoid repeats with a left-handed helical architecture that – to our knowledge – is not yet present in the protein structure database. PMID:26675735
Sheffler, Will; Baker, David
2009-01-01
We present a novel method called RosettaHoles for visual and quantitative assessment of underpacking in the protein core. RosettaHoles generates a set of spherical cavity balls that fill the empty volume between atoms in the protein interior. For visualization, the cavity balls are aggregated into contiguous overlapping clusters and small cavities are discarded, leaving an uncluttered representation of the unfilled regions of space in a structure. For quantitative analysis, the cavity ball data are used to estimate the probability of observing a given cavity in a high-resolution crystal structure. RosettaHoles provides excellent discrimination between real and computationally generated structures, is predictive of incorrect regions in models, identifies problematic structures in the Protein Data Bank, and promises to be a useful validation tool for newly solved experimental structures. PMID:19177366
A New Generation of Crystallographic Validation Tools for the Protein Data Bank
Read, Randy J.; Adams, Paul D.; Arendall, W. Bryan; Brunger, Axel T.; Emsley, Paul; Joosten, Robbie P.; Kleywegt, Gerard J.; Krissinel, Eugene B.; Lütteke, Thomas; Otwinowski, Zbyszek; Perrakis, Anastassis; Richardson, Jane S.; Sheffler, William H.; Smith, Janet L.; Tickle, Ian J.; Vriend, Gert; Zwart, Peter H.
2011-01-01
This report presents the conclusions of the X-ray Validation Task Force of the worldwide Protein Data Bank (PDB). The PDB has expanded massively since current criteria for validation of deposited structures were adopted, allowing a much more sophisticated understanding of all the components of macromolecular crystals. The size of the PDB creates new opportunities to validate structures by comparison with the existing database, and the now-mandatory deposition of structure factors creates new opportunities to validate the underlying diffraction data. These developments highlighted the need for a new assessment of validation criteria. The Task Force recommends that a small set of validation data be presented in an easily understood format, relative to both the full PDB and the applicable resolution class, with greater detail available to interested users. Most importantly, we recommend that referees and editors judging the quality of structural experiments have access to a concise summary of well-established quality indicators. PMID:22000512
A new generation of crystallographic validation tools for the protein data bank.
Read, Randy J; Adams, Paul D; Arendall, W Bryan; Brunger, Axel T; Emsley, Paul; Joosten, Robbie P; Kleywegt, Gerard J; Krissinel, Eugene B; Lütteke, Thomas; Otwinowski, Zbyszek; Perrakis, Anastassis; Richardson, Jane S; Sheffler, William H; Smith, Janet L; Tickle, Ian J; Vriend, Gert; Zwart, Peter H
2011-10-12
This report presents the conclusions of the X-ray Validation Task Force of the worldwide Protein Data Bank (PDB). The PDB has expanded massively since current criteria for validation of deposited structures were adopted, allowing a much more sophisticated understanding of all the components of macromolecular crystals. The size of the PDB creates new opportunities to validate structures by comparison with the existing database, and the now-mandatory deposition of structure factors creates new opportunities to validate the underlying diffraction data. These developments highlighted the need for a new assessment of validation criteria. The Task Force recommends that a small set of validation data be presented in an easily understood format, relative to both the full PDB and the applicable resolution class, with greater detail available to interested users. Most importantly, we recommend that referees and editors judging the quality of structural experiments have access to a concise summary of well-established quality indicators. Copyright © 2011 Elsevier Ltd. All rights reserved.
An ambiguity principle for assigning protein structural domains.
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object (in this case, a protein). This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules.
Ryu, Hyojung; Lim, GyuTae; Sung, Bong Hyun; Lee, Jinhyuk
2016-02-15
Protein structure refinement is a necessary step for the study of protein function. In particular, some nuclear magnetic resonance (NMR) structures are of lower quality than X-ray crystallographic structures. Here, we present NMRe, a web-based server for NMR structure refinement. The previously developed knowledge-based energy function STAP (Statistical Torsion Angle Potential) was used for NMRe refinement. With STAP, NMRe provides two refinement protocols using two types of distance restraints. If a user provides NOE (Nuclear Overhauser Effect) data, the refinement is performed with the NOE distance restraints as a conventional NMR structure refinement. Additionally, NMRe generates NOE-like distance restraints based on the inter-hydrogen distances derived from the input structure. The efficiency of NMRe refinement was validated on 20 NMR structures. Most of the quality assessment scores of the refined NMR structures were better than those of the original structures. The refinement results are provided as a three-dimensional structure view, a secondary structure scheme, and numerical and graphical structure validation scores. NMRe is available at http://psb.kobic.re.kr/nmre/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
An ambiguity principle for assigning protein structural domains
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object—in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our “multipartitioning” approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules. PMID:28097215
MSX-3D: a tool to validate 3D protein models using mass spectrometry.
Heymann, Michaël; Paramelle, David; Subra, Gilles; Forest, Eric; Martinez, Jean; Geourjon, Christophe; Deléage, Gilbert
2008-12-01
Chemical cross-linking followed by mass spectrometry has proven to provide valuable information about protein structure and the interactions between protein subunits. It is an effective and efficient way to investigate some aspects of a protein structure experimentally when NMR and X-ray crystallography data are lacking. We introduce MSX-3D, a tool specifically geared to validate protein models using mass spectrometry. In addition to classical peptide identifications, it allows an interactive 3D visualization of the distance constraints derived from a cross-linking experiment. MSX-3D is freely available at http://proteomics-pbil.ibcp.fr
A Generative Angular Model of Protein Structure Evolution
Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun
2017-01-01
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
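To make the notion of a random walk on the two-dimensional torus concrete, here is a deliberately naive sketch. It is not the authors' angular diffusion process: it simply adds Gaussian steps to a (phi, psi) dihedral pair and wraps each angle back into the range from -pi to pi. The function names, step size and starting angles are assumptions chosen for illustration.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians into the range from -pi to pi."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def torus_walk(phi, psi, steps, sigma, rng):
    """Naive wrapped Gaussian random walk over a (phi, psi) pair.

    Assumption: independent, unbiased steps with standard deviation
    `sigma`; the published model uses a different diffusion process.
    """
    path = [(wrap(phi), wrap(psi))]
    for _ in range(steps):
        phi = wrap(phi + rng.gauss(0.0, sigma))
        psi = wrap(psi + rng.gauss(0.0, sigma))
        path.append((phi, psi))
    return path


# Hypothetical starting angles (radians) and a fixed seed for repeatability.
path = torus_walk(-1.0, 2.5, steps=100, sigma=0.3, rng=random.Random(0))
```

Wrapping after every step is what makes the state space a torus: a walk that leaves one edge of the (phi, psi) square re-enters from the opposite edge.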
Potrzebowski, Wojciech; André, Ingemar
2015-07-01
For highly oriented fibrillar molecules, three-dimensional structures can often be determined from X-ray fiber diffraction data. However, because of limited information content, structure determination and validation can be challenging. We demonstrate that automated structure determination of protein fibers can be achieved by guiding the building of macromolecular models with fiber diffraction data. We illustrate the power of our approach by determining the structures of six bacteriophage viruses de novo using fiber diffraction data alone and together with solid-state NMR data. Furthermore, we demonstrate the feasibility of molecular replacement from monomeric and fibrillar templates by solving the structure of a plant virus using homology modeling and protein-protein docking. The generated models explain the experimental data to the same degree as deposited reference structures but with improved structural quality. We also developed a cross-validation method for model selection. The results highlight the power of fiber diffraction data as structural constraints.
NASA Astrophysics Data System (ADS)
Santos, Marlus Alves Dos; Teixeira, Francesco Brugnera; Moreira, Heline Hellen Teixeira; Rodrigues, Adele Aud; Machado, Fabrício Castro; Clemente, Tatiana Mordente; Brigido, Paula Cristina; Silva, Rebecca Tavares E.; Purcino, Cecílio; Gomes, Rafael Gonçalves Barbosa; Bahia, Diana; Mortara, Renato Arruda; Munte, Claudia Elisabeth; Horjales, Eduardo; da Silva, Claudio Vieira
2014-03-01
Structural studies of proteins normally require large quantities of pure material that can only be obtained through heterologous expression systems and recombinant technique. In these procedures, large amounts of expressed protein are often found in the insoluble fraction, making protein purification from the soluble fraction inefficient, laborious, and costly. Usually, protein refolding is avoided due to a lack of experimental assays that can validate correct folding and that can compare the conformational population to that of the soluble fraction. Herein, we propose a validation method using simple and rapid 1D 1H nuclear magnetic resonance (NMR) spectra that can efficiently compare protein samples, including individual information of the environment of each proton in the structure.
Twilight reloaded: the peptide experience.
Weichenberger, Christian X; Pozharski, Edwin; Rupp, Bernhard
2017-03-01
The de facto commoditization of biomolecular crystallography as a result of almost disruptive instrumentation automation and continuing improvement of software allows any sensibly trained structural biologist to conduct crystallographic studies of biomolecules with reasonably valid outcomes: that is, models based on properly interpreted electron density. Robust validation has led to major mistakes in the protein part of structure models becoming rare, but some depositions of protein-peptide complex structure models, which generally carry significant interest to the scientific community, still contain erroneous models of the bound peptide ligand. Here, the protein small-molecule ligand validation tool Twilight is updated to include peptide ligands. (i) The primary technical reasons and potential human factors leading to problems in ligand structure models are presented; (ii) a new method used to score peptide-ligand models is presented; (iii) a few instructive and specific examples, including an electron-density-based analysis of peptide-ligand structures that do not contain any ligands, are discussed in detail; (iv) means to avoid such mistakes and the implications for database integrity are discussed and (v) some suggestions as to how journal editors could help to expunge errors from the Protein Data Bank are provided.
Kemege, Kyle E.; Hickey, John M.; Lovell, Scott; Battaile, Kevin P.; Zhang, Yang; Hefty, P. Scott
2011-01-01
Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur. PMID:21965559
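The Cα RMSD quoted above is the root mean square of the distances between matched Cα atoms. A minimal sketch follows, under the assumption that the two coordinate sets are already superposed and listed in the same residue order (the optimal superposition step is omitted); the function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z).

    Assumption: both structures are already superposed and atoms are
    paired by position in the list.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical Calpha coordinates in angstroms, not taken from CT296.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model, crystal)
```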
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kemege, Kyle E.; Hickey, John M.; Lovell, Scott
2012-02-13
Chlamydia trachomatis is a medically important pathogen that encodes a relatively high percentage of proteins with unknown function. The three-dimensional structure of a protein can be very informative regarding the protein's functional characteristics; however, determining protein structures experimentally can be very challenging. Computational methods that model protein structures with sufficient accuracy to facilitate functional studies have had notable successes. To evaluate the accuracy and potential impact of computational protein structure modeling of hypothetical proteins encoded by Chlamydia, a successful computational method termed I-TASSER was utilized to model the three-dimensional structure of a hypothetical protein encoded by open reading frame (ORF) CT296. CT296 has been reported to exhibit functional properties of a divalent cation transcription repressor (DcrA), with similarity to the Escherichia coli iron-responsive transcriptional repressor, Fur. Unexpectedly, the I-TASSER model of CT296 exhibited no structural similarity to any DNA-interacting proteins or motifs. To validate the I-TASSER-generated model, the structure of CT296 was solved experimentally using X-ray crystallography. Impressively, the ab initio I-TASSER-generated model closely matched (2.72-Å Cα root mean square deviation [RMSD]) the high-resolution (1.8-Å) crystal structure of CT296. Modeled and experimentally determined structures of CT296 share structural characteristics of non-heme Fe(II) 2-oxoglutarate-dependent enzymes, although key enzymatic residues are not conserved, suggesting a unique biochemical process is likely associated with CT296 function. Additionally, functional analyses did not support prior reports that CT296 has properties shared with divalent cation repressors such as Fur.
The Quality and Validation of Structures from Structural Genomics
Domagalski, Marcin J.; Zheng, Heping; Zimmerman, Matthew D.; Dauter, Zbigniew; Wlodawer, Alexander; Minor, Wladek
2014-01-01
Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein–ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement. PMID:24203341
Functional classification of protein structures by local structure matching in graph representation.
Mills, Caitlyn L; Garg, Rohan; Lee, Joslynn S; Tian, Liang; Suciu, Alexandru; Cooperman, Gene; Beuning, Penny J; Ondrechen, Mary Jo
2018-03-31
As a result of high-throughput protein structure initiatives, over 14,400 protein structures have been solved by structural genomics (SG) centers and participating research groups. While the totality of SG data represents a tremendous contribution to genomics and structural biology, reliable functional information for these proteins is generally lacking. Better functional predictions for SG proteins will add substantial value to the structural information already obtained. Our method described herein, Graph Representation of Active Sites for Prediction of Function (GRASP-Func), predicts quickly and accurately the biochemical function of proteins by representing residues at the predicted local active site as graphs rather than in Cartesian coordinates. We compare the GRASP-Func method to our previously reported method, structurally aligned local sites of activity (SALSA), using the ribulose phosphate binding barrel (RPBB), 6-hairpin glycosidase (6-HG), and Concanavalin A-like Lectins/Glucanase (CAL/G) superfamilies as test cases. In each of the superfamilies, SALSA and the much faster method GRASP-Func yield similar correct classification of previously characterized proteins, providing a validated benchmark for the new method. In addition, we analyzed SG proteins using our SALSA and GRASP-Func methods to predict function. Forty-one SG proteins in the RPBB superfamily, nine SG proteins in the 6-HG superfamily, and one SG protein in the CAL/G superfamily were successfully classified into one of the functional families in their respective superfamily by both methods. This improved, faster, validated computational method can yield more reliable predictions of function that can be used for a wide variety of applications by the community. © 2018 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
PDBStat: a universal restraint converter and restraint analysis software package for protein NMR.
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M; Montelione, Gaetano T
2013-08-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data.
PDBStat: A Universal Restraint Converter and Restraint Analysis Software Package for Protein NMR
Tejero, Roberto; Snyder, David; Mao, Binchen; Aramini, James M.; Montelione, Gaetano T
2013-01-01
The heterogeneous array of software tools used in the process of protein NMR structure determination presents organizational challenges in the structure determination and validation processes, and creates a learning curve that limits the broader use of protein NMR in biology. These challenges, including accurate use of data in different data formats required by software carrying out similar tasks, continue to confound the efforts of novices and experts alike. These important issues need to be addressed robustly in order to standardize protein NMR structure determination and validation. PDBStat is a C/C++ computer program originally developed as a universal coordinate and protein NMR restraint converter. Its primary function is to provide a user-friendly tool for interconverting between protein coordinate and protein NMR restraint data formats. It also provides an integrated set of computational methods for protein NMR restraint analysis and structure quality assessment, relabeling of prochiral atoms with correct IUPAC names, as well as multiple methods for analysis of the consistency of atomic positions indicated by their convergence across a protein NMR ensemble. In this paper we provide a detailed description of the PDBStat software, and highlight some of its valuable computational capabilities. As an example, we demonstrate the use of the PDBStat restraint converter for restrained CS-Rosetta structure generation calculations, and compare the resulting protein NMR structure models with those generated from the same NMR restraint data using more traditional structure determination methods. These results demonstrate the value of a universal restraint converter in allowing the use of multiple structure generation methods with the same restraint data for consensus analysis of protein NMR structures and the underlying restraint data. PMID:23897031
Brodie, Nicholas I; Popov, Konstantin I; Petrotchenko, Evgeniy V; Dokholyan, Nikolay V; Borchers, Christoph H
2017-07-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein (models for α helix-rich and β sheet-rich proteins, respectively) and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures.
Trabanino, Rene J.; Hall, Spencer E.; Vaidehi, Nagarajan; Floriano, Wely B.; Kam, Victor W. T.; Goddard, William A.
2004-01-01
G-protein-coupled receptors (GPCRs) are involved in cell communication processes and with mediating such senses as vision, smell, taste, and pain. They constitute a prominent superfamily of drug targets, but an atomic-level structure is available for only one GPCR, bovine rhodopsin, making it difficult to use structure-based methods to design receptor-specific drugs. We have developed the MembStruk first principles computational method for predicting the three-dimensional structure of GPCRs. In this article we validate the MembStruk procedure by comparing its predictions with the high-resolution crystal structure of bovine rhodopsin. The crystal structure of bovine rhodopsin has the second extracellular (EC-II) loop closed over the transmembrane regions by making a disulfide linkage between Cys-110 and Cys-187, but we speculate that opening this loop may play a role in the activation process of the receptor through the cysteine linkage with helix 3. Consequently we predicted two structures for bovine rhodopsin from the primary sequence (with no input from the crystal structure)—one with the EC-II loop closed as in the crystal structure, and the other with the EC-II loop open. The MembStruk-predicted structure of bovine rhodopsin with the closed EC-II loop deviates from the crystal by 2.84 Å coordinate root mean-square (CRMS) in the transmembrane region main-chain atoms. The predicted three-dimensional structures for other GPCRs can be validated only by predicting binding sites and energies for various ligands. For such predictions we developed the HierDock first principles computational method. We validate HierDock by predicting the binding site of 11-cis-retinal in the crystal structure of bovine rhodopsin. Scanning the whole protein without using any prior knowledge of the binding site, we find that the best scoring conformation in rhodopsin is 1.1 Å CRMS from the crystal structure for the ligand atoms. 
This predicted conformation has the carbonyl O only 2.82 Å from the N of Lys-296. Making this Schiff base bond and minimizing leads to a final conformation only 0.62 Å CRMS from the crystal structure. We also used HierDock to predict the binding site of 11-cis-retinal in the MembStruk-predicted structure of bovine rhodopsin (closed loop). Scanning the whole protein structure leads to a structure in which the carbonyl O is only 2.85 Å from the N of Lys-296. Making this Schiff base bond and minimizing leads to a final conformation only 2.92 Å CRMS from the crystal structure. The good agreement of the ab initio-predicted protein structures and ligand binding site with experiment validates the use of the MembStruk and HierDock first principles' methods. Since these methods are generic and applicable to any GPCR, they should be useful in predicting the structures of other GPCRs and the binding site of ligands to these proteins. PMID:15041637
NIAS-Server: Neighbors Influence of Amino acids and Secondary Structures in Proteins.
Borguesan, Bruno; Inostroza-Ponta, Mario; Dorn, Márcio
2017-03-01
The exponential growth in the number of experimentally determined three-dimensional protein structures provides new and relevant knowledge about the conformation of amino acids in proteins. Only a few probability densities of amino acids are publicly available for use in structure validation and prediction methods. NIAS (Neighbors Influence of Amino acids and Secondary structures) is a web-based tool used to extract information about conformational preferences of amino acid residues and secondary structures in experimentally determined protein templates. This information is useful, for example, to characterize folds, local motifs, and molecular folding in proteins, and can help solve complex problems such as protein structure prediction and protein design. The NIAS-Server and supplementary data are available at http://sbcb.inf.ufrgs.br/nias.
2015-01-01
the Protein Data Bank (http://www.rcsb.org/pdb/). These structures are the most accurate and can be used for molecular docking. Target flexibility is...crystallized with the different ligands. In total, 240 files with the structures of 37 proteins were downloaded from PDB and used for docking...total, 240 files with protein structures were downloaded from the PDB and used for protein–ligand docking. It is widely accepted that ligand binding
Quality assessment of protein model-structures based on structural and functional similarities.
Konopka, Bogumil M; Nebel, Jean-Christophe; Kotulska, Malgorzata
2012-09-21
Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. GOBA--Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. 
GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
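The evaluation figure reported above (a mean Pearson correlation of 0.74 and 0.8 between GOBA scores and observed model quality) can be illustrated with a small sketch. This is not GOBA's code; the helper names `pearson` and `mean_pearson`, and the assumption that correlations are computed per target and then averaged, are made here for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_pearson(per_target):
    """Average the per-target correlations between predicted scores and
    observed quality. `per_target` is a list of (scores, observed) pairs."""
    values = [pearson(scores, observed) for scores, observed in per_target]
    return sum(values) / len(values)


# Hypothetical toy data: one perfectly correlated target, one anti-correlated.
targets = [
    ([0.1, 0.2, 0.3], [10.0, 20.0, 30.0]),
    ([0.1, 0.2, 0.3], [30.0, 20.0, 10.0]),
]
print(mean_pearson(targets))
```

Averaging per-target correlations, rather than pooling all models into one correlation, keeps easy and hard targets from dominating each other; whether the authors did exactly this is an assumption.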
Twilight reloaded: the peptide experience
Weichenberger, Christian X.; Pozharski, Edwin; Rupp, Bernhard
2017-01-01
The de facto commoditization of biomolecular crystallography as a result of almost disruptive instrumentation automation and continuing improvement of software allows any sensibly trained structural biologist to conduct crystallographic studies of biomolecules with reasonably valid outcomes: that is, models based on properly interpreted electron density. Robust validation has led to major mistakes in the protein part of structure models becoming rare, but some depositions of protein–peptide complex structure models, which generally carry significant interest to the scientific community, still contain erroneous models of the bound peptide ligand. Here, the protein small-molecule ligand validation tool Twilight is updated to include peptide ligands. (i) The primary technical reasons and potential human factors leading to problems in ligand structure models are presented; (ii) a new method used to score peptide-ligand models is presented; (iii) a few instructive and specific examples, including an electron-density-based analysis of peptide-ligand structures that do not contain any ligands, are discussed in detail; (iv) means to avoid such mistakes and the implications for database integrity are discussed and (v) some suggestions as to how journal editors could help to expunge errors from the Protein Data Bank are provided. PMID:28291756
Baker, Matthew L.; Hryc, Corey F.; Zhang, Qinfen; Wu, Weimin; Jakana, Joanita; Haase-Pettingell, Cameron; Afonine, Pavel V.; Adams, Paul D.; King, Jonathan A.; Jiang, Wen; Chiu, Wah
2013-01-01
High-resolution structures of viruses have made important contributions to modern structural biology. Bacteriophages, the most diverse and abundant organisms on earth, replicate and infect all bacteria and archaea, making them excellent potential alternatives to antibiotics and therapies for multidrug-resistant bacteria. Here, we improved upon our previous electron cryomicroscopy structure of Salmonella bacteriophage epsilon15, achieving a resolution sufficient to determine the tertiary structures of both gp7 and gp10 protein subunits that form the T = 7 icosahedral lattice. This study utilizes recently established best practice for near-atomic to high-resolution (3–5 Å) electron cryomicroscopy data evaluation. The resolution and reliability of the density map were cross-validated by multiple reconstructions from truly independent data sets, whereas the models of the individual protein subunits were validated adopting the best practices from X-ray crystallography. Some sidechain densities are clearly resolved and show the subunit–subunit interactions within and across the capsomeres that are required to stabilize the virus. The presence of the canonical phage and jellyroll viral protein folds, gp7 and gp10, respectively, in the same virus suggests that epsilon15 may have emerged more recently relative to other bacteriophages. PMID:23840063
Brodie, Nicholas I.; Popov, Konstantin I.; Petrotchenko, Evgeniy V.; Dokholyan, Nikolay V.; Borchers, Christoph H.
2017-01-01
We present an integrated experimental and computational approach for de novo protein structure determination in which short-distance cross-linking data are incorporated into rapid discrete molecular dynamics (DMD) simulations as constraints, reducing the conformational space and achieving the correct protein folding on practical time scales. We tested our approach on myoglobin and FK506 binding protein—models for α helix–rich and β sheet–rich proteins, respectively—and found that the lowest-energy structures obtained were in agreement with the crystal structure, hydrogen-deuterium exchange, surface modification, and long-distance cross-linking validation data. Our approach is readily applicable to other proteins with unknown structures. PMID:28695211
Kekilli, Demet; Dworkowski, Florian S N; Pompidor, Guillaume; Fuchs, Martin R; Andrew, Colin R; Antonyuk, Svetlana; Strange, Richard W; Eady, Robert R; Hasnain, S Samar; Hough, Michael A
2014-05-01
It is crucial to assign the correct redox and ligand states to crystal structures of proteins with an active redox centre to gain valid functional information and prevent the misinterpretation of structures. Single-crystal spectroscopies, particularly when applied in situ at macromolecular crystallography beamlines, allow spectroscopic investigations of redox and ligand states and the identification of reaction intermediates in protein crystals during the collection of structural data. Single-crystal resonance Raman spectroscopy was carried out in combination with macromolecular crystallography on Swiss Light Source beamline X10SA using cytochrome c' from Alcaligenes xylosoxidans. This allowed the fingerprinting and validation of different redox and ligand states, identification of vibrational modes and identification of intermediates together with monitoring of radiation-induced changes. This combined approach provides a powerful tool to obtain complementary data and correctly assign the true oxidation and ligand state(s) in redox-protein crystals.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for predicting protein secondary structure using a Bayes classifier and an autoencoder network. Our experiments cover several algorithms, including the construction of the model and the classification of parameters. The data set is the widely used CB513 protein data set. Accuracy is assessed by 3-fold cross-validation, from which we obtain the Q3 accuracy. The results show that the autoencoder network improves the prediction accuracy of protein secondary structure.
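The evaluation protocol mentioned above (Q3 accuracy under 3-fold cross-validation) can be sketched as follows. Q3 is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The function names and the interleaved fold assignment are assumptions for illustration; the paper does not state how its folds were built.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def three_fold_splits(n_items):
    """Yield (train_indices, test_indices) pairs for 3-fold cross-validation.
    Items are assigned to folds in an interleaved fashion (an assumption)."""
    folds = [list(range(start, n_items, 3)) for start in range(3)]
    for k in range(3):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Hypothetical labels for an 8-residue fragment: 6 of 8 positions agree.
print(q3_accuracy("HHHECCCE", "HHHEECCC"))  # 0.75

for train, test in three_fold_splits(6):
    print(train, test)
```

Each protein index appears in exactly one test fold, so the three Q3 values can be averaged into a single cross-validated estimate.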
Vivaldi: visualization and validation of biomacromolecular NMR structures from the PDB.
Hendrickx, Pieter M S; Gutmanas, Aleksandras; Kleywegt, Gerard J
2013-04-01
We describe Vivaldi (VIsualization and VALidation DIsplay; http://pdbe.org/vivaldi), a web-based service for the analysis, visualization, and validation of NMR structures in the Protein Data Bank (PDB). Vivaldi provides access to model coordinates and several types of experimental NMR data using interactive visualization tools, augmented with structural annotations and model-validation information. The service presents information about the modeled NMR ensemble, validation of experimental chemical shifts, residual dipolar couplings, distance and dihedral angle constraints, as well as validation scores based on empirical knowledge and databases. Vivaldi was designed for both expert NMR spectroscopists and casual non-expert users who wish to obtain a better grasp of the information content and quality of NMR structures in the public archive. Copyright © 2013 Wiley Periodicals, Inc.
Improvement on a simplified model for protein folding simulation.
Zhang, Ming; Chen, Changjun; He, Yi; Xiao, Yi
2005-11-01
Improvements were made on a simplified protein model, the Ramachandran model, to achieve better computer simulation of protein folding. To check the validity of such improvements, we chose the ultrafast folding protein Engrailed Homeodomain as an example and explored several aspects of its folding. The Engrailed Homeodomain is a mainly alpha-helical protein of 61 residues from Drosophila melanogaster. We found that the simplified model of Engrailed Homeodomain can fold into a global minimum state with a tertiary structure in good agreement with its native structure.
A coarse grain model for protein-surface interactions
NASA Astrophysics Data System (ADS)
Wei, Shuai; Knotts, Thomas A.
2013-09-01
The interaction of proteins with surfaces is important in numerous applications in many fields—such as biotechnology, proteomics, sensors, and medicine—but fundamental understanding of how protein stability and structure are affected by surfaces remains incomplete. Over the last several years, molecular simulation using coarse grain models has yielded significant insights, but the formalisms used to represent the surface interactions have been rudimentary. We present a new model for protein surface interactions that incorporates the chemical specificity of both the surface and the residues comprising the protein in the context of a one-bead-per-residue, coarse grain approach that maintains computational efficiency. The model is parameterized against experimental adsorption energies for multiple model peptides on different types of surfaces. The validity of the model is established by its ability to quantitatively and qualitatively predict the free energy of adsorption and structural changes for multiple biologically-relevant proteins on different surfaces. The validation, done with proteins not used in parameterization, shows that the model produces remarkable agreement between simulation and experiment.
Automation of NMR structure determination of proteins.
Altieri, Amanda S; Byrd, R Andrew
2004-10-01
The automation of protein structure determination using NMR is coming of age. The tedious processes of resonance assignment, followed by assignment of NOE (nuclear Overhauser enhancement) interactions (now intertwined with structure calculation), assembly of input files for structure calculation, intermediate analyses of incorrect assignments and bad input data, and finally structure validation are all being automated with sophisticated software tools. The robustness of the different approaches continues to deal with problems of completeness and uniqueness; nevertheless, the future is very bright for automation of NMR structure generation to approach the levels found in X-ray crystallography. Currently, near completely automated structure determination is possible for small proteins, and the prospect for medium-sized and large proteins is good. Copyright 2004 Elsevier Ltd.
Quality assessment of protein model-structures based on structural and functional similarities
2012-01-01
Background Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. 
GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models. PMID:22998498
TAP score: torsion angle propensity normalization applied to local protein structure evaluation
Tosatto, Silvio CE; Battistutta, Roberto
2007-01-01
Background Experimentally determined protein structures may contain errors and require validation. Conformational criteria based on the Ramachandran plot are mainly used to distinguish between distorted and adequately refined models. While the readily available criteria are sufficient to detect totally wrong structures, establishing the more subtle differences between plausible structures remains more challenging. Results A new criterion, called TAP score, measuring local sequence to structure fitness based on torsion angle propensities normalized against the global minimum and maximum is introduced. It is shown to be more accurate than previous methods at estimating the validity of a protein model in terms of commonly used experimental quality parameters on two test sets representing the full PDB database and a subset of obsolete PDB structures. Highly selective TAP thresholds are derived to recognize over 90% of the top experimental structures in the absence of experimental information. Both a web server and an executable version of the TAP score are available at . Conclusion A novel procedure for energy normalization (TAP) has significantly improved the possibility to recognize the best experimental structures. It will allow the user to more reliably isolate problematic structures in the context of automated experimental structure determination. PMID:17504537
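The idea of normalizing a score "against the global minimum and maximum" can be illustrated with a generic min-max sketch. This is not the published TAP formula: the per-residue propensity values, the summation over residues, and the function names are all hypothetical, chosen only to show how a raw score is mapped onto a 0 to 1 scale bounded by the worst and best values attainable for a given sequence.

```python
def min_max_normalize(raw, lowest, highest):
    """Map `raw` onto [0, 1] relative to the attainable extremes."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    return (raw - lowest) / (highest - lowest)


def normalized_score(observed, options_per_residue):
    """Sum the observed per-residue propensities and normalize against the
    sums of the per-residue minima and maxima (a hypothetical scheme)."""
    raw = sum(observed)
    lowest = sum(min(options) for options in options_per_residue)
    highest = sum(max(options) for options in options_per_residue)
    return min_max_normalize(raw, lowest, highest)


# Hypothetical propensities: for each residue, the values its torsion
# angles could take, and the value observed in the model being scored.
options_per_residue = [[-1.0, 0.5, 2.0], [-0.5, 1.0], [0.0, 3.0]]
observed = [0.5, 1.0, 3.0]
print(normalized_score(observed, options_per_residue))
```

Normalizing per sequence makes scores comparable across proteins of different length and composition, which is the stated motivation for the normalization.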
Kato, Koichi; Nakayoshi, Tomoki; Fukuyoshi, Shuichi; Kurimoto, Eiji; Oda, Akifumi
2017-10-12
Although various higher-order protein structure prediction methods have been developed, almost all of them were developed based on the three-dimensional (3D) structure information of known proteins. Here we predicted short protein structures by molecular dynamics (MD) simulations in which only Newton's equations of motion were used and 3D structural information of known proteins was not required. To evaluate the ability of MD simulation to predict protein structures, we ran calculations on seven short test proteins (10-46 residues) in the denatured state and compared their predicted and experimental structures. The predicted structure for Trp-cage (20 residues) was close to the experimental structure after a 200-ns MD simulation. For proteins shorter or longer than Trp-cage, root-mean-square deviation values were larger than those for Trp-cage. However, secondary structures could be reproduced by MD simulations for proteins with 10-34 residues. Simulations by replica exchange MD were performed, but the results were similar to those from normal MD simulations. These results suggest that normal MD simulations can roughly predict short protein structures and that 200-ns simulations are frequently sufficient for estimating the secondary structures of proteins of approximately 20 residues. Structural prediction methods using only fundamental physical laws are useful for investigating non-natural proteins, such as primitive proteins and artificial proteins for peptide-based drug delivery systems.
NASA Astrophysics Data System (ADS)
Orellana, Laura; Yoluk, Ozge; Carrillo, Oliver; Orozco, Modesto; Lindahl, Erik
2016-08-01
Protein conformational changes are at the heart of cell functions, from signalling to ion transport. However, the transient nature of the intermediates along transition pathways hampers their experimental detection, making the underlying mechanisms elusive. Here we retrieve dynamic information on the actual transition routes from principal component analysis (PCA) of structurally-rich ensembles and, in combination with coarse-grained simulations, explore the conformational landscapes of five well-studied proteins. Modelling them as elastic networks in a hybrid elastic-network Brownian dynamics simulation (eBDIMS), we generate trajectories connecting stable end-states that spontaneously sample the crystallographic motions, predicting the structures of known intermediates along the paths. We also show that the explored non-linear routes can delimit the lowest energy passages between end-states sampled by atomistic molecular dynamics. The integrative methodology presented here provides a powerful framework to extract and expand dynamic pathway information from the Protein Data Bank, as well as to validate sampling methods in general.
Improta, Roberto; Vitagliano, Luigi; Esposito, Luciana
2015-11-01
The elucidation of the mutual influence between peptide bond geometry and local conformation has important implications for protein structure refinement, validation, and prediction. To gain insights into the structural determinants and the energetic contributions associated with protein/peptide backbone plasticity, we here report an extensive analysis of the variability of the peptide bond angles by combining statistical analyses of protein structures and quantum mechanics calculations on small model peptide systems. Our analyses demonstrate that all the backbone bond angles strongly depend on the peptide conformation and unveil the existence of regular trends as a function of ψ and/or φ. The excellent agreement of the quantum mechanics calculations with the statistical surveys of protein structures validates the computational scheme employed here and demonstrates that the valence geometry of the protein/peptide backbone is primarily dictated by local interactions. Notably, for the first time we show that the position of the H(α) hydrogen atom, which is an important parameter in NMR structural studies, is also dependent on the local conformation. Most of the trends observed may be satisfactorily explained by invoking steric repulsive interactions; in some specific cases the valence bond variability is also influenced by hydrogen-bond-like interactions. Moreover, we can provide a reliable estimate of the energies involved in the interplay between geometry and conformations. © 2015 Wiley Periodicals, Inc.
Alfinito, Eleonora; Reggiani, Lino
2016-10-01
Current-voltage characteristics of metal-protein-metal structures made of proteorhodopsin and bacteriorhodopsin are modeled by using a percolation-like approach. Starting from the tertiary structure pertaining to the single protein, an analogous resistance network is created. Charge transfer inside the network is described as a sequential tunneling mechanism and the current is calculated for each value of the given voltage. The theory is validated with available experiments, in dark and light. The role of the tertiary structure of the single protein and of the mechanisms responsible for the photo-activity is discussed.
NMR in the SPINE Structural Proteomics project.
Ab, E; Atkinson, A R; Banci, L; Bertini, I; Ciofi-Baffoni, S; Brunner, K; Diercks, T; Dötsch, V; Engelke, F; Folkers, G E; Griesinger, C; Gronwald, W; Günther, U; Habeck, M; de Jong, R N; Kalbitzer, H R; Kieffer, B; Leeflang, B R; Loss, S; Luchinat, C; Marquardsen, T; Moskau, D; Neidig, K P; Nilges, M; Piccioli, M; Pierattelli, R; Rieping, W; Schippmann, T; Schwalbe, H; Travé, G; Trenner, J; Wöhnert, J; Zweckstetter, M; Kaptein, R
2006-10-01
This paper describes the developments, role and contributions of the NMR spectroscopy groups in the Structural Proteomics In Europe (SPINE) consortium. Focusing on the development of high-throughput (HTP) pipelines for NMR structure determinations of proteins, all aspects from sample preparation, data acquisition, data processing, data analysis to structure determination have been improved with respect to sensitivity, automation, speed, robustness and validation. Specific highlights are protonless (13)C-direct detection methods and inferential structure determinations (ISD). In addition to technological improvements, these methods have been applied to deliver over 60 NMR structures of proteins, among which are five that failed to crystallize. The inclusion of NMR spectroscopy in structural proteomics pipelines improves the success rate for protein structure determinations.
Computational protein design-the next generation tool to expand synthetic biology applications.
Gainza-Cirauqui, Pablo; Correia, Bruno Emanuel
2018-05-02
One powerful approach to engineer synthetic biology pathways is the assembly of proteins sourced from one or more natural organisms. However, synthetic pathways often require custom functions or biophysical properties not displayed by natural proteins, limitations that could be overcome through modern protein engineering techniques. Structure-based computational protein design is a powerful tool to engineer new functional capabilities in proteins, and it is beginning to have a profound impact in synthetic biology. Here, we review efforts to increase the capabilities of synthetic biology using computational protein design. We focus primarily on computationally designed proteins not only validated in vitro, but also shown to modulate different activities in living cells. Efforts made to validate computational designs in cells can illustrate both the challenges and opportunities in the intersection of protein design and synthetic biology. We also highlight protein design approaches, which although not validated as conveyors of new cellular function in situ, may have rapid and innovative applications in synthetic biology. We foresee that in the near-future, computational protein design will vastly expand the functional capabilities of synthetic cells. Copyright © 2018. Published by Elsevier Ltd.
Contreras-Torres, Ernesto
2018-06-02
In this study, I introduce novel global and local 0D-protein descriptors based on a statistical quantity named Total Sum of Squares (TSS). This quantity represents the sum of the squared differences of amino acid properties from the arithmetic mean property. As an extension, the amino acid-types and amino acid-groups formalisms are used for describing zones of interest in proteins. To assess the effectiveness of the proposed descriptors, a Nearest Neighbor model for predicting the four major protein structural classes was built. This model has a success rate of 98.53% on the jackknife cross-validation test, a performance superior to other reported methods despite the simplicity of the predictor. Additionally, this predictor has an average success rate of 98.35% in different cross-validation tests performed. A value of 0.98 for the Kappa statistic clearly discriminates this model from a random predictor. The results obtained by the Nearest Neighbor model demonstrated the ability of the proposed descriptors not only to reflect relevant biochemical information related to the structural classes of proteins but also to allow appropriate interpretability. It can thus be expected that the current method may play a supplementary role to other existing approaches for protein structural class prediction and other protein attributes. Copyright © 2018 Elsevier Ltd. All rights reserved.
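The TSS quantity described in this abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names are hypothetical, and the Kyte-Doolittle hydropathy scale is an assumption chosen only to have a concrete amino acid property to work with. The "local" variant simply restricts the sum to a chosen set of residue types, which is one possible reading of the amino acid-types formalism.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy values. This scale is an illustrative assumption;
# the abstract does not say which amino acid properties were used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def total_sum_of_squares(sequence, scale=HYDROPATHY):
    """Sum of squared differences of residue properties from their mean."""
    values = [scale[aa] for aa in sequence.upper()]
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def local_tss(sequence, residue_types, scale=HYDROPATHY):
    """Hypothetical local descriptor: TSS over residues of the given types only."""
    subset = "".join(aa for aa in sequence.upper() if aa in residue_types)
    return total_sum_of_squares(subset, scale)


if __name__ == "__main__":
    seq = "ACDEFGHIKLMNPQRSTVWY"
    print("global TSS:", total_sum_of_squares(seq))
    print("local TSS (aliphatic A, V, L, I):", local_tss(seq, {"A", "V", "L", "I"}))
```

A descriptor vector for a classifier such as a nearest-neighbour model could then be assembled by evaluating these functions over several property scales and residue groups.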
Chakravorty, Dhruva K.; Wang, Bing; Lee, Chul Won; Guerra, Alfredo J.; Giedroc, David P.; Merz, Kenneth M.
2013-01-01
Correctly calculating the structure of metal coordination sites in a protein during the process of nuclear magnetic resonance (NMR) structure determination and refinement continues to be a challenging task. In this study, we present an accurate and convenient means by which to include metal ions in the NMR structure determination process using molecular dynamics (MD) constrained by NMR-derived data to obtain a realistic and physically viable description of the metal binding site(s). This method provides the framework to accurately portray the metal ion and its binding residues in a pseudo-bond or dummy-cation-like approach, and is validated by quantum mechanical/molecular mechanical (QM/MM) MD calculations constrained by NMR-derived data. To illustrate this approach, we refine the zinc coordination complex structure of the zinc sensing transcriptional repressor protein Staphylococcus aureus CzrA, generating over 130 ns of MD and QM/MM MD NMR-data compliant sampling. In addition to refining the first coordination shell structure of the Zn(II) ion, this protocol benefits from being performed in a periodically replicated solvation environment including long-range electrostatics. We determine that unrestrained (not based on NMR data) MD simulations correlated to the NMR data in a time-averaged ensemble. The solution structure ensemble of the metal-bound protein accurately describes the role of conformational dynamics in allosteric regulation of DNA binding by zinc and serves to validate our previous unrestrained MD simulations of CzrA. This methodology has potentially broad applicability in the structure determination of metal ion bound proteins, protein folding and metal template protein-design studies. PMID:23609042
The 3D structures of VDAC represent a native conformation
Hiller, Sebastian; Abramson, Jeff; Mannella, Carmen; Wagner, Gerhard; Zeth, Kornelius
2010-01-01
The most abundant protein of the mitochondrial outer membrane is the voltage-dependent anion channel (VDAC), which facilitates the exchange of ions and molecules between mitochondria and cytosol and is regulated by interactions with other proteins and small molecules. VDAC has been extensively studied for more than three decades, and last year three independent investigations revealed a structure of VDAC-1 exhibiting 19 transmembrane β-strands, constituting a unique structural class of β-barrel membrane proteins. Here, we provide a historical perspective on VDAC research and give an overview of the experimental design used to obtain these structures. Furthermore, we validate the protein refolding approach and summarize biochemical and biophysical evidence that links the 19-stranded structure to the native form of VDAC. PMID:20708406
Melero, Cristina; Ollikainen, Noah; Harwood, Ian; ...
2014-10-13
Re-engineering protein–protein recognition is an important route to dissecting and controlling complex interaction networks. Experimental approaches have used the strategy of “second-site suppressors,” where a functional interaction is inferred between two proteins if a mutation in one protein can be compensated by a mutation in the second. Mimicking this strategy, computational design has been applied successfully to change protein recognition specificity by predicting such sets of compensatory mutations in protein–protein interfaces. To extend this approach, it would be advantageous to be able to “transplant” existing engineered and experimentally validated specificity changes to other homologous protein–protein complexes. Here, we test this strategy by designing a pair of mutations that modulates peptide recognition specificity in the Syntrophin PDZ domain, confirming the designed interaction biochemically and structurally, and then transplanting the mutations into the context of five related PDZ domain–peptide complexes. We find a wide range of energetic effects of identical mutations in structurally similar positions, revealing a dramatic context dependence (epistasis) of designed mutations in homologous protein–protein interactions. To better understand the structural basis of this context dependence, we apply a structure-based computational model that recapitulates these energetic effects and we use this model to make and validate forward predictions. The context dependence of these mutations is captured by computational predictions. Our results both highlight the considerable difficulties in designing protein–protein interactions and provide challenging benchmark cases for the development of improved protein modeling and design methods that accurately account for the context.
Rajasekaran, Rajalakshmi; Chen, Yi-Ping Phoebe
2012-09-01
Leishmaniasis, a multi-faceted ethereal disease, is considered to be one of the world's major communicable diseases and demands exhaustive research and control measures. The substantial data on these protozoan parasites have not been utilized completely to develop potential therapeutic strategies against Leishmaniasis. Dihydrofolate reductase thymidylate synthase (DHFR-TS) plays a major role in the infective state of the parasite, and hence DHFR-TS-based drugs remain of much interest to researchers working on Leishmaniasis. Although crystal structures of DHFR-TS from different species, including Plasmodium falciparum and Trypanosoma cruzi, are available, the experimentally determined structure of the Leishmania major DHFR-TS has not yet been reported in the Protein Data Bank. A high-quality three-dimensional structure of L. major DHFR-TS has been modeled through the homology modeling approach. The carefully refined and energy-minimized structure of the modeled protein was validated using a number of structure validation programs to confirm its structural quality. The modeled protein structure was used in structure-based virtual screening to identify a potential lead structure against DHFR-TS. The lead molecule identified has a binding affinity of 0.51 nM and shows drug-like properties.
Structure-Based Design of Highly Selective Inhibitors of the CREB Binding Protein Bromodomain.
Denny, R Aldrin; Flick, Andrew C; Coe, Jotham; Langille, Jonathan; Basak, Arindrajit; Liu, Shenping; Stock, Ingrid; Sahasrabudhe, Parag; Bonin, Paul; Hay, Duncan A; Brennan, Paul E; Pletcher, Mathew; Jones, Lyn H; Chekler, Eugene L Piatnitski
2017-07-13
Chemical probes are required for preclinical target validation to interrogate novel biological targets and pathways. Selective inhibitors of the CREB binding protein (CREBBP)/EP300 bromodomains are required to facilitate the elucidation of biology associated with these important epigenetic targets. Medicinal chemistry optimization that paid particular attention to physicochemical properties delivered chemical probes with desirable potency, selectivity, and permeability attributes. An important feature of the optimization process was the successful application of rational structure-based drug design to address bromodomain selectivity issues (particularly against the structurally related BRD4 protein).
Zhu, Tong; Zhang, John Z H; He, Xiao
2014-09-14
In this work, protein side chain (1)H chemical shifts are used as probes to detect and correct side-chain packing errors in protein NMR structures through structural refinement. By applying the automated fragmentation quantum mechanics/molecular mechanics (AF-QM/MM) method for ab initio calculation of chemical shifts, incorrect side chain packing was detected in the NMR structures of the Pin1 WW domain. The NMR structure is then refined by using molecular dynamics simulation and the polarized protein-specific charge (PPC) model. The computationally refined structure of the Pin1 WW domain is in excellent agreement with the corresponding X-ray structure. In particular, the use of the PPC model yields a more accurate structure than the standard (nonpolarizable) force field. By comparison, some of the widely used empirical models for chemical shift calculations are unable to correctly describe the relationship between a particular proton chemical shift and the protein structure. The AF-QM/MM method can be used as a powerful tool for protein NMR structure validation and structural flaw detection.
Cole, Jason C.
2017-01-01
The Cambridge Structural Database (CSD) is the worldwide resource for the dissemination of all published three-dimensional structures of small-molecule organic and metal–organic compounds. This paper briefly describes how this collection of crystal structures can be used en masse in the context of macromolecular crystallography. Examples highlight how the CSD and associated software aid protein–ligand complex validation, and show how the CSD could be further used in the generation of geometrical restraints for protein structure refinement. PMID:28291758
X-ray laser diffraction for structure determination of the rhodopsin-arrestin complex
NASA Astrophysics Data System (ADS)
Zhou, X. Edward; Gao, Xiang; Barty, Anton; Kang, Yanyong; He, Yuanzheng; Liu, Wei; Ishchenko, Andrii; White, Thomas A.; Yefanov, Oleksandr; Han, Gye Won; Xu, Qingping; de Waal, Parker W.; Suino-Powell, Kelly M.; Boutet, Sébastien; Williams, Garth J.; Wang, Meitian; Li, Dianfan; Caffrey, Martin; Chapman, Henry N.; Spence, John C. H.; Fromme, Petra; Weierstall, Uwe; Stevens, Raymond C.; Cherezov, Vadim; Melcher, Karsten; Xu, H. Eric
2016-04-01
Serial femtosecond X-ray crystallography (SFX) using an X-ray free electron laser (XFEL) is a recent advancement in structural biology for solving crystal structures of challenging membrane proteins, including G-protein coupled receptors (GPCRs), which often only produce microcrystals. An XFEL delivers highly intense X-ray pulses of femtosecond duration short enough to enable the collection of single diffraction images before significant radiation damage to crystals sets in. Here we report the deposition of the XFEL data and provide further details on crystallization, XFEL data collection and analysis, structure determination, and the validation of the structural model. The rhodopsin-arrestin crystal structure solved with SFX represents the first near-atomic resolution structure of a GPCR-arrestin complex, provides structural insights into understanding of arrestin-mediated GPCR signaling, and demonstrates the great potential of this SFX-XFEL technology for accelerating crystal structure determination of challenging proteins and protein complexes.
X-ray laser diffraction for structure determination of the rhodopsin-arrestin complex
Zhou, X. Edward; Gao, Xiang; Barty, Anton; Kang, Yanyong; He, Yuanzheng; Liu, Wei; Ishchenko, Andrii; White, Thomas A.; Yefanov, Oleksandr; Han, Gye Won; Xu, Qingping; de Waal, Parker W.; Suino-Powell, Kelly M.; Boutet, Sébastien; Williams, Garth J.; Wang, Meitian; Li, Dianfan; Caffrey, Martin; Chapman, Henry N.; Spence, John C.H.; Fromme, Petra; Weierstall, Uwe; Stevens, Raymond C.; Cherezov, Vadim; Melcher, Karsten; Xu, H. Eric
2016-01-01
Serial femtosecond X-ray crystallography (SFX) using an X-ray free electron laser (XFEL) is a recent advancement in structural biology for solving crystal structures of challenging membrane proteins, including G-protein coupled receptors (GPCRs), which often only produce microcrystals. An XFEL delivers highly intense X-ray pulses of femtosecond duration short enough to enable the collection of single diffraction images before significant radiation damage to crystals sets in. Here we report the deposition of the XFEL data and provide further details on crystallization, XFEL data collection and analysis, structure determination, and the validation of the structural model. The rhodopsin-arrestin crystal structure solved with SFX represents the first near-atomic resolution structure of a GPCR-arrestin complex, provides structural insights into understanding of arrestin-mediated GPCR signaling, and demonstrates the great potential of this SFX-XFEL technology for accelerating crystal structure determination of challenging proteins and protein complexes. PMID:27070998
HARMONY: a server for the assessment of protein structures
Pugalenthi, G.; Shameer, K.; Srinivasan, N.; Sowdhamini, R.
2006-01-01
Protein structure validation is an important step in computational modeling and structure determination. Stereochemical assessment of protein structures examines internal parameters such as bond lengths and Ramachandran (φ,ψ) angles. Gross structure prediction methods such as the inverse folding procedure, and structure determination especially at low resolution, can sometimes give rise to models that are incorrect due to assignment of misfolds or mistracing of electron density maps. Such errors are not reflected as strain in internal parameters. HARMONY is a procedure that examines the compatibility between the sequence and the structure of a protein by assigning scores to individual residues and their amino acid exchange patterns after considering their local environments. Local environments are described by the backbone conformation, solvent accessibility and hydrogen bonding patterns. We are now providing HARMONY through a web server such that users can submit their protein structure files and, if required, the alignment of homologous sequences. Scores are mapped on the structure for subsequent examination, which is also useful for recognizing regions of possible local errors in protein structures. HARMONY server is located at PMID:16844999
Modeling complexes of modeled proteins.
Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A
2017-03-01
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into the acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and that template-based docking is much less sensitive to inaccuracies of protein models than free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc.
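The Cα RMSD used in this abstract to grade model accuracy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' protocol: the function name is hypothetical, and the sketch assumes that the two coordinate sets are already optimally superposed and list the same residues in the same order (no superposition step is performed here).

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    C-alpha coordinates, in the units of the input (typically angstroms).

    Assumes the structures are already superposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
    print(f"C-alpha RMSD: {ca_rmsd(reference, model):.2f}")
```

In practice the superposition would be done first, for example with a least-squares fitting routine from a structural biology library, before the deviation is measured.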
NASA Astrophysics Data System (ADS)
Kalid, Ori; Toledo Warshaviak, Dora; Shechter, Sharon; Sherman, Woody; Shacham, Sharon
2012-11-01
We present the Consensus Induced Fit Docking (cIFD) approach for adapting a protein binding site to accommodate multiple diverse ligands for virtual screening. This novel approach results in a single binding site structure that can bind diverse chemotypes and is thus highly useful for efficient structure-based virtual screening. We first describe the cIFD method and its validation on three targets that were previously shown to be challenging for docking programs (COX-2, estrogen receptor, and HIV reverse transcriptase). We then demonstrate the application of cIFD to the challenging discovery of irreversible Crm1 inhibitors. We report the identification of 33 novel Crm1 inhibitors, which resulted from the testing of 402 purchased compounds selected from a screening set containing 261,680 compounds. This corresponds to a hit rate of 8.2 %. The novel Crm1 inhibitors reveal diverse chemical structures, validating the utility of the cIFD method in a real-world drug discovery project. This approach offers a pragmatic way to implicitly account for protein flexibility without the additional computational costs of ensemble docking or including full protein flexibility during virtual screening.
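The hit rate quoted in this abstract follows directly from the reported counts, as the short Python check below shows. The function name is hypothetical; the numbers (33 hits, 402 tested compounds, 261,680 compounds in the screening set) are taken from the abstract.

```python
def percentage(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


hits, tested, screening_set = 33, 402, 261_680

hit_rate = percentage(hits, tested)                   # share of tested compounds that were active
purchased_share = percentage(tested, screening_set)   # share of the screening set that was purchased

print(f"hit rate: {hit_rate:.1f} %")
print(f"purchased fraction of screening set: {purchased_share:.2f} %")
```

The first value rounds to the 8.2 % reported in the abstract; the second shows that roughly 0.15 % of the screening set was purchased and tested.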
Quantum-mechanics-derived 13Cα chemical shift server (CheShift) for protein structure validation
Vila, Jorge A.; Arnautova, Yelena A.; Martin, Osvaldo A.; Scheraga, Harold A.
2009-01-01
A server (CheShift) has been developed to predict 13Cα chemical shifts of protein structures. It is based on the generation of 696,916 conformations as a function of the φ, ψ, ω, χ1 and χ2 torsional angles for all 20 naturally occurring amino acids. Their 13Cα chemical shifts were computed at the DFT level of theory with a small basis set and extrapolated, with an empirically-determined linear regression formula, to reproduce the values obtained with a larger basis set. Analysis of the accuracy and sensitivity of the CheShift predictions, in terms of both the correlation coefficient R and the conformational-averaged rmsd between the observed and predicted 13Cα chemical shifts, was carried out for 3 sets of conformations: (i) 36 x-ray-derived protein structures solved at 2.3 Å or better resolution, for which sets of 13Cα chemical shifts were available; (ii) 15 pairs of x-ray and NMR-derived sets of protein conformations; and (iii) a set of decoys for 3 proteins showing an rmsd with respect to the x-ray structure from which they were derived of up to 3 Å. Comparative analysis carried out with 4 popular servers, namely SHIFTS, SHIFTX, SPARTA, and PROSHIFT, for these 3 sets of conformations demonstrated that CheShift is the most sensitive server with which to detect subtle differences between protein models and, hence, to validate protein structures determined by either x-ray or NMR methods, if the observed 13Cα chemical shifts are available. CheShift is available as a web server. PMID:19805131
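The two agreement measures named in this abstract, the correlation coefficient R and the rmsd between observed and predicted chemical shifts, can be sketched as follows. This is an illustration under stated assumptions, not the CheShift code: the function names are hypothetical, the shift values are made up, and the sketch handles a single set of predictions rather than the conformational averaging described in the abstract.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def shift_rmsd(observed, predicted):
    """Root-mean-square deviation between observed and predicted shifts (ppm)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equally long, non-empty sequences")
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


if __name__ == "__main__":
    # Made-up 13C-alpha shifts in ppm, for illustration only.
    observed = [58.1, 55.3, 62.0, 53.9, 60.4]
    predicted = [57.6, 55.9, 61.2, 54.5, 61.0]
    print(f"R = {pearson_r(observed, predicted):.3f}")
    print(f"rmsd = {shift_rmsd(observed, predicted):.2f} ppm")
```

A higher R and a lower rmsd indicate better agreement between a candidate structure and the measured shifts, which is the basis for using such predictions in validation.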
SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data
Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo
2018-01-01
RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. 
Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423
Generation of 3D templates of active sites of proteins with rigid prosthetic groups.
Nebel, Jean-Christophe
2006-05-15
With the increasing availability of protein structures, the generation of biologically meaningful 3D patterns from the simultaneous alignment of several protein structures is an exciting prospect: active sites could be better understood, and protein functions and protein 3D structures could be predicted more accurately. Although patterns can already be generated at the fold and topological levels, no system produces high-resolution 3D patterns including atom and cavity positions. To address this challenge, our research focuses on generating patterns from proteins with rigid prosthetic groups. Since these groups are key elements of protein active sites, the generated 3D patterns are expected to be biologically meaningful. In this paper, we present a new approach which allows the generation of 3D patterns from proteins with rigid prosthetic groups. Using 237 protein chains representing proteins containing porphyrin rings, our method was validated by comparing 3D templates generated from homologues with the 3D structure of the proteins they model. Atom positions were predicted reliably: 93% of them had an accuracy of 1.00 Å or less. Moreover, similar results were obtained regarding chemical group and cavity positions. Results also suggested our system could contribute to the validation of 3D protein models. Finally, a 3D template was generated for the active site of human cytochrome P450 CYP17, the 3D structure of which is unknown. Its analysis showed that it is biologically meaningful: our method detected the main patterns of the cytochrome P450 superfamily and the motifs linked to catalytic reactions. The 3D template also suggested the position of a residue, which could be involved in a hydrogen bond with CYP17 substrates, and the shape and location of a cavity. Comparisons with independently generated 3D models supported these hypotheses. Alignment software (Nestor3D) is available at http://www.kingston.ac.uk/~ku33185/Nestor3D.html
Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.
Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing
2016-08-24
Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation plays a critical role in redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either focus only on catalytic redox-sensitive cysteines in thiol oxidoreductases or depend heavily on protein structural data, and thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), an SVM-based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on the RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation on the BALOSCTdb dataset, which has structure information, the model achieved performance comparable to that of the current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations. 
Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.
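The evaluation measures quoted in the abstract above (accuracy, sensitivity, specificity, MCC) and the first feature type (sequential distance to nearby cysteines) can be illustrated with a minimal sketch. This is not the RSCP implementation: the function names are hypothetical, and defining the distance as the gap to the single nearest other cysteine is an assumption made for illustration.

```python
import math


def nearest_cysteine_distances(sequence):
    """Map each cysteine position (0-based) to the sequence distance
    to the nearest other cysteine, or None if it is the only one.
    Assumption: 'sequential distance' means the index gap to the
    closest other cysteine; the abstract does not define it exactly."""
    positions = [i for i, aa in enumerate(sequence) if aa == "C"]
    distances = {}
    for p in positions:
        gaps = [abs(p - q) for q in positions if q != p]
        distances[p] = min(gaps) if gaps else None
    return distances


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}
```

For example, `nearest_cysteine_distances("ACDCKKKC")` reports a gap of 2 for the first two cysteines and 4 for the last one. MCC is reported alongside accuracy because it stays informative when the two classes are unbalanced.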
Pulawski, Wojciech; Jamroz, Michal; Kolinski, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2016-11-28
The CABS coarse-grained model is a well-established tool for modeling globular proteins (predicting their structure, dynamics, and interactions). Here we introduce an extension of the CABS representation and force field (CABS-membrane) to the modeling of the effect of the biological membrane environment on the structure of membrane proteins. We validate the CABS-membrane model in folding simulations of 10 short helical membrane proteins not using any knowledge about their structure. The simulations start from random protein conformations placed outside the membrane environment and allow for full flexibility of the modeled proteins during their spontaneous insertion into the membrane. In the resulting trajectories, we have found models close to the experimental membrane structures. We also attempted to select the correctly folded models using simple filtering followed by structural clustering combined with reconstruction to the all-atom representation and all-atom scoring. The CABS-membrane model is a promising approach for further development toward modeling of large protein-membrane systems.
Benchmarking protein classification algorithms via supervised cross-validation.
Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor
2008-04-24
Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
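The idea of supervised cross-validation described above, choosing test sets by known subtype rather than at random, can be sketched as a leave-one-subtype-out split. This is a simplified illustration and not the benchmark's actual procedure: the record layout `(item_id, class_name, subtype)` and the function name are hypothetical, and only members of the target class are held out, whereas a full benchmark would also assign negatives to the test set.

```python
def supervised_cv_splits(items, target_class):
    """Yield (held_out_subtype, train_ids, test_ids) triples.

    items: list of (item_id, class_name, subtype) tuples (hypothetical layout).
    For each subtype of the target class, every member of that subtype
    goes to the test set and everything else stays in the training set,
    so the classifier is always tested on a subtype it has never seen."""
    subtypes = sorted({s for _, c, s in items if c == target_class})
    for held_out in subtypes:
        test = [i for i, c, s in items
                if c == target_class and s == held_out]
        train = [i for i, c, s in items
                 if not (c == target_class and s == held_out)]
        yield held_out, train, test
```

In contrast to random k-fold splitting, close relatives of a test protein never appear in the training set here, which is why such schemes tend to give lower performance estimates.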
2013-01-01
Chemical cross-linking of proteins combined with mass spectrometry provides an attractive and novel method for the analysis of native protein structures and protein complexes. Analysis of the data, however, is complex. Only a small number of cross-linked peptides are produced during sample preparation and must be identified against a background of more abundant native peptides. To facilitate the search and identification of cross-linked peptides, we have developed a novel software suite, named Hekate. Hekate is a suite of tools that address the challenges involved in analyzing protein cross-linking experiments when combined with mass spectrometry. The software is an integrated pipeline for the automation of the data analysis workflow and provides a novel scoring system based on principles of linear peptide analysis. In addition, it provides a tool for the visualization of identified cross-links using three-dimensional models, which is particularly useful when combining chemical cross-linking with other structural techniques. Hekate was validated by the comparative analysis of cytochrome c (bovine heart) against previously reported data. Further validation was carried out on known structural elements of DNA polymerase III, the catalytic α-subunit of the Escherichia coli DNA replisome, along with new insight into the previously uncharacterized C-terminal domain of the protein. PMID:24010795
Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams, Paul D.; Aertgeerts, Kathleen; Bauer, Cary
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) represent an important source of information concerning drug-target interactions, providing atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. Of the more than 115,000 entries extant in the Protein Data Bank archive, ~75% include at least one non-polymeric ligand. Ligand geometrical and stereochemical quality, the suitability of ligand models for in silico drug discovery/design, and the goodness-of-fit of ligand models to electron density maps vary widely across the archive. We describe the proceedings and conclusions from the first Worldwide Protein Data Bank/Cambridge Crystallographic Data Centre/Drug Design Data Resource (wwPDB/CCDC/D3R) Ligand Validation Workshop held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30-31, 2015. Experts in protein crystallography from academe and industry came together with non-profit and for-profit software providers for crystallography and with experts in computational chemistry and data archiving to discuss and make recommendations on best practices, as framed by a series of questions central to structural studies of macromolecule-ligand complexes. What data concerning bound ligands should be archived in the Protein Data Bank? How should the ligands be best represented? How should structural models of macromolecule-ligand complexes be validated? What supplementary information should accompany publications of structural studies of biological macromolecules? Consensus recommendations on best practices developed in response to each of these questions are provided, together with some details regarding implementation. Important issues addressed but not resolved at the workshop are also enumerated.
Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop
Adams, Paul D.; Aertgeerts, Kathleen; Bauer, Cary; ...
2016-04-05
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) represent an important source of information concerning drug-target interactions, providing atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. Of the more than 115,000 entries extant in the Protein Data Bank archive, ~75% include at least one non-polymeric ligand. Ligand geometrical and stereochemical quality, the suitability of ligand models for in silico drug discovery/design, and the goodness-of-fit of ligand models to electron density maps vary widely across the archive. We describe the proceedings and conclusions from the first Worldwide Protein Data Bank/Cambridge Crystallographic Data Centre/Drug Design Data Resource (wwPDB/CCDC/D3R) Ligand Validation Workshop held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30-31, 2015. Experts in protein crystallography from academe and industry came together with non-profit and for-profit software providers for crystallography and with experts in computational chemistry and data archiving to discuss and make recommendations on best practices, as framed by a series of questions central to structural studies of macromolecule-ligand complexes. What data concerning bound ligands should be archived in the Protein Data Bank? How should the ligands be best represented? How should structural models of macromolecule-ligand complexes be validated? What supplementary information should accompany publications of structural studies of biological macromolecules? Consensus recommendations on best practices developed in response to each of these questions are provided, together with some details regarding implementation. Important issues addressed but not resolved at the workshop are also enumerated.
Outcome of the first wwPDB/CCDC/D3R Ligand Validation Workshop
Adams, Paul D.; Aertgeerts, Kathleen; Bauer, Cary; Bell, Jeffrey A.; Berman, Helen M.; Bhat, Talapady N.; Blaney, Jeff; Bolton, Evan; Bricogne, Gerard; Brown, David; Burley, Stephen K.; Case, David A.; Clark, Kirk L.; Darden, Tom; Emsley, Paul; Feher, Victoria A.; Feng, Zukang; Groom, Colin R.; Harris, Seth F.; Hendle, Jorg; Holder, Thomas; Joachimiak, Andrzej; Kleywegt, Gerard J.; Krojer, Tobias; Marcotrigiano, Joseph; Mark, Alan E.; Markley, John L.; Miller, Matthew; Minor, Wladek; Montelione, Gaetano T.; Murshudov, Garib; Nakagawa, Atsushi; Nakamura, Haruki; Nicholls, Anthony; Nicklaus, Marc; Nolte, Robert T.; Padyana, Anil K.; Peishoff, Catherine E.; Pieniazek, Susan; Read, Randy J.; Shao, Chenghua; Sheriff, Steven; Smart, Oliver; Soisson, Stephen; Spurlino, John; Stouch, Terry; Svobodova, Radka; Tempel, Wolfram; Terwilliger, Thomas C.; Tronrud, Dale; Velankar, Sameer; Ward, Suzanna; Warren, Gregory L.; Westbrook, John D.; Williams, Pamela; Yang, Huanwang; Young, Jasmine
2016-01-01
Summary Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) represent an important source of information concerning drug-target interactions, providing atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. Of the more than 115,000 entries extant in the Protein Data Bank archive, ~75% include at least one non-polymeric ligand. Ligand geometrical and stereochemical quality, the suitability of ligand models for in silico drug discovery/design, and the goodness-of-fit of ligand models to electron density maps vary widely across the archive. We describe the proceedings and conclusions from the first Worldwide Protein Data Bank/Cambridge Crystallographic Data Centre/Drug Design Data Resource (wwPDB/CCDC/D3R) Ligand Validation Workshop held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30–31, 2015. Experts in protein crystallography from academe and industry came together with non-profit and for-profit software providers for crystallography and with experts in computational chemistry and data archiving to discuss and make recommendations on best practices, as framed by a series of questions central to structural studies of macromolecule-ligand complexes. What data concerning bound ligands should be archived in the Protein Data Bank? How should the ligands be best represented? How should structural models of macromolecule-ligand complexes be validated? What supplementary information should accompany publications of structural studies of biological macromolecules? Consensus recommendations on best practices developed in response to each of these questions are provided, together with some details regarding implementation. Important issues addressed but not resolved at the workshop are also enumerated. PMID:27050687
Huang, Wenxi; Liu, Wanting; Jin, Jingjie; Xiao, Qilan; Lu, Ruibin; Chen, Wei; Xiong, Sheng; Zhang, Gong
2018-03-25
Translational pausing coordinates protein synthesis and co-translational folding. It is a common factor that facilitates the correct folding of large, multi-domain proteins. For small proteins, pausing sites rarely occur in the gene body, and the 3'-end pausing sites are essential for the folding of only a fraction of proteins. What determines the necessity of these pausings remains obscure. In this study, we demonstrated that the steady-state structural fluctuation is a predictor of the necessity of pausing-mediated co-translational folding for small proteins. Validated by experiments with 5 model proteins, we found that rigid protein structures do not need 3'-end pausings to fold correctly, while flexible structures do. Therefore, rational optimization of translational pausing can improve soluble expression of small proteins with flexible structures, but not of rigid ones. The rigidity of the structure can be quantitatively estimated in silico using molecular dynamics simulation. Nevertheless, we also found that translational pausing optimization increases the fitness of the expression host, and thus benefits recombinant protein production, independently of soluble expression. These results shed light on the structural basis of translational pausing and provide a practical tool for industrial protein fermentation. Copyright © 2017. Published by Elsevier Inc.
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications.
Predicting Turns in Proteins with a Unified Model
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Motivation Turns are a critical element of the structure of a protein; turns play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turn, β-turn, and γ-turn, etc. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously. Results In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) practical capability of accurate prediction of all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies which were both developed by our group. Then, sequence and structural evolution features, which are profile of sequence, profile of secondary structures and profile of shape strings are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded the most state-of-the-art predictors of certain type of turn. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests and the results we achieved were outstanding for turn predictions and confirmed the good performance of TurnP for practical applications. PMID:23144872
Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?
Sridhar, Settu; Guruprasad, Kunchur
2014-01-01
We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
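The search for 'inverted' peptide pairs described above can be sketched as a k-mer lookup: collect every peptide of length k from the dataset and keep those whose reversed sequence also occurs. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, palindromic peptides are excluded, and each pair is counted once regardless of how many proteins contain it.

```python
def inverted_peptide_pairs(sequences, k):
    """Return sorted (peptide, reversed_peptide) pairs of length k
    where both members occur somewhere in the given sequences.
    Assumptions: palindromes are skipped, and a pair is reported
    once, with the alphabetically smaller member first."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    pairs = set()
    for pep in kmers:
        rev = pep[::-1]
        # pep < rev drops palindromes and avoids listing a pair twice.
        if rev in kmers and pep < rev:
            pairs.add((pep, rev))
    return sorted(pairs)
```

With the toy input `["MKTAYIAK", "GGKAIYATLL"]` and `k=5`, the pairs found are `AIYAT`/`TAYIA` and `AYIAK`/`KAIYA`. Whether two such peptides sit in a similar structural environment is a separate question that this lookup does not address.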
Watching proteins function with picosecond X-ray crystallography and molecular dynamics simulations.
NASA Astrophysics Data System (ADS)
Anfinrud, Philip
2006-03-01
Time-resolved electron density maps of myoglobin, a ligand-binding heme protein, have been stitched together into movies that unveil with < 2-Å spatial resolution and 150-ps time-resolution the correlated protein motions that accompany and/or mediate ligand migration within the hydrophobic interior of a protein. A joint analysis of all-atom molecular dynamics (MD) calculations and picosecond time-resolved X-ray structures provides single-molecule insights into mechanisms of protein function. Ensemble-averaged MD simulations of the L29F mutant of myoglobin following ligand dissociation reproduce the direction, amplitude, and timescales of crystallographically-determined structural changes. This close agreement with experiments at comparable resolution in space and time validates the individual MD trajectories, which identify and structurally characterize a conformational switch that directs dissociated ligands to one of two nearby protein cavities. This unique combination of simulation and experiment unveils functional protein motions and illustrates at an atomic level relationships among protein structure, dynamics, and function. In collaboration with Friedrich Schotte and Gerhard Hummer, NIH.
Identification of Conserved Water Sites in Protein Structures for Drug Design.
Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka
2017-12-26
Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
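The clustering step described above, grouping transposed waters by mutual proximity, can be sketched with a simple single-linkage scheme: two waters belong to the same site if they lie within a cutoff distance of each other, directly or through a chain of neighbours. The abstract does not say which clustering algorithm ProBiS H2O uses, so the algorithm, the function name and the 1.0 Å default cutoff are all assumptions made for illustration.

```python
import math


def cluster_waters(coords, cutoff=1.0):
    """Group water oxygen coordinates (x, y, z tuples, in Å) into
    clusters using single linkage with a distance cutoff.
    Returns lists of input indices, largest cluster first."""
    n = len(coords)
    parent = list(range(n))  # union-find forest over water indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```

In this sketch the size of a cluster relative to the number of superimposed structures would serve as a rough conservation measure for the corresponding site. The pairwise loop is quadratic, which is acceptable for a few thousand waters; a spatial grid would be the natural replacement for larger sets.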
Reddy Chichili, Vishnu Priyanka; Kumar, Veerendra; Sivaraman, J.
2016-01-01
Protein-protein interactions are key events controlling several biological processes. We have developed and employed a method to trap transiently interacting protein complexes for structural studies using glycine-rich linkers to fuse interacting partners, one of which is unstructured. Initial steps involve isothermal titration calorimetry to identify the minimum binding region of the unstructured protein in its interaction with its stable binding partner. This is followed by computational analysis to identify the approximate site of the interaction and to design an appropriate linker length. Subsequently, fused constructs are generated and characterized using size exclusion chromatography and dynamic light scattering experiments. The structure of the chimeric protein is then solved by crystallization, and validated both in vitro and in vivo by substituting key interacting residues of the full length, unlinked proteins with alanine. This protocol offers the opportunity to study crucial and currently unattainable transient protein interactions involved in various biological processes. PMID:26985443
Thermodynamic prediction of protein neutrality.
Bloom, Jesse D; Silberg, Jonathan J; Wilke, Claus O; Drummond, D Allan; Adami, Christoph; Arnold, Frances H
2005-01-18
We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.
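Two qualitative predictions in the abstract above, that the probability of retaining structure falls as substitutions accumulate and that extra thermodynamic stability buys robustness to the first few substitutions, can be explored with a toy Monte Carlo sketch. It is not the authors' theory or code. It assumes that each substitution adds an independent, Gaussian-distributed stability change, that a protein folds when its total folding free energy is at or below zero, and that the mean and spread of the stability change (1.0 and 1.7 kcal/mol) are reasonable; all three are illustrative assumptions, as is the function name.

```python
import random


def fraction_folded(initial_dg, n_subs, n_trials=5000,
                    mean_ddg=1.0, sd_ddg=1.7, seed=0):
    """Estimate the fraction of variants that still fold after n_subs
    random substitutions.

    initial_dg: folding free energy of the wild type in kcal/mol
                (more negative means more stable).
    Each substitution adds a Gaussian ddG (hypothetical parameters);
    a variant counts as folded if its total free energy is <= 0."""
    rng = random.Random(seed)
    folded = 0
    for _ in range(n_trials):
        dg = initial_dg + sum(rng.gauss(mean_ddg, sd_ddg)
                              for _ in range(n_subs))
        if dg <= 0.0:
            folded += 1
    return folded / n_trials
```

Comparing `fraction_folded(-5.0, m)` for increasing `m` shows the decline with the number of substitutions, and comparing `fraction_folded(-5.0, 1)` with `fraction_folded(-1.0, 1)` shows the buffering effect of a more stable starting point.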
Thermodynamic prediction of protein neutrality
Bloom, Jesse D.; Silberg, Jonathan J.; Wilke, Claus O.; Drummond, D. Allan; Adami, Christoph; Arnold, Frances H.
2005-01-01
We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 β-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications. PMID:15644440
Classification of protein quaternary structure by functional domain composition
Yu, Xiaojing; Wang, Chuan; Li, Yixue
2006-01-01
Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%. Conclusion Compared with the amino acid composition method and Blast, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
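The combination of a domain-composition vector, a nearest neighbor classifier and a jackknife (leave-one-out) test described above can be sketched in a few lines. This is an illustration rather than the published method: the 0/1 encoding of domain presence, the squared Euclidean distance, the function names and the domain identifiers used in the example are assumptions.

```python
def domain_vector(domains, vocabulary):
    """Encode a protein as a 0/1 vector over a fixed list of domain IDs."""
    return [1 if d in domains else 0 for d in vocabulary]


def nearest_label(query, examples):
    """Return the label of the example closest to query.
    examples: list of (vector, label); distance is squared Euclidean."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(query, ex[0]))[1]


def jackknife_accuracy(proteins):
    """Leave-one-out test: classify each protein using all the others.
    proteins: list of (set_of_domain_ids, label), at least two entries."""
    vocabulary = sorted(set().union(*(d for d, _ in proteins)))
    data = [(domain_vector(d, vocabulary), label) for d, label in proteins]
    correct = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # everything except protein i
        if nearest_label(vec, rest) == label:
            correct += 1
    return correct / len(data)
```

A usage example with made-up domain identifiers: `jackknife_accuracy([({"D1", "D2"}, "homodimer"), ({"D1"}, "homodimer"), ({"D3"}, "monomer"), ({"D3", "D4"}, "monomer")])` classifies every protein correctly because each one shares a domain with a same-class neighbour.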
Bonomi, Massimiliano; Pellarin, Riccardo; Kim, Seung Joong; Russel, Daniel; Sundin, Bryan A.; Riffle, Michael; Jaschob, Daniel; Ramsden, Richard; Davis, Trisha N.; Muller, Eric G. D.; Sali, Andrej
2014-01-01
The use of in vivo Förster resonance energy transfer (FRET) data to determine the molecular architecture of a protein complex in living cells is challenging due to data sparseness, sample heterogeneity, signal contributions from multiple donors and acceptors, unequal fluorophore brightness, photobleaching, flexibility of the linker connecting the fluorophore to the tagged protein, and spectral cross-talk. We addressed these challenges by using a Bayesian approach that produces the posterior probability of a model, given the input data. The posterior probability is defined as a function of the dependence of our FRET metric FRETR on a structure (forward model), a model of noise in the data, as well as prior information about the structure, relative populations of distinct states in the sample, forward model parameters, and data noise. The forward model was validated against kinetic Monte Carlo simulations and in vivo experimental data collected on nine systems of known structure. In addition, our Bayesian approach was validated by a benchmark of 16 protein complexes of known structure. Given the structures of each subunit of the complexes, models were computed from synthetic FRETR data with a distance root-mean-squared deviation error of 14 to 17 Å. The approach is implemented in the open-source Integrative Modeling Platform, allowing us to determine macromolecular structures through a combination of in vivo FRETR data and data from other sources, such as electron microscopy and chemical cross-linking. PMID:25139910
Enabling Large-Scale Design, Synthesis and Validation of Small Molecule Protein-Protein Antagonists
Koes, David; Khoury, Kareem; Huang, Yijun; Wang, Wei; Bista, Michal; Popowicz, Grzegorz M.; Wolf, Siglinde; Holak, Tad A.; Dömling, Alexander; Camacho, Carlos J.
2012-01-01
Although there is no shortage of potential drug targets, there are only a handful of known low-molecular-weight inhibitors of protein-protein interactions (PPIs). One problem is that current efforts are dominated by low-yield high-throughput screening, whose rigid framework is not suitable for the diverse chemotypes present in PPIs. Here, we developed a novel pharmacophore-based interactive screening technology that builds on the role anchor residues, or deeply buried hot spots, have in PPIs, and redesigns these entry points with anchor-biased virtual multicomponent reactions, delivering tens of millions of readily synthesizable novel compounds. Application of this approach to the MDM2/p53 cancer target led to high hit rates, resulting in a large and diverse set of confirmed inhibitors, and co-crystal structures validate the designed compounds. Our unique open-access technology promises to expand chemical space and the exploration of the human interactome by leveraging in-house small-scale assays and user-friendly chemistry to rationally design ligands for PPIs with known structure. PMID:22427896
Kavianpour, Hamidreza; Vasighi, Mahdi
2017-02-01
Knowledge of the cellular attributes of proteins plays an important role in pharmacy, medical science and molecular biology. These attributes are closely correlated with the function and three-dimensional structure of proteins. Knowledge of protein structural class is used by various methods to better understand protein functionality and folding patterns. Computational methods and intelligent systems can play an important role in the structural classification of proteins. Most protein sequences are stored in databanks as character strings, so a numerical representation is essential for applying machine learning methods. In this work, a binary representation of protein sequences is introduced, based on reduced amino acid alphabets defined according to the surrounding hydrophobicity index. Many important features that are hidden in these long binary sequences can be clearly displayed through their cellular automata images. The features extracted from these images are used to build a classification model with a support vector machine. Compared with previous studies on several benchmark datasets, the promising classification rates obtained by tenfold cross-validation imply that the current approach can help in revealing some inherent features deeply hidden in protein sequences and improve the quality of predicting protein structural class.
Young, Jasmine Y.; Westbrook, John D.; Feng, Zukang; Sala, Raul; Peisach, Ezra; Oldfield, Thomas J.; Sen, Sanchayita; Gutmanas, Aleksandras; Armstrong, David R.; Berrisford, John M.; Chen, Li; Chen, Minyu; Di Costanzo, Luigi; Dimitropoulos, Dimitris; Gao, Guanghua; Ghosh, Sutapa; Gore, Swanand; Guranovic, Vladimir; Hendrickx, Pieter MS; Hudson, Brian P.; Igarashi, Reiko; Ikegawa, Yasuyo; Kobayashi, Naohiro; Lawson, Catherine L.; Liang, Yuhe; Mading, Steve; Mak, Lora; Mir, M. Saqib; Mukhopadhyay, Abhik; Patwardhan, Ardan; Persikova, Irina; Rinaldi, Luana; Sanz-Garcia, Eduardo; Sekharan, Monica R.; Shao, Chenghua; Swaminathan, G. Jawahar; Tan, Lihua; Ulrich, Eldon L.; van Ginkel, Glen; Yamashita, Reiko; Yang, Huanwang; Zhuravleva, Marina A.; Quesada, Martha; Kleywegt, Gerard J.; Berman, Helen M.; Markley, John L.; Nakamura, Haruki; Velankar, Sameer; Burley, Stephen K.
2017-01-01
Summary: OneDep, a unified system for deposition, biocuration, and validation of experimentally determined structures of biological macromolecules to the Protein Data Bank (PDB) archive, has been developed as a global collaboration by the Worldwide Protein Data Bank (wwPDB) partners. This new system was designed to ensure that the wwPDB could meet the evolving archiving requirements of the scientific community over the coming decades. OneDep unifies deposition, biocuration, and validation pipelines across all wwPDB, EMDB, and BMRB deposition sites with improved focus on data quality and completeness in these archives, while supporting growth in the number of depositions and increases in their average size and complexity. In this paper, we describe the design, functional operation, and supporting infrastructure of the OneDep system, and provide initial performance assessments. PMID:28190782
NASA Astrophysics Data System (ADS)
Shi, Jade; Nobrega, R. Paul; Schwantes, Christian; Kathuria, Sagar V.; Bilsel, Osman; Matthews, C. Robert; Lane, T. J.; Pande, Vijay S.
2017-03-01
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. Here, we report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. This prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. Using these results, we then predict incisive single molecule FRET experiments as a means of model validation. This study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
Christensen, Anders S.; Linnet, Troels E.; Borg, Mikael; Boomsma, Wouter; Lindorff-Larsen, Kresten; Hamelryck, Thomas; Jensen, Jan H.
2013-01-01
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts - sensitive probes of the geometry of key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond (h3 JNC') spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy with many potential applications such as protein flexibility in ligand binding. PMID:24391900
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.
Alonso-López, Diego; Gutiérrez, Miguel A.; Lopes, Katia P.; Prieto, Carlos; Santamaría, Rodrigo; De Las Rivas, Javier
2016-01-01
APID (Agile Protein Interactomes DataServer) is an interactive web server that provides unified generation and delivery of protein interactomes mapped to their respective proteomes. This resource is a new, fully redesigned server that includes a comprehensive collection of protein interactomes for more than 400 organisms (25 of which include more than 500 interactions) produced by the integration of only experimentally validated protein–protein physical interactions. For each protein–protein interaction (PPI) the server includes currently reported information about its experimental validation to allow selection and filtering at different quality levels. As a whole, it provides easy access to the interactomes from specific species and includes a global uniform compendium of 90,379 distinct proteins and 678,441 singular interactions. APID integrates and unifies PPIs from major primary databases of molecular interactions, from other specific repositories and also from experimentally resolved 3D structures of protein complexes where more than two proteins were identified. For this purpose, a collection of 8,388 structures were analyzed to identify specific PPIs. APID also includes a new graph tool (based on Cytoscape.js) for visualization and interactive analyses of PPI networks. The server does not require registration and it is freely available for use at http://apid.dep.usal.es. PMID:27131791
Resolution of ab initio shapes determined from small-angle scattering
Tuukkanen, Anne T.; Kleywegt, Gerard J.; Svergun, Dmitri I.
2016-01-01
Spatial resolution is an important characteristic of structural models, and the authors of structures determined by X-ray crystallography or electron cryo-microscopy always provide the resolution upon publication and deposition. Small-angle scattering of X-rays or neutrons (SAS) has recently become a mainstream structural method providing the overall three-dimensional structures of proteins, nucleic acids and complexes in solution. However, no quantitative resolution measure is available for SAS-derived models, which significantly hampers their validation and further use. Here, a method is derived for resolution assessment for ab initio shape reconstruction from scattering data. The inherent variability of the ab initio shapes is utilized and it is demonstrated how their average Fourier shell correlation function is related to the model resolution. The method is validated against simulated data for proteins with known high-resolution structures and its efficiency is demonstrated in applications to experimental data. It is proposed that henceforth the resolution be reported in publications and depositions of ab initio SAS models. PMID:27840683
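For orientation, the Fourier shell correlation between two models is conventionally defined as below. This is the standard pairwise definition, written in assumed notation ($F_1$, $F_2$ for the Fourier transforms of the two models, $s$ for the shell radius in reciprocal space); the abstract does not state how the average over reconstructions is taken or which criterion links it to the reported resolution.

```latex
\mathrm{FSC}(s) =
  \frac{\displaystyle\sum_{s_i \in \mathrm{shell}(s)} F_1(s_i)\, F_2^{*}(s_i)}
       {\sqrt{\displaystyle\sum_{s_i \in \mathrm{shell}(s)} \lvert F_1(s_i) \rvert^{2}
              \sum_{s_i \in \mathrm{shell}(s)} \lvert F_2(s_i) \rvert^{2}}}
```

Values near 1 indicate that the two models agree at that spatial frequency, and values near 0 indicate no correlation.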
Doreleijers, J F; Vriend, G; Raves, M L; Kaptein, R
1999-11-15
A statistical analysis is reported of 1,200 of the 1,404 nuclear magnetic resonance (NMR)-derived protein and nucleic acid structures deposited in the Protein Data Bank (PDB) before 1999. Excluded from this analysis were the entries not yet fully validated by the PDB and the more than 100 entries that contained < 95% of the expected hydrogens. The aim was to assess the geometry of the hydrogens in the remaining structures and to provide a check on their nomenclature. Deviations in bond lengths, bond angles, improper dihedral angles, and planarity with respect to estimated values were checked. More than 100 entries showed anomalous protonation states for some of their amino acids. Approximately 250,000 (1.7%) atom names differed from the consensus PDB nomenclature. Most of the inconsistencies are due to swapped prochiral labeling. Large deviations from the expected geometry exist for a considerable number of entries, many of which are average structures. The most common causes for these deviations seem to be poor minimization of average structures and an improper balance between force-field constraints for experimental and holonomic data. Some specific geometric outliers are related to the refinement programs used. A number of recommendations for biomolecular databases, modeling programs, and authors submitting biomolecular structures are given.
Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander
2006-11-30
Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, this is the largest data set on human serum protein binding reported for QSAR modeling. The data were partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques include multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute errors ranged from r² = 0.90 and MAE = 7.6 for ANN to r² = 0.61 and MAE = 16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors which ranged from r² = 0.70 and MAE = 14.1 for ANN to a low of r² = 0.59 and MAE = 18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
A Maltose-Binding Protein Fusion Construct Yields a Robust Crystallography Platform for MCL1
Clifton, Matthew C.; Dranow, David M.; Leed, Alison; Fulroth, Ben; Fairman, James W.; Abendroth, Jan; Atkins, Kateri A.; Wallace, Ellen; Fan, Dazhong; Xu, Guoping; Ni, Z. J.; Daniels, Doug; Van Drie, John; Wei, Guo; Burgin, Alex B.; Golub, Todd R.; Hubbard, Brian K.; Serrano-Wu, Michael H.
2015-01-01
Crystallization of a maltose-binding protein MCL1 fusion has yielded a robust crystallography platform that generated the first apo MCL1 crystal structure, as well as five ligand-bound structures. The ability to obtain fragment-bound structures advances structure-based drug design efforts that, despite considerable effort, had previously been intractable by crystallography. In the ligand-independent crystal form we identify inhibitor binding modes not observed in earlier crystallographic systems. This MBP-MCL1 construct dramatically improves the structural understanding of well-validated MCL1 ligands, and will likely catalyze the structure-based optimization of high affinity MCL1 inhibitors. PMID:25909780
Is the isolated ligand binding domain a good model of the domain in the native receptor?
Deming, Dustin; Cheng, Qing; Jayaraman, Vasanthi
2003-05-16
Numerous studies have used the atomic level structure of the isolated ligand binding domain of the glutamate receptor to elucidate the agonist-induced activation and desensitization processes in this group of proteins. However, no study has demonstrated the structural equivalence of the isolated ligand binding fragments and the protein in the native receptor. In this report, using visible absorption spectroscopy we show that the electronic environment of the antagonist 6-cyano-7-nitro-2,3-dihydroxyquinoxaline is identical for the isolated protein and the native glutamate receptors expressed in cells. Our results hence establish that the local structure of the ligand binding site is the same in the two proteins and validate the detailed structure-function relationships that have been developed based on a comparison of the structure of the isolated ligand binding domain and electrophysiological consequences in the native receptor.
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for PAS domain superfamily members belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
Heo, Lim; Lee, Hasup; Seok, Chaok
2016-08-18
Protein-protein docking methods have been widely used to gain an atomic-level understanding of protein interactions. However, docking methods that employ low-resolution energy functions are popular because of computational efficiency. Low-resolution docking tends to generate protein complex structures that are not fully optimized. GalaxyRefineComplex takes such low-resolution docking structures and refines them to improve model accuracy in terms of both interface contact and inter-protein orientation. This refinement method allows flexibility at the protein interface and in the overall docking structure to capture conformational changes that occur upon binding. Symmetric refinement is also provided for symmetric homo-complexes. This method was validated by refining models produced by available docking programs, including ZDOCK and M-ZDOCK, and was successfully applied to CAPRI targets in a blind fashion. An example of using the refinement method with an existing docking method for ligand binding mode prediction of a drug target is also presented. A web server that implements the method is freely available at http://galaxy.seoklab.org/refinecomplex.
Automated protein NMR structure determination using wavelet de-noised NOESY spectra.
Dancea, Felician; Günther, Ulrich
2005-11-01
A major time-consuming step of protein NMR structure determination is the generation of reliable NOESY cross peak lists which usually requires a significant amount of manual interaction. Here we present a new algorithm for automated peak picking involving wavelet de-noised NOESY spectra in a process where the identification of peaks is coupled to automated structure determination. The core of this method is the generation of incremental peak lists by applying different wavelet de-noising procedures which yield peak lists of a different noise content. In combination with additional filters which probe the consistency of the peak lists, good convergence of the NOESY-based automated structure determination could be achieved. These algorithms were implemented in the context of the ARIA software for automated NOE assignment and structure determination and were validated for a polysulfide-sulfur transferase protein of known structure. The procedures presented here should be commonly applicable for efficient protein NMR structure determination and automated NMR peak picking.
Random close packing in protein cores
NASA Astrophysics Data System (ADS)
Gaines, Jennifer C.; Smith, W. Wendell; Regan, Lynne; O'Hern, Corey S.
2016-03-01
Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ≈ 0.75, a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of extended atom models, rather than the more physically accurate explicit hydrogen model. The validity of the explicit hydrogen model was proved in our previous studies by its ability to predict the side chain dihedral angle distributions observed in proteins. In contrast, the extended atom model is not able to recapitulate the side chain dihedral angle distributions, and gives rise to large atomic clashes at side chain dihedral angle combinations that are highly probable in protein crystal structures. Here, we employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high-resolution protein structures. We find that these protein cores have ϕ ≈ 0.56, which is similar to results obtained from simulations of random packings of individual amino acids. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations to protein cores and interfaces of known structure.
Wang, W; Zhang, W; Jiang, R; Luan, Y
2010-05-01
It is of vital importance to find genetic variants that underlie human complex diseases and locate genes that are responsible for these diseases. Since proteins are typically composed of several structural domains, it is reasonable to assume that harmful genetic variants may alter structures of protein domains, affect functions of proteins and eventually cause disorders. With this understanding, the authors explore the possibility of recovering associations between protein domains and complex diseases. The authors define associations between protein domains and disease families on the basis of associations between non-synonymous single nucleotide polymorphisms (nsSNPs) and complex diseases, similarities between diseases, and relations between proteins and domains. Based on a domain-domain interaction network, the authors propose a 'guilt-by-proximity' principle to rank candidate domains according to their average distance to a set of seed domains in the domain-domain interaction network. The authors validate the method through large-scale cross-validation experiments on simulated linkage intervals, random controls and the whole genome. Results show that areas under receiver operating characteristic curves (AUC scores) can be as high as 77.90%, and the mean rank ratios can be as low as 21.82%. The authors further offer a freely accessible web interface for a genome-wide landscape of associations between domains and disease families.
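The 'guilt-by-proximity' ranking described in this abstract can be sketched as follows: each candidate domain is scored by its average shortest-path distance to the seed domains in the interaction network, and candidates are sorted by that score. The toy network, the domain names and the tie-breaking by name are assumptions for illustration; the abstract does not say how distances are weighted or how unreachable domains are handled.

```python
from collections import deque

def bfs_distances(graph, source):
    """Unweighted shortest-path distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def rank_by_proximity(graph, seeds, candidates):
    """Sort candidates by mean distance to the seeds (closest first)."""
    seed_dists = [bfs_distances(graph, seed) for seed in seeds]
    scores = {}
    for cand in candidates:
        # Unreachable candidates get an infinite distance (an assumption).
        dists = [d.get(cand, float("inf")) for d in seed_dists]
        scores[cand] = sum(dists) / len(dists)
    return sorted(candidates, key=lambda cand: (scores[cand], cand))

# Hypothetical undirected domain-domain interaction network: A-B-C-D-E chain.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

if __name__ == "__main__":
    print(rank_by_proximity(toy_network, seeds=["A", "B"], candidates=["E", "D", "C"]))
```

With seeds A and B, domain C has mean distance 1.5, D has 2.5 and E has 3.5, so C is ranked first.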
STRUM: structure-based prediction of protein stability changes upon single-point mutation
Quan, Lijun; Lv, Qiang; Zhang, Yang
2016-01-01
Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. 
These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206
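The two accuracy measures quoted in this abstract, the Pearson correlation coefficient and the root-mean-square error between predicted and measured ΔΔG, can be computed as in the sketch below. The function names and the example values are hypothetical and are not taken from the STRUM data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return sqrt(squared / len(predicted))

if __name__ == "__main__":
    # Hypothetical ΔΔG values in kcal/mol, for illustration only.
    predicted = [0.5, -1.2, 2.0, 0.1]
    measured = [0.8, -1.0, 1.5, -0.3]
    print(pearson(predicted, measured), rmse(predicted, measured))
```

As the abstract notes, such figures depend on how the folds are split: separating training and test mutations by protein rather than by mutation lowers the correlation.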
NASA Astrophysics Data System (ADS)
Jiang, Zhou-Ting; Zhang, Lin-Xi; Sun, Ting-Ting; Wu, Tai-Quan
2009-10-01
The propensity to form long-range contacts strongly affects the three-dimensional structure of globular proteins. Because the 20 types of amino acids and the 4 classes of globular proteins differ in their ability to form long-range contacts, the statistical properties of these contacts are discussed thoroughly in this paper. Two parameters, NC and ND, are defined to delimit the valid residues. The relationship between hydrophobicity scales and the percentage of valid residues for each amino acid is given, and the statistical results show linear relationships. It is concluded that the hydrophobicity scale defined by chemical derivatives of the amino acids and the nonpolar phase of large unilamellar vesicle membranes is the most effective for characterising the hydrophobic behavior of amino acid residues. Meanwhile, the residue percentage Pi and sequential residue length Li of a given protein i are calculated under different conditions. The statistical results show that the average values of Pi and Li are lowest for all-α proteins among the 4 classes of globular proteins, indicating that all-α proteins are hardly capable of forming long-range contacts one by one along their linear amino acid sequences. All-β proteins have a higher tendency to form long-range contacts along their primary sequences, which is related to their secondary configurations, i.e. parallel and anti-parallel configurations of β sheets. Investigating these interior properties of globular proteins reveals the connection between three-dimensional structure and primary sequence or secondary configuration, and helps us to understand protein structure and the folding process.
Computational approaches for drug discovery.
Hung, Che-Lun; Chen, Chi-Chun
2014-09-01
Cellular proteins mediate many functions of an organism and are involved in physiological mechanisms and disease. By discovering lead compounds that affect the function of target proteins, the target diseases or physiological mechanisms can be modulated. Based on knowledge of the ligand-receptor interaction, the chemical structures of leads can be modified to improve efficacy and selectivity and to reduce side effects. One rational drug design technology, which enables drug discovery based on knowledge of target structures, functional properties and mechanisms, is computer-aided drug design (CADD). The application of CADD can be cost-effective when experiments are used to compare predicted and actual drug activity, the results of which can be used iteratively to improve compound properties. The two major CADD-based approaches are structure-based drug design, where protein structures are required, and ligand-based drug design, where ligands and their activities can be used to design compounds interacting with the protein structure. Approaches in structure-based drug design include docking, de novo design, fragment-based drug discovery and structure-based pharmacophore modeling. Approaches in ligand-based drug design include quantitative structure-affinity relationship and pharmacophore modeling based on ligand properties. Depending on whether the structure of the receptor and its interaction with the ligand are known, different design strategies can be used. After lead compounds are generated, the rule of five can be used to assess whether these have drug-like properties. Several quality validation methods, such as cost function analysis, Fisher's cross-validation analysis and goodness of hit test, can be used to estimate the metrics of different drug design strategies. To further improve CADD performance, multi-computers and graphics processing units may be applied to reduce costs. © 2014 Wiley Periodicals, Inc.
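The rule-of-five screen mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the common convention of tolerating a single violation. The function names, arguments and example values are hypothetical and are not taken from the review; the descriptors are assumed to have been computed beforehand by a cheminformatics toolkit.

```python
def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count rule-of-five violations from precomputed molecular descriptors."""
    checks = [
        mol_weight > 500.0,  # molecular weight above 500 Da
        logp > 5.0,          # octanol-water partition coefficient above 5
        h_donors > 5,        # more than 5 hydrogen-bond donors
        h_acceptors > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_drug_like(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Assumed convention: a compound passes if it breaks at most one rule."""
    n = rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors)
    return n <= max_violations
```

For example, a hypothetical lead with a molecular weight of 650 Da and a logP of 6.1 breaks two rules and would be flagged by `is_drug_like`.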
Farkas, Viktor; Jákli, Imre; Tóth, Gábor K; Perczel, András
2016-09-19
Both far- and near-UV electronic circular dichroism (ECD) spectra have bands sensitive to the thermal unfolding of proteins containing Trp and Tyr residues. Besides the spectral changes at 222 nm that report secondary structural variations (far-UV range), Lb bands (near-UV range) are applicable as 3D-fold sensors of a protein's core structure. In this study we show that both Lb (Tyr) and Lb (Trp) ECD bands can be used as sensors of fold compactness. ECD is a relative method and thus requires NMR referencing and cross-validation, also provided here. The ensemble of 204 ECD spectra of Trp-cage miniproteins is analysed as a training set for "calibrating" Trp↔Tyr folded systems of known NMR structure. While the changes in the far-UV ECD spectra are linear as a function of temperature, the near-UV ECD data indicate a non-linear and thus cooperative unfolding mechanism of these proteins. Deconvolution of the ensemble of ECD spectra gives both conformational weights and insight into the protein folding↔unfolding mechanism. We found that the Lb 293 band reports on the compactness of the 3D structure. In addition, the pure near-UV ECD spectrum of the unfolded state is described here for the first time. Thus, the now-validated ECD folding information can be applied with confidence over a large thermal window (5≤T≤85 °C) compared to NMR for studying the unfolding of Trp↔Tyr residue pairs. In conclusion, folding propensities of important proteins (RNA polymerase II, ubiquitin protein ligase, tryptase-inhibitor etc.) can now be analysed with higher confidence. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Fitmunk: improving protein structures by accurate, automatic modeling of side-chain conformations.
Porebski, Przemyslaw Jerzy; Cymborowski, Marcin; Pasenkiewicz-Gierula, Marta; Minor, Wladek
2016-02-01
Improvements in crystallographic hardware and software have allowed automated structure-solution pipelines to approach a near 'one-click' experience for the initial determination of macromolecular structures. However, in many cases the resulting initial model requires a laborious, iterative process of refinement and validation. A new method has been developed for the automatic modeling of side-chain conformations that takes advantage of rotamer-prediction methods in a crystallographic context. The algorithm, which is based on deterministic dead-end elimination (DEE) theory, uses new dense conformer libraries and a hybrid energy function derived from experimental data and prior information about rotamer frequencies to find the optimal conformation of each side chain. In contrast to existing methods, which incorporate the electron-density term into protein-modeling frameworks, the proposed algorithm is designed to take advantage of the highly discriminatory nature of electron-density maps. This method has been implemented in the program Fitmunk, which uses extensive conformational sampling. This improves the accuracy of the modeling and makes it a versatile tool for crystallographic model building, refinement and validation. Fitmunk was extensively tested on over 115 new structures, as well as a subset of 1100 structures from the PDB. It is demonstrated that the ability of Fitmunk to model more than 95% of side chains accurately is beneficial for improving the quality of crystallographic protein models, especially at medium and low resolutions. Fitmunk can be used for model validation of existing structures and as a tool to assess whether side chains are modeled optimally or could be better fitted into electron density. Fitmunk is available as a web service at http://kniahini.med.virginia.edu/fitmunk/server/ or at http://fitmunk.bitbucket.org/.
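As a rough illustration of the dead-end elimination idea named in the abstract above, the sketch below applies the classic single-rotamer elimination criterion: rotamer r at position i is discarded if its best-case energy is still higher than the worst-case energy of some competitor t at the same position. This is not Fitmunk's implementation; the data layout (`self_energy`, `pair_energy`, `alive`) and the function names are assumptions made for the example, and the energies would in practice come from a scoring function such as the hybrid one the abstract describes.

```python
def pair(pair_energy, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j.

    pair_energy is keyed by (low_position, high_position).
    """
    if i < j:
        return pair_energy[(i, j)][r][s]
    return pair_energy[(j, i)][s][r]


def dee_pass(self_energy, pair_energy, alive):
    """One pass of the single-rotamer dead-end elimination criterion.

    self_energy[i][r] : self energy of rotamer r at position i
    alive[i]          : set of rotamer indices still considered at position i
    Returns a new list of surviving rotamer sets.
    """
    n = len(self_energy)
    survivors = [set(a) for a in alive]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in alive[i]:
            # Best case for r: most favourable partner at every other position.
            best_r = self_energy[i][r] + sum(
                min(pair(pair_energy, i, r, j, s) for s in alive[j])
                for j in others
            )
            for t in alive[i]:
                if t == r:
                    continue
                # Worst case for t: least favourable partner everywhere.
                worst_t = self_energy[i][t] + sum(
                    max(pair(pair_energy, i, t, j, s) for s in alive[j])
                    for j in others
                )
                if best_r > worst_t:
                    survivors[i].discard(r)  # r cannot be in the optimum
                    break
    return survivors
```

Repeating `dee_pass` until no rotamer is removed shrinks the search space; any positions that still hold several rotamers would need a further search step, which is outside this sketch.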
Moriarty, Nigel W.; Tronrud, Dale E.; Adams, Paul D.; ...
2016-01-01
Chemical restraints are a fundamental part of crystallographic protein structure refinement. In response to mounting evidence that conventional restraints have shortcomings, it has previously been documented that using backbone restraints that depend on the protein backbone conformation helps to address these shortcomings and improves the performance of refinements [Moriarty et al. (2014), FEBS J. 281, 4061–4071]. It is important that these improvements be made available to all in the protein crystallography community. Toward this end, a change in the default geometry library used by Phenix is described here. Tests are presented showing that this change will not generate increased numbers of outliers during validation, or deposition in the Protein Data Bank, during the transition period in which some validation tools still use the conventional restraint libraries.
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having a similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
Chemical and protein structural basis for biological crosstalk between PPAR α and COX enzymes
NASA Astrophysics Data System (ADS)
Cleves, Ann E.; Jain, Ajay N.
2015-02-01
We have previously validated a probabilistic framework that combined computational approaches for predicting the biological activities of small molecule drugs. Molecule comparison methods included molecular structural similarity metrics and similarity computed from lexical analysis of text in drug package inserts. Here we present an analysis of novel drug/target predictions, focusing on those that were not obvious based on known pharmacological crosstalk. Considering those cases where the predicted target was an enzyme with known 3D structure allowed incorporation of information from molecular docking and protein binding pocket similarity in addition to ligand-based comparisons. Taken together, the combination of orthogonal information sources led to investigation of a surprising predicted relationship between a transcription factor and an enzyme, specifically, PPAR α and the cyclooxygenase enzymes. These predictions were confirmed by direct biochemical experiments which validate the approach and show for the first time that PPAR α agonists are cyclooxygenase inhibitors.
Validating metal binding sites in macromolecule structures using the CheckMyMetal web server
Zheng, Heping; Chordia, Mahendra D.; Cooper, David R.; Chruszcz, Maksymilian; Müller, Peter; Sheldrick, George M.
2015-01-01
Metals play vital roles in both the mechanism and architecture of biological macromolecules. Yet structures of metal-containing macromolecules where metals are misidentified and/or suboptimally modeled are abundant in the Protein Data Bank (PDB). This shows the need for a diagnostic tool to identify and correct such modeling problems with metal binding environments. The "CheckMyMetal" (CMM) web server (http://csgid.org/csgid/metal_sites/) is a sophisticated, user-friendly web-based method to evaluate metal binding sites in macromolecular structures with respect to 7350 metal binding sites observed in a benchmark dataset of 2304 high resolution crystal structures. The protocol outlines how the CMM server can be used to detect geometric and other irregularities in the structures of metal binding sites and alert researchers to potential errors in metal assignment. The protocol also gives practical guidelines for correcting problematic sites by modifying the metal binding environment and/or redefining metal identity in the PDB file. Several examples where this has led to meaningful results are described in the anticipated results section. CMM was designed for a broad audience (biomedical researchers studying metal-containing proteins and nucleic acids) but is equally well suited for structural biologists to validate new structures during modeling or refinement. The CMM server takes the coordinates of a metal-containing macromolecule structure in the PDB format as input and responds within a few seconds for a typical protein structure modeled with a few hundred amino acids. PMID:24356774
CABS-flex: server for fast simulation of protein structure fluctuations
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-01-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements a CABS-model-based protocol for fast simulation of the near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics, a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions. PMID:23658222
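The residue mean-square-fluctuation profile mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the server's code: it assumes the models of the ensemble have already been superimposed on a common frame and that each residue is represented by a single point (for example its Cα atom). The function name and the nested-list input layout are hypothetical.

```python
def mean_square_fluctuation(ensemble):
    """Per-residue mean-square fluctuation over an ensemble of models.

    ensemble[m][r] is the (x, y, z) position of residue r in model m.
    Models are assumed to be superimposed already.
    """
    n_models = len(ensemble)
    n_residues = len(ensemble[0])
    profile = []
    for r in range(n_residues):
        # Average position of residue r across all models.
        mean = [
            sum(model[r][k] for model in ensemble) / n_models
            for k in range(3)
        ]
        # Mean squared distance of residue r from its average position.
        msf = sum(
            sum((model[r][k] - mean[k]) ** 2 for k in range(3))
            for model in ensemble
        ) / n_models
        profile.append(msf)
    return profile
```

Taking the square root of each entry gives the root-mean-square fluctuation (RMSF) in the same length unit as the coordinates.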
CABS-flex: Server for fast simulation of protein structure fluctuations.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-07-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements a CABS-model-based protocol for fast simulation of the near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics, a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions.
Structure determination of an integral membrane protein at room temperature from crystals in situ
DOE Office of Scientific and Technical Information (OSTI.GOV)
Axford, Danny; Foadi, James
2015-05-14
The structure determination of an integral membrane protein using synchrotron X-ray diffraction data collected at room temperature directly in vapour-diffusion crystallization plates (in situ) is demonstrated. Exposing the crystals in situ eliminates manual sample handling and, since it is performed at room temperature, removes the complication of cryoprotection and potential structural anomalies induced by sample cryocooling. Essential to the method is the ability to limit radiation damage by recording a small amount of data per sample from many samples and subsequently assembling the resulting data sets using specialized software. The validity of this procedure is established by the structure determination of Haemophilus influenzae TehA at 2.3 Å resolution. The method presented offers an effective protocol for the fast and efficient determination of membrane-protein structures at room temperature using third-generation synchrotron beamlines.
The RCSB protein data bank: integrative view of protein, gene and 3D structural information
Rose, Peter W.; Prlić, Andreas; Altunkaya, Ali; Bi, Chunxiao; Bradley, Anthony R.; Christie, Cole H.; Costanzo, Luigi Di; Duarte, Jose M.; Dutta, Shuchismita; Feng, Zukang; Green, Rachel Kramer; Goodsell, David S.; Hudson, Brian; Kalro, Tara; Lowe, Robert; Peisach, Ezra; Randle, Christopher; Rose, Alexander S.; Shao, Chenghua; Tao, Yi-Ping; Valasatava, Yana; Voigt, Maria; Westbrook, John D.; Woo, Jesse; Yang, Huangwang; Young, Jasmine Y.; Zardecki, Christine; Berman, Helen M.; Burley, Stephen K.
2017-01-01
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the user experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive. PMID:27794042
Wong, Sienna; Jin, J-P
2017-01-01
The study of the folded structure of proteins provides insights into their biological functions, conformational dynamics and molecular evolution. Current methods of elucidating the folded structure of proteins are laborious, low-throughput, and constrained by various limitations. These limitations create the need for a sensitive, quantitative, rapid and high-throughput method not only to analyse the folded structure of proteins but also to monitor dynamic changes under physiological or experimental conditions. In this focused review, we outline the foundation and limitations of current protein structure-determination methods prior to discussing the advantages of an emerging antibody epitope analysis for applications in structural, conformational and evolutionary studies of proteins. We discuss the application of this method using representative examples in monitoring allosteric conformation of regulatory proteins and the determination of the evolutionary lineage of related proteins and protein isoforms. The versatility of the method described herein is validated by the ability to modulate a variety of assay parameters to meet the needs of the user in order to monitor protein conformation. Furthermore, the assay has been used to clarify the lineage of troponin isoforms beyond what has been depicted by sequence homology alone, demonstrating the nonlinear evolutionary relationship between primary structure and tertiary structure of proteins. The antibody epitope analysis method is a highly adaptable technique of protein conformation elucidation, which can be easily applied without the need for specialized equipment or technical expertise. When applied in a systematic and strategic manner, this method has the potential to reveal novel and biomedically meaningful information for structure-function relationship and evolutionary lineage of proteins. Copyright© Bentham Science Publishers.
NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation.
Sakthivel, Seethalakshmi; S K M, Habeeb
2015-01-01
The secondary structural states predicted by existing servers are not cross-validated, so the level of accuracy for each individual sequence is not reported. NNvPDB overcomes this: it not only reports a greater Q3 but also validates every prediction against homologous PDB entries. NNvPDB is based on a neural network, with a new approach in which the network is trained each time on five PDB structures that are similar to the query sequence. The average accuracy is 76% for helix, 71% for beta sheet and 66% overall (helix, sheet and coil). http://bit.srmuniv.ac.in/cgi-bin/bit/cfpdb/nnsecstruct.pl.
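For readers unfamiliar with the Q3 measure quoted above, a minimal sketch follows. It assumes the usual definition (the percentage of residues whose three-state label, helix H, strand E or coil C, is predicted correctly) and a per-state accuracy computed over the residues observed in that state. The function names and the example strings are hypothetical and are not taken from NNvPDB.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted, observed, state):
    """Percentage of residues observed in `state` that are predicted as `state`.

    Returns None when the state does not occur in the observed string.
    """
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None
    hits = sum(predicted[i] == state for i in positions)
    return 100.0 * hits / len(positions)
```

With a predicted string `"HHHEECCC"` and an observed string `"HHHEEECC"`, seven of the eight residues agree, so `q3_score` returns 87.5.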
Computational design of a self-assembling symmetrical β-propeller protein.
Voet, Arnout R D; Noguchi, Hiroki; Addy, Christine; Simoncini, David; Terada, Daiki; Unzai, Satoru; Park, Sam-Yong; Zhang, Kam Y J; Tame, Jeremy R H
2014-10-21
The modular structure of many protein families, such as β-propeller proteins, strongly implies that duplication played an important role in their evolution, leading to highly symmetrical intermediate forms. Previous attempts to create perfectly symmetrical propeller proteins have failed, however. We have therefore developed a new and rapid computational approach to design such proteins. As a test case, we have created a sixfold symmetrical β-propeller protein and experimentally validated the structure using X-ray crystallography. Each blade consists of 42 residues. Proteins carrying 2-10 identical blades were also expressed and purified. Two or three tandem blades assemble to recreate the highly stable sixfold symmetrical architecture, consistent with the duplication and fusion theory. The other proteins produce different monodisperse complexes, up to 42 blades (180 kDa) in size, which self-assemble according to simple symmetry rules. Our procedure is suitable for creating nano-building blocks from different protein templates of desired symmetry.
The Paris-Sud yeast structural genomics pilot-project: from structure to function.
Quevillon-Cheruel, Sophie; Liger, Dominique; Leulliot, Nicolas; Graille, Marc; Poupon, Anne; Li de La Sierra-Gallay, Inès; Zhou, Cong-Zhao; Collinet, Bruno; Janin, Joël; Van Tilbeurgh, Herman
2004-01-01
We present here the outlines and results from our yeast structural genomics (YSG) pilot-project. A lab-scale platform for the systematic production and structure determination is presented. In order to validate this approach, 250 non-membrane proteins of unknown structure were targeted. Strategies and final statistics are evaluated. We finally discuss the opportunity of structural genomics programs to contribute to functional biochemical annotation.
Khvostichenko, Daria S.; Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.; ...
2014-08-21
Crystallization from lipidic mesophase matrices is a promising route to diffraction-quality crystals and structures of membrane proteins. The microfluidic approach reported here eliminates two bottlenecks of the standard mesophase-based crystallization protocols: (i) manual preparation of viscous mesophases and (ii) manual harvesting of often small and fragile protein crystals. In the approach reported here, protein-loaded mesophases are formulated in an X-ray transparent microfluidic chip using only 60 nL of the protein solution per crystallization trial. The X-ray transparency of the chip enables diffraction data collection from multiple crystals residing in microfluidic wells, eliminating the normally required manual harvesting and mounting of individual crystals. In addition, we validated our approach by on-chip crystallization of photosynthetic reaction center, a membrane protein from Rhodobacter sphaeroides, followed by solving its structure to a resolution of 2.5 Å using X-ray diffraction data collected on-chip under ambient conditions. A moderate conformational change in hydrophilic chains of the protein was observed when comparing the on-chip, room temperature structure with known structures for which data were acquired under cryogenic conditions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khvostichenko, Daria S.; Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.
2014-10-01
Crystallization from lipidic mesophase matrices is a promising route to diffraction-quality crystals and structures of membrane proteins. The microfluidic approach reported here eliminates two bottlenecks of the standard mesophase-based crystallization protocols: (i) manual preparation of viscous mesophases and (ii) manual harvesting of often small and fragile protein crystals. In the approach reported here, protein-loaded mesophases are formulated in an X-ray transparent microfluidic chip using only 60 nL of the protein solution per crystallization trial. The X-ray transparency of the chip enables diffraction data collection from multiple crystals residing in microfluidic wells, eliminating the normally required manual harvesting and mounting of individual crystals. We validated our approach by on-chip crystallization of photosynthetic reaction center, a membrane protein from Rhodobacter sphaeroides, followed by solving its structure to a resolution of 2.5 Å using X-ray diffraction data collected on-chip under ambient conditions. A moderate conformational change in hydrophilic chains of the protein was observed when comparing the on-chip, room temperature structure with known structures for which data were acquired under cryogenic conditions.
Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling
MacDonald, James T.; Kelley, Lawrence A.; Freemont, Paul S.
2013-01-01
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein-like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not easily possible with fragment replacement methods, and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design. C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/. PMID:23824634
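The Monte Carlo simulated annealing protocol named in the abstract above follows a general pattern that can be sketched on a toy problem. The code below is not the authors' method or their C++ code: the one-dimensional energy function, the move generator, the geometric cooling schedule and all parameter values are assumptions chosen only to show the Metropolis acceptance step under a decreasing temperature.

```python
import math
import random


def simulated_annealing(energy, propose, start,
                        t_start=1.0, t_end=0.01, steps=2000, seed=0):
    """Minimise `energy` by Metropolis Monte Carlo with geometric cooling.

    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        e_candidate = energy(candidate)
        delta = e_candidate - e_current
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(x):
    """Toy stand-in for a potential energy function (minimum at x = 2)."""
    return (x - 2.0) ** 2


def toy_move(x, rng):
    """Toy stand-in for a conformational move: a small random displacement."""
    return x + rng.uniform(-0.5, 0.5)
```

Calling `simulated_annealing(toy_energy, toy_move, start=10.0)` returns the best state found; in a loop-modelling setting the state would be a set of backbone coordinates and the move a local perturbation of them.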
Shi, Jade; Nobrega, R. Paul; Schwantes, Christian; ...
2017-03-08
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. We report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. The resulting prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. We then predict incisive single molecule FRET experiments, using these results, as a means of model validation. Our study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
Babaei, Sepideh; Geranmayeh, Amir; Seyyedsalehi, Seyyed Ali
2010-12-01
The supervised learning of recurrent neural networks well suited to the prediction of protein secondary structure from the underlying amino acid sequence is studied. Modular reciprocal recurrent neural networks (MRR-NN) are proposed to model the strong correlations between adjacent secondary structure elements. In addition, a multilayer bidirectional recurrent neural network (MBR-NN) is introduced to capture the long-range intramolecular interactions between amino acids in the formation of the secondary structure. The final modular prediction system is devised based on the interactive integration of the MRR-NN and the MBR-NN structures to arbitrarily engage the neighboring effects of the secondary structure types concurrent with memorizing the sequential dependencies of amino acids along the protein chain. The advanced combined network augments the percentage accuracy (Q₃) to 79.36% and boosts the segment overlap (SOV) up to 70.09% when tested on the PSIPRED dataset in three-fold cross-validation. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.
2015-01-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665
Structural basis for spectrin recognition by ankyrin.
Ipsaro, Jonathan J; Mondragón, Alfonso
2010-05-20
Maintenance of membrane integrity and organization in the metazoan cell is accomplished through intracellular tethering of membrane proteins to an extensive, flexible protein network. Spectrin, the principal component of this network, is anchored to membrane proteins through the adaptor protein ankyrin. To elucidate the atomic basis for this interaction, we determined a crystal structure of human betaI-spectrin repeats 13 to 15 in complex with the ZU5-ANK domain of human ankyrin R. The structure reveals the role of repeats 14 to 15 in binding, the electrostatic and hydrophobic contributions along the interface, and the necessity for a particular orientation of the spectrin repeats. Using structural and biochemical data as a guide, we characterized the individual proteins and their interactions by binding and thermal stability analyses. In addition to validating the structural model, these data provide insight into the nature of some mutations associated with cell morphology defects, including those found in human diseases such as hereditary spherocytosis and elliptocytosis. Finally, analysis of the ZU5 domain suggests it is a versatile protein-protein interaction module with distinct interaction surfaces. The structure represents not only the first of a spectrin fragment in complex with its binding partner, but also that of an intermolecular complex involving a ZU5 domain.
NASA Astrophysics Data System (ADS)
Paulino, M.; Esteves, A.; Vega, M.; Tabares, G.; Ehrlich, R.; Tapia, O.
1998-07-01
EgDf1 is a developmentally regulated protein from the parasite Echinococcus granulosus related to a family of hydrophobic ligand binding proteins. This protein could play a crucial role during the parasite life cycle, since this organism is unable to synthesize most of its own lipids de novo. Furthermore, it has been shown that two related proteins from other parasitic platyhelminths (Fh15 from Fasciola hepatica and Sm14 from Schistosoma mansoni) are able to confer protective immunity against experimental infection in animal models. A three-dimensional structure would help establish structure/function relationships in a knowledge-based manner. 3D structures for the EgDf1 protein were modelled by using myelin P2 (mP2) and intestine fatty acid binding protein (I-FABP) as templates. Molecular dynamics techniques were used to validate the models. Template mP2 yielded the best 3D structure for EgDf1. Palmitic and oleic acids were docked inside EgDf1. The present theoretical results suggest a definite location in the secondary structure for the epitopic regions and the consensus phosphorylation motifs, and point to oleic acid as a good ligand candidate for EgDf1. This protein might well be involved in the process of supplying hydrophobic metabolites for membrane biosynthesis and for signaling pathways.
2016-01-01
Molecular recognition by proteins mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G‐LoSA. G‐LoSA aligns protein local structures in a sequence order independent way and provides a GA‐score, a chemical feature‐based and size‐independent structure similarity score. Our benchmark validation shows the robust performance of G‐LoSA on local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure‐centric comparative biology studies. In particular, G‐LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G‐LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer‐aided drug design. We hope that G‐LoSA can be a useful computational method for exploring interesting biological problems through large‐scale comparison of protein local structures and facilitating drug discovery research and development. G‐LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. PMID:26813336
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raymond, Amy; Lovell, Scott; Lorimer, Don
2009-12-01
With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript. In this report, we compare heterologous protein expression levels from native sequences to those of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38α), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) was expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias. The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.
PREFMD: a web server for protein structure refinement via molecular dynamics simulations.
Heo, Lim; Feig, Michael
2018-03-15
Refinement of protein structure models is a long-standing problem in structural bioinformatics. Molecular dynamics-based methods have emerged as an avenue to achieve consistent refinement. The PREFMD web server implements an optimized protocol based on the method successfully tested in CASP11. Validation with recent CASP refinement targets shows consistent and more significant improvement in global structure accuracy over other state-of-the-art servers. PREFMD is freely available as a web server at http://feiglab.org/prefmd. Scripts for running PREFMD as a stand-alone package are available at https://github.com/feiglab/prefmd.git. Contact: feig@msu.edu. Supplementary data are available at Bioinformatics online.
Brooks, Mark A; Gewartowski, Kamil; Mitsiki, Eirini; Létoquart, Juliette; Pache, Roland A; Billier, Ysaline; Bertero, Michela; Corréa, Margot; Czarnocki-Cieciura, Mariusz; Dadlez, Michal; Henriot, Véronique; Lazar, Noureddine; Delbos, Lila; Lebert, Dorothée; Piwowarski, Jan; Rochaix, Pascal; Böttcher, Bettina; Serrano, Luis; Séraphin, Bertrand; van Tilbeurgh, Herman; Aloy, Patrick; Perrakis, Anastassis; Dziembowski, Andrzej
2010-09-08
For high-throughput structural studies of protein complexes of composition inferred from proteomics data, it is crucial that candidate complexes are selected accurately. Herein, we exemplify a procedure that combines a bioinformatics tool for complex selection with in vivo validation, to deliver structural results in a medium-throughput manner. We have selected a set of 20 yeast complexes, which were predicted to be feasible by an automated bioinformatics algorithm, by manual inspection of primary data, or by literature searches. These complexes were validated with two straightforward and efficient biochemical assays, and heterologous expression technologies of complex components were then used to produce the complexes to assess their feasibility experimentally. Approximately one-half of the selected complexes were useful for structural studies, and we detail one particular success story. Our results underscore the importance of accurate target selection and validation in avoiding transient, unstable, or simply nonexistent complexes from the outset.
Protein Data Bank (PDB): The Single Global Macromolecular Structure Archive
Burley, Stephen K.; Berman, Helen M.; Kleywegt, Gerard J.; Markley, John L.; Nakamura, Haruki; Velankar, Sameer
2018-01-01
The Protein Data Bank (PDB), the single global repository of experimentally determined 3D structures of biological macromolecules and their complexes, was established in 1971, becoming the first open-access digital resource in the biological sciences. The PDB archive currently houses ~130,000 entries (May 2017). It is managed by the Worldwide Protein Data Bank organization (wwPDB; wwpdb.org), which includes the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The four wwPDB partners operate a unified global software system that enforces community-agreed data standards and supports data Deposition, Biocuration, and Validation of ~11,000 new PDB entries annually (deposit.wwpdb.org). The RCSB PDB currently acts as the archive keeper, ensuring disaster recovery of PDB data and coordinating weekly updates. wwPDB partners disseminate the same archival data from multiple FTP sites, while operating complementary websites that provide their own views of PDB data with selected value-added information and links to related data resources. At present, the PDB archives experimental data, associated metadata, and 3D-atomic level structural models derived from three well-established methods: crystallography, nuclear magnetic resonance spectroscopy (NMR), and electron microscopy (3DEM). wwPDB partners are working closely with experts in related experimental areas (small-angle scattering, chemical cross-linking/mass spectrometry, Förster resonance energy transfer or FRET, etc.) to establish a federation of data resources that will support sustainable archiving and validation of 3D structural models and experimental data derived from integrative or hybrid methods. PMID:28573592
Meissner, Kamila A; Lunev, Sergey; Wang, Yuan-Ze; Linzke, Marleen; de Assis Batista, Fernando; Wrenger, Carsten; Groves, Matthew R
2017-01-01
The validation of drug targets in malaria and other human diseases remains a highly difficult and laborious process. In the vast majority of cases, highly specific small molecule tools to inhibit a protein's function in vivo are simply not available. Additionally, the use of genetic tools in the analysis of malarial pathways is challenging. These issues result in difficulties in specifically modulating a hypothetical drug target's function in vivo. The current "toolbox" of various methods and techniques to identify a protein's function in vivo remains very limited and there is a pressing need for expansion. New approaches are urgently required to support target validation in the drug discovery process. Oligomerisation is the natural assembly of multiple copies of a single protein into one object and this self-assembly is present in more than half of all protein structures. Thus, oligomerisation plays a central role in the generation of functional biomolecules. A key feature of oligomerisation is that the oligomeric interfaces between the individual parts of the final assembly are highly specific. However, these interfaces have not yet been systematically explored or exploited to dissect biochemical pathways in vivo. This mini review will describe the current state of the antimalarial toolset as well as the potentially druggable malarial pathways. A specific focus is drawn to the initial efforts to exploit oligomerisation surfaces in drug target validation. As an alternative to the conventional methods, the Protein Interference Assay (PIA) can be used for specific distortion of the target protein function and pathway assessment in vivo.
Hidden Markov model approach for identifying the modular framework of the protein backbone.
Camproux, A C; Tuffery, P; Chevrolat, J P; Boisvieux, J F; Hazout, S
1999-12-01
The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.
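The decoding step described above, assigning each observed segment to its most probable building block given the sequential connections, can be illustrated with a minimal Viterbi sketch. Everything concrete here is an assumption made for illustration: the two state names, the discretised "compact"/"extended" observations (the paper works with inter-alpha-carbon distances and 12 SBBs) and all probabilities are toy values, not the authors' fitted model.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t][s]: predecessor of s on the best path to step t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}

# Toy two-state model (hypothetical values, not taken from the paper)
STATES = ["helix_like", "strand_like"]
LOG_START = logs({"helix_like": 0.5, "strand_like": 0.5})
LOG_TRANS = {
    "helix_like": logs({"helix_like": 0.9, "strand_like": 0.1}),
    "strand_like": logs({"helix_like": 0.1, "strand_like": 0.9}),
}
LOG_EMIT = {
    "helix_like": logs({"compact": 0.8, "extended": 0.2}),
    "strand_like": logs({"compact": 0.2, "extended": 0.8}),
}

segments = ["compact"] * 3 + ["extended"] * 3
print(viterbi(segments, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Because the transition probabilities reward staying in the same state, the decoded path changes state only where the observations change consistently, which is the sense in which an HMM accounts for sequential connections between blocks.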
Building a Better Fragment Library for De Novo Protein Structure Prediction
de Oliveira, Saulo H. P.; Shi, Jiye; Deane, Charlotte M.
2015-01-01
Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. The importance of correct testing procedures in assessing the quality of fragment libraries is demonstrated, in particular the exclusion of homologs to the target from the libraries to correctly simulate a de novo protein structure prediction scenario, something which surprisingly is not always done. We demonstrate that fragments presenting different predominant predicted secondary structures should be treated differently during the fragment library generation step and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both a higher precision and coverage than two of the state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous experiments of the Critical Assessment of Structure Prediction (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context. Of the 13 cases in which a correct answer was generated, Flib models were more accurate than NNMake models for 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources. PMID:25901595
Criteria to Extract High-Quality Protein Data Bank Subsets for Structure Users.
Carugo, Oliviero; Djinović-Carugo, Kristina
2016-01-01
It is often necessary to build subsets of the Protein Data Bank to extract structural trends and average values. For this purpose it is mandatory that the subsets are non-redundant and of high quality. The first problem can be solved relatively easily at the sequence level or at the structural level. The second, on the contrary, needs special attention. It is not sufficient, in fact, to consider the crystallographic resolution, and other features must be taken into account: the absence of strings of residues from the electron density maps and from the files deposited in the Protein Data Bank; the B-factor values; the appropriate validation of the structural models; the quality of the electron density maps, which is not uniform; and the temperature of the diffraction experiments. More stringent criteria produce smaller subsets, which can be enlarged with more tolerant selection criteria. The incessant growth of the Protein Data Bank and especially of the number of high-resolution structures is allowing the use of more stringent selection criteria, with a consequent improvement of the quality of the subsets of the Protein Data Bank.
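The trade-off between stringent and tolerant selection criteria can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Entry` fields, the toy entry identifiers and every threshold are hypothetical, redundancy removal is not handled, and real work would read these properties from deposited files and validation reports.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    """Hypothetical per-structure summary; field names are illustrative only."""
    entry_id: str
    resolution: float        # crystallographic resolution, in angstroms
    mean_b_factor: float     # average B-factor, in square angstroms
    missing_residues: int    # residues absent from the deposited model
    temperature: float       # data collection temperature, in kelvin

def select_subset(entries, max_resolution, max_mean_b, max_missing, max_temperature):
    """Keep entries that satisfy every quality criterion at once."""
    return [
        e for e in entries
        if e.resolution <= max_resolution
        and e.mean_b_factor <= max_mean_b
        and e.missing_residues <= max_missing
        and e.temperature <= max_temperature
    ]

# Toy entries with made-up identifiers and values
ENTRIES = [
    Entry("toy1", 1.1, 14.0, 0, 100.0),
    Entry("toy2", 1.8, 28.0, 3, 100.0),
    Entry("toy3", 2.6, 55.0, 12, 293.0),
]

# Illustrative thresholds, not recommendations from the paper
strict = select_subset(ENTRIES, 1.5, 20.0, 0, 120.0)
tolerant = select_subset(ENTRIES, 2.0, 40.0, 5, 120.0)
print([e.entry_id for e in strict], [e.entry_id for e in tolerant])
```

Loosening any threshold can only add entries, so the strict subset is always contained in the tolerant one.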
Probing alpha-helical and beta-sheet structures of peptides at solid/liquid interfaces with SFG.
Chen, Xiaoyun; Wang, Jie; Sniadecki, Jason J; Even, Mark A; Chen, Zhan
2005-03-29
We demonstrated that sum frequency generation (SFG) vibrational spectroscopy can distinguish different secondary structures of proteins or peptides adsorbed at solid/liquid interfaces. The SFG spectrum for tachyplesin I at the polystyrene (PS)/solution interface has a fingerprint peak corresponding to the B1/B3 mode of the antiparallel beta-sheet. This peak disappeared upon the addition of dithiothreitol, which can disrupt the beta-sheet structure. The SFG spectrum indicative of the MSI594 alpha-helical structure was observed at the PS/MSI594 solution interface. This research validates SFG as a powerful technique for revealing detailed secondary structures of interfacial proteins and peptides.
Protein lipoxidation: Detection strategies and challenges
Aldini, Giancarlo; Domingues, M. Rosário; Spickett, Corinne M.; Domingues, Pedro; Altomare, Alessandra; Sánchez-Gómez, Francisco J.; Oeste, Clara L.; Pérez-Sala, Dolores
2015-01-01
Enzymatic and non-enzymatic lipid metabolism can give rise to reactive species that may covalently modify cellular or plasma proteins through a process known as lipoxidation. Under basal conditions, protein lipoxidation can contribute to normal cell homeostasis and participate in signaling or adaptive mechanisms, as exemplified by lipoxidation of Ras proteins or of the cytoskeletal protein vimentin, both of which behave as sensors of electrophilic species. Nevertheless, increased lipoxidation under pathological conditions may lead to deleterious effects on protein structure or aggregation. This can result in impaired degradation and accumulation of abnormally folded proteins contributing to pathophysiology, as may occur in neurodegenerative diseases. Identification of the protein targets of lipoxidation and its functional consequences under pathophysiological situations can unveil the modification patterns associated with the various outcomes, as well as preventive strategies or potential therapeutic targets. Given the wide structural variability of lipid moieties involved in lipoxidation, highly sensitive and specific methods for its detection are required. Derivatization of reactive carbonyl species is instrumental in the detection of adducts retaining carbonyl groups. In addition, use of tagged derivatives of electrophilic lipids enables enrichment of lipoxidized proteins or peptides. Ultimate confirmation of lipoxidation requires high resolution mass spectrometry approaches to unequivocally identify the adduct and the targeted residue. Moreover, rigorous validation of the targets identified and assessment of the functional consequences of these modifications are essential. Here we present an update on methods to approach the complex field of lipoxidation along with validation strategies and functional assays illustrated with well-studied lipoxidation targets. PMID:26072467
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. Availability: http://pdb2cd.cryst.bbk.ac.uk. Contact: r.w.janes@qmul.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi
2014-01-01
Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. For most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors by recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.
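A jackknife test of the kind mentioned above holds out each protein in turn, trains on the rest, and reports the fraction of held-out proteins classified correctly. The sketch below shows only that evaluation loop. The nearest-centroid classifier is a deliberately simple stand-in for the SVM used in the paper, and the two-dimensional feature vectors and class labels are made-up toy values.

```python
def nearest_centroid_predict(train, query):
    """Predict the label whose training centroid lies closest to the query."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        centroid = (total / counts[label] for total in sums[label])
        return sum((c - q) ** 2 for c, q in zip(centroid, query))

    return min(sums, key=squared_distance)

def jackknife_accuracy(samples, predict=nearest_centroid_predict):
    """Leave each sample out once, train on the others, and score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if predict(train, features) == label:
            hits += 1
    return hits / len(samples)

# Toy data: (helix fraction, strand fraction) -> structural class (hypothetical)
TOY_SAMPLES = [
    ((0.80, 0.05), "all-alpha"),
    ((0.70, 0.10), "all-alpha"),
    ((0.75, 0.00), "all-alpha"),
    ((0.05, 0.60), "all-beta"),
    ((0.10, 0.70), "all-beta"),
    ((0.00, 0.65), "all-beta"),
]

print(jackknife_accuracy(TOY_SAMPLES))
```

Any classifier with the same `predict(train, query)` shape can be passed in, so the loop does not depend on the stand-in model. If feature selection is part of the method, it would normally need to be repeated inside each fold to avoid leaking information from the held-out sample.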
Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong
2009-07-01
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One key to a computational method is accurate representation of the protein sample. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combines continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant sequence-order information in the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach to protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure.
The Protein Data Bank in Europe (PDBe): bringing structure to biology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Velankar, Sameer; Kleywegt, Gerard J., E-mail: gerard@ebi.ac.uk
2011-04-01
Some future challenges for the PDB and its guardians are discussed and current and future activities in structural bioinformatics at the Protein Data Bank in Europe (PDBe) are described. The Protein Data Bank in Europe (PDBe) is the European partner in the Worldwide PDB and as such handles depositions of X-ray, NMR and EM data and structure models. PDBe also provides advanced bioinformatics services based on data from the PDB and related resources. Some of the challenges facing the PDB and its guardians are discussed, as well as some of the areas on which PDBe activities will focus in the future (advanced services, ligands, integration, validation and experimental data). Finally, some recent developments at PDBe are described.
Gold, Matthew G.; Fowler, Douglas M.; Means, Christopher K.; Pawson, Catherine T.; Stephany, Jason J.; Langeberg, Lorene K.; Fields, Stanley; Scott, John D.
2013-01-01
PKA is retained within distinct subcellular environments by the association of its regulatory type II (RII) subunits with A-kinase anchoring proteins (AKAPs). Conventional reagents that universally disrupt PKA anchoring are patterned after a conserved AKAP motif. We introduce a phage selection procedure that exploits high-resolution structural information to engineer RII mutants that are selective for a particular AKAP. Selective RII (RSelect) sequences were obtained for eight AKAPs following competitive selection screening. Biochemical and cell-based experiments validated the efficacy of RSelect proteins for AKAP2 and AKAP18. These engineered proteins represent a new class of reagents that can be used to dissect the contributions of different AKAP-targeted pools of PKA. Molecular modeling and high-throughput sequencing analyses revealed the molecular basis of AKAP-selective interactions and shed new light on native RII-AKAP interactions. We propose that this structure-directed evolution strategy might be generally applicable for the investigation of other protein interaction surfaces. PMID:23625929
Panda, Subhamay; Kumari, Leena; Panda, Santamay
2017-11-17
Chinese tree shrews (Tupaia belangeri chinensis) bear several characteristics that are considered crucial for their use as experimental animal models in biomedical research. Following the identification of key aspects and signaling pathways in the nervous and immune systems, it has been revealed that tree shrews possess both shared and unique characteristics, and hence offer a genetic basis for employing this animal as a prospective model for biomedical research. CD59 glycoprotein, commonly referred to as MAC-inhibitory protein (MAC-IP), membrane inhibitor of reactive lysis (MIRL), or protectin, is encoded by the CD59 gene in human beings. It is a member of the LY6/uPAR/alpha-neurotoxin protein family. From this starting point, the objective of this study was to determine a comparative composite-based structure of CD59 of the Chinese tree shrew. An additional objective was to examine the distribution of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, the hydrophobicity of the molecular surface and the electrostatic potential with the assistance of several bioinformatics tools. The CD59 amino acid sequence of the Chinese tree shrew was collected from the online database system of the National Center for Biotechnology Information. The SignalP 4.0 online server was employed to detect signal peptides within the CD59 protein sequence. The molecular model structure of the CD59 protein was generated by the Iterative Threading ASSEmbly Refinement (I-TASSER) suite. The three-dimensional structural model was evaluated with structure validation tools. The location of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, and the hydrophobicity of the molecular surface were analyzed with the Chimera tool. Electrostatic potential analysis was carried out with the Adaptive Poisson-Boltzmann Solver package. The validated model was subsequently used to predict the functionally critical amino acids and the active site. The functionally critical amino acids and ligand-binding site (LBS) of the modeled protein were determined using the COACH program. Analysis of the Ramachandran plot for the Chinese tree shrew model showed that, overall, 100% of the residues in the homology model were in allowed and favored regions, supporting the quality of the generated protein structural model. For CD59 of the Chinese tree shrew, the total G-factor score was found to be -0.66, which was generally larger than the acceptable value. This suggests the significance and acceptability of the modeled structure of CD59 of the Chinese tree shrew. The molecular model data, together with other relevant post-model analysis data, provide molecular insight into the protective activity of the CD59 protein of the Chinese tree shrew. In the present study, we have proposed the first molecular model structure of the uncharted CD59 of the Chinese tree shrew by using the comparative composite modeling approach. A structural model of the CD59 protein was therefore developed and analyzed further to deduce a molecular enrichment technique. The combination of the molecular model and other relevant post-model analysis data advances molecular understanding of the protective activity of CD59 and gives better insight into the features of this natural lead compound.
Crystal Structure of Menin Reveals Binding Site for Mixed Lineage Leukemia (MLL) Protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murai, Marcelo J.; Chruszcz, Maksymilian; Reddy, Gireesh
2014-10-02
Menin is a tumor suppressor protein that is encoded by the MEN1 (multiple endocrine neoplasia 1) gene and controls cell growth in endocrine tissues. Importantly, menin also serves as a critical oncogenic cofactor of MLL (mixed lineage leukemia) fusion proteins in acute leukemias. Direct association of menin with MLL fusion proteins is required for MLL fusion protein-mediated leukemogenesis in vivo, and this interaction has been validated as a new potential therapeutic target for development of novel anti-leukemia agents. Here, we report the first crystal structure of menin homolog from Nematostella vectensis. Due to a very high sequence similarity, the Nematostella menin is a close homolog of human menin, and these two proteins likely have very similar structures. Menin is predominantly an α-helical protein with the protein core comprising three tetratricopeptide motifs that are flanked by two α-helical bundles and covered by a β-sheet motif. A very interesting feature of menin structure is the presence of a large central cavity that is highly conserved between Nematostella and human menin. By employing site-directed mutagenesis, we have demonstrated that this cavity constitutes the binding site for MLL. Our data provide a structural basis for understanding the role of menin as a tumor suppressor protein and as an oncogenic co-factor of MLL fusion proteins. It also provides essential structural information for development of inhibitors targeting the menin-MLL interaction as a novel therapeutic strategy in MLL-related leukemias.
Intermediates and the folding of proteins L and G
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, Scott; Head-Gordon, Teresa
We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G that are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G are consistent with a nucleation-condensation mechanism, each of which is described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, and should be populated and therefore observable by experiment.
Intermediates and the folding of proteins L and G
Brown, Scott; Head-Gordon, Teresa
2004-01-01
We use a minimalist protein model, in combination with a sequence design strategy, to determine differences in primary structure for proteins L and G, which are responsible for the two proteins folding through distinctly different folding mechanisms. We find that the folding of proteins L and G are consistent with a nucleation-condensation mechanism, each of which is described as helix-assisted β-1 and β-2 hairpin formation, respectively. We determine that the model for protein G exhibits an early intermediate that precedes the rate-limiting barrier of folding, and which draws together misaligned secondary structure elements that are stabilized by hydrophobic core contacts involving the third β-strand, and presages the later transition state in which the correct strand alignment of these same secondary structure elements is restored. Finally, the validity of the targeted intermediate ensemble for protein G was analyzed by fitting the kinetic data to a two-step first-order reversible reaction, proving that protein G folding involves an on-pathway early intermediate, and should be populated and therefore observable by experiment. PMID:15044729
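The two-step first-order reversible reaction mentioned above can be pictured as U ⇌ I ⇌ N (unfolded, intermediate, native). The following is a minimal sketch of that kinetic scheme only, not the authors' fitting procedure: the function name, the explicit Euler integration, and every rate constant are illustrative assumptions.

```python
def simulate_three_state(k_ui, k_iu, k_in, k_ni, t_end, dt=1e-3):
    """Integrate U <-> I <-> N first-order kinetics with explicit Euler steps.

    All rate constants are hypothetical inputs (units of 1/time); the
    populations start fully unfolded and always sum to 1.
    """
    u, i, n = 1.0, 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        flux_ui = k_ui * u - k_iu * i  # net U -> I flux
        flux_in = k_in * i - k_ni * n  # net I -> N flux
        u -= flux_ui * dt
        i += (flux_ui - flux_in) * dt
        n += flux_in * dt
    return u, i, n


# Illustrative rate constants only; they are not taken from the paper.
u, i, n = simulate_three_state(k_ui=5.0, k_iu=1.0, k_in=0.5, k_ni=0.05, t_end=200.0)
print(f"U={u:.4f} I={i:.4f} N={n:.4f}")
```

In a real analysis the four rate constants would be adjusted until the simulated populations reproduce the measured kinetic traces; at long times the populations approach the ratios fixed by the forward and reverse rates.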
Suckau, Detlev; Resemann, Anja
2009-12-01
The ability to match Top-Down protein sequencing (TDS) results obtained by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and efficiently and to validate recombinant protein structures, in particular protein termini, at the level of undigested proteins.
Saxena, Shalini; Abdullah, Maaged; Sriram, Dharmarajan; Guruprasad, Lalitha
2017-10-17
MurG (Rv2153c) is a key player in the biosynthesis of the peptidoglycan layer in Mycobacterium tuberculosis (Mtb). To highlight the structural and functional relationship of Mtb MurG, the three-dimensional (3D) structure of the protein was constructed by homology modelling using Discovery Studio 3.5 software. The quality and consistency of the generated model were assessed by PROCHECK, ProSA and ERRAT. The model was then optimized by molecular dynamics (MD) simulations, and the optimized model in complex with the substrate uridine-diphosphate-N-acetylglucosamine (UD1) allowed us to employ a structure-based virtual screening approach to obtain new hits from the Asinex database using energy-optimized pharmacophore modelling (e-pharmacophore). The pharmacophore model was validated using enrichment calculations, and the validated model was then employed for high-throughput virtual screening and molecular docking to identify novel Mtb MurG inhibitors. This study led to the identification of 10 potential compounds with good fitness and docking scores that make important interactions with the protein active site. The 25 ns MD simulations of three potential lead compounds with the protein confirmed that the structure was stable and that the compounds make several non-bonding interactions with amino acids such as Leu290, Met310 and Asn167. Hence, we concluded that the identified compounds may act as new leads for the design of Mtb MurG inhibitors.
Validation of ligands in macromolecular structures determined by X-ray crystallography
Horský, Vladimír; Svobodová Vařeková, Radka; Bendová, Veronika
2018-01-01
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) play a crucial role in structure-guided drug discovery and design, and also provide atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. The quality with which small-molecule ligands have been modelled in Protein Data Bank (PDB) entries has been, and continues to be, a matter of concern for many investigators. Correctly interpreting whether electron density found in a binding site is compatible with the soaked or co-crystallized ligand or represents water or buffer molecules is often far from trivial. The Worldwide PDB validation report (VR) provides a mechanism to highlight any major issues concerning the quality of the data and the model at the time of deposition and annotation, so the depositors can fix issues, resulting in improved data quality. The ligand-validation methods used in the generation of the current VRs are described in detail, including an examination of the metrics to assess both geometry and electron-density fit. It is found that the LLDF score currently used to identify ligand electron-density fit outliers can give misleading results and that better ligand-validation metrics are required. PMID:29533230
Westbrook, John D; Feng, Zukang; Persikova, Irina; Sala, Raul; Sen, Sanchayita; Berrisford, John M; Swaminathan, G Jawahar; Oldfield, Thomas J; Gutmanas, Aleksandras; Igarashi, Reiko; Armstrong, David R; Baskaran, Kumaran; Chen, Li; Chen, Minyu; Clark, Alice R; Di Costanzo, Luigi; Dimitropoulos, Dimitris; Gao, Guanghua; Ghosh, Sutapa; Gore, Swanand; Guranovic, Vladimir; Hendrickx, Pieter M S; Hudson, Brian P; Ikegawa, Yasuyo; Kengaku, Yumiko; Lawson, Catherine L; Liang, Yuhe; Mak, Lora; Mukhopadhyay, Abhik; Narayanan, Buvaneswari; Nishiyama, Kayoko; Patwardhan, Ardan; Sahni, Gaurav; Sanz-García, Eduardo; Sato, Junko; Sekharan, Monica R; Shao, Chenghua; Smart, Oliver S; Tan, Lihua; van Ginkel, Glen; Yang, Huanwang; Zhuravleva, Marina A; Markley, John L; Nakamura, Haruki; Kurisu, Genji; Kleywegt, Gerard J; Velankar, Sameer; Berman, Helen M; Burley, Stephen K
2018-01-01
The Protein Data Bank (PDB) is the single global repository for experimentally determined 3D structures of biological macromolecules and their complexes with ligands. The worldwide PDB (wwPDB) is the international collaboration that manages the PDB archive according to the FAIR principles: Findability, Accessibility, Interoperability and Reusability. The wwPDB recently developed OneDep, a unified tool for deposition, validation and biocuration of structures of biological macromolecules. All data deposited to the PDB undergo critical review by wwPDB Biocurators. This article outlines the importance of biocuration for structural biology data deposited to the PDB and describes wwPDB biocuration processes and the role of expert Biocurators in sustaining a high-quality archive. Structural data submitted to the PDB are examined for self-consistency, standardized using controlled vocabularies, cross-referenced with other biological data resources and validated for scientific/technical accuracy. We illustrate how biocuration is integral to PDB data archiving, as it facilitates accurate, consistent and comprehensive representation of biological structure data, allowing efficient and effective usage by research scientists, educators, students and the curious public worldwide. Database URL: https://www.wwpdb.org/ PMID:29688351
Electrostatics of cysteine residues in proteins: Parameterization and validation of a simple model
Salsbury, Freddie R.; Poole, Leslie B.; Fetrow, Jacquelyn S.
2013-01-01
One of the most popular and simple models for the calculation of pKas from a protein structure is the semi-macroscopic electrostatic model MEAD. This model requires empirical parameters for each residue to calculate pKas. Analysis of current, widely used empirical parameters for cysteine residues showed that they did not reproduce expected cysteine pKas; thus, we set out to identify parameters consistent with the CHARMM27 force field that capture both the behavior of typical cysteines in proteins and the behavior of cysteines which have perturbed pKas. The new parameters were validated in three ways: (1) calculation across a large set of typical cysteines in proteins (where the calculations are expected to reproduce expected ensemble behavior); (2) calculation across a set of perturbed cysteines in proteins (where the calculations are expected to reproduce the shifted ensemble behavior); and (3) comparison to experimentally determined pKa values (where the calculation should reproduce the pKa within experimental error). Both the general behavior of cysteines in proteins and the perturbed pKa in some proteins can be predicted reasonably well using the newly determined empirical parameters within the MEAD model for protein electrostatics. This study provides the first general analysis of the electrostatics of cysteines in proteins, with specific attention paid to capturing both the behavior of typical cysteines in a protein and the behavior of cysteines whose pKa should be shifted, and validation of force field parameters for cysteine residues. PMID:22777874
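To make concrete why a shifted cysteine pKa matters, the short sketch below converts a pKa into the fraction of deprotonated (thiolate) cysteine at a given pH using the Henderson-Hasselbalch relation. It does not reproduce the MEAD calculation; the function name and the example pKa and pH values are illustrative assumptions.

```python
def thiolate_fraction(pka, ph):
    """Fraction of a titratable group that is deprotonated at a given pH.

    Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Illustrative values only: a cysteine with an unperturbed-looking pKa
# versus one whose pKa is shifted downwards, both at pH 7.5.
for label, pka in (("typical", 8.5), ("perturbed", 5.5)):
    print(label, round(thiolate_fraction(pka, ph=7.5), 3))
```

A pKa one unit above the pH leaves roughly 9% of the group deprotonated, whereas a pKa two units below the pH leaves about 99% deprotonated, which is why small errors in predicted pKa can change the predicted reactive state.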
Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan
2014-01-01
Background: The fatty-acid profile of a vegetable oil determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but palm-oil obtained from the American oil-palm [Elaeis oleifera] contains only 25% C16:0. In part, the β-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know their structures. Hence, this study was undertaken. Objective: The objective of this study was to predict the three-dimensional (3D) structures of EgKASII and EoKASII proteins using molecular modelling tools. Materials and Methods: The amino-acid sequences for KASII proteins were retrieved from the protein database of the National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and an ab-initio approach to protein structure prediction. Molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated, and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. Results: The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar to Streptococcus pneumoniae KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by the ab-initio approach show 6% and 9% deviation, respectively, from the structures predicted by homology modelling. The structure refinement and validation confirmed that the predicted structures are accurate. Conclusion: The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with their substrates.
PMID:24748752
Wang, Edina; Chinni, Suresh; Bhore, Subhash Janardhan
2014-01-01
The fatty-acid profile of a vegetable oil determines its properties and nutritional value. Palm-oil obtained from the African oil-palm [Elaeis guineensis Jacq. (Tenera)] contains 44% palmitic acid (C16:0), but palm-oil obtained from the American oil-palm [Elaeis oleifera] contains only 25% C16:0. In part, the β-ketoacyl-[ACP] synthase II (KASII) [EC: 2.3.1.179] protein is responsible for the high level of C16:0 in palm-oil derived from the African oil-palm. To understand more about E. guineensis KASII (EgKASII) and E. oleifera KASII (EoKASII) proteins, it is essential to know their structures. Hence, this study was undertaken. The objective of this study was to predict the three-dimensional (3D) structures of EgKASII and EoKASII proteins using molecular modelling tools. The amino-acid sequences for KASII proteins were retrieved from the protein database of the National Center for Biotechnology Information (NCBI), USA. The 3D structures were predicted for both proteins using homology modelling and an ab-initio approach to protein structure prediction. Molecular dynamics (MD) simulation was performed to refine the predicted structures. The predicted structure models were evaluated, and root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were calculated. The homology modelling showed that EgKASII and EoKASII proteins are 78% and 74% similar to Streptococcus pneumoniae KASII and Brucella melitensis KASII, respectively. The EgKASII and EoKASII structures predicted by the ab-initio approach show 6% and 9% deviation, respectively, from the structures predicted by homology modelling. The structure refinement and validation confirmed that the predicted structures are accurate. The 3D structures for EgKASII and EoKASII proteins were predicted. However, further research is essential to understand the interaction of EgKASII and EoKASII proteins with their substrates.
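As a reminder of what the RMSD values reported in such model evaluations measure, here is a minimal sketch that computes the root mean square deviation between two equally sized sets of atomic coordinates. It assumes the structures are already superimposed and uses made-up coordinates; it is not the protocol used in the study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(model, reference))
```

RMSF is computed in the same spirit, but per atom over the frames of a trajectory rather than over the atoms of a single pair of structures.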
The property distance index PD predicts peptides that cross-react with IgE antibodies
Ivanciuc, Ovidiu; Midoro-Horiuti, Terumi; Schein, Catherine H.; Xie, Liping; Hillman, Gilbert R.; Goldblum, Randall M.; Braun, Werner
2009-01-01
Similarities in the sequence and structure of allergens can explain clinically observed cross-reactivities. Distinguishing sequences that bind IgE in patient sera can be used to identify potentially allergenic protein sequences and aid in the design of hypo-allergenic proteins. The property distance index PD, incorporated in our Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/), may identify potentially cross-reactive segments of proteins, based on their similarity to known IgE epitopes. We sought to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to three linear IgE epitopes of Jun a 1, the dominant allergen from mountain cedar pollen. For each of the three epitopes, 60 peptides were designed with increasing PD values (decreasing physicochemical similarity) to the starting sequence. The peptides synthesized on a derivatized cellulose membrane were probed with sera from patients who were allergic to Jun a 1, and the experimental data were interpreted with a PD classification method. Peptides with low PD values relative to a given epitope were more likely to bind IgE from the sera than were those with PD values larger than 6. Control sequences, with PD values between 18 and 20 to all the three epitopes, did not bind patient IgE, thus validating our procedure for identifying negative control peptides. The PD index is a statistically validated method to detect discrete regions of proteins that have a high probability of cross-reacting with IgE from allergic patients. PMID:18950868
Cross-Link Guided Molecular Modeling with ROSETTA
Leitner, Alexander; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars
2013-01-01
Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods. PMID:24069194
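The idea of using cross-links as distance restraints can be illustrated with a small check of which cross-linked residue pairs are compatible with a structural model. This is a simplified sketch, not the Xwalk or ROSETTA implementation: it uses straight-line distances between representative atoms, and the function name, coordinates, and 30 Å cutoff are hypothetical.

```python
import math


def satisfied_crosslinks(coords, crosslinks, max_distance):
    """Return the cross-linked residue pairs whose distance is within the cutoff.

    coords: dict mapping a residue id to an (x, y, z) position in angstroms.
    crosslinks: iterable of (residue_i, residue_j) pairs.
    max_distance: upper distance bound in angstroms (an assumed value).
    """
    return [
        (res_i, res_j)
        for res_i, res_j in crosslinks
        if math.dist(coords[res_i], coords[res_j]) <= max_distance
    ]


# Hypothetical positions and cross-links, for illustration only.
positions = {"K10": (0.0, 0.0, 0.0), "K55": (12.0, 5.0, 0.0), "K120": (40.0, 0.0, 0.0)}
links = [("K10", "K55"), ("K10", "K120")]
print(satisfied_crosslinks(positions, links, max_distance=30.0))
```

A model that violates many restraints can be down-weighted or discarded, which is the general way such data narrow the set of candidate models or docking poses.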
There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order".
Nielsen, Jakob T; Mulder, Frans A A
2016-01-01
The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder, but still, largely segregate into two classes; disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence shows a complex and asymmetric distribution, that is highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.
Černý, Jiří; Schneider, Bohdan; Biedermannová, Lada
2017-07-14
Water molecules represent an integral part of proteins and a key determinant of protein structure, dynamics and function. WatAA is a newly developed, web-based atlas of amino-acid hydration in proteins. The atlas provides information about the ordered first hydration shell of the most populated amino-acid conformers in proteins. The data presented in the atlas are drawn from two sources: experimental data and ab initio quantum-mechanics calculations. The experimental part is based on a data-mining study of a large set of high-resolution protein crystal structures. The crystal-derived data include 3D maps of water distribution around amino-acids and probability of occurrence of each of the identified hydration sites. The quantum mechanics calculations validate and extend this primary description by optimizing the water position for each hydration site, by providing hydrogen atom positions and by quantifying the interaction energy that stabilizes the water molecule at the particular hydration site position. The calculations show that the majority of experimentally derived hydration sites are positioned near local energy minima for water, and the calculated interaction energies help to assess the preference of water for the individual hydration sites. We propose that the atlas can be used to validate water placement in electron density maps in crystallographic refinement, to locate water molecules mediating protein-ligand interactions in drug design, and to prepare and evaluate molecular dynamics simulations. WatAA: Atlas of Protein Hydration is freely available without login at .
Text Mining for Protein Docking
Badal, Varsha D.; Kundrotas, Petras J.; Vakser, Ilya A.
2015-01-01
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for about 25% of the complexes in the dataset.
The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate. PMID:26650466
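The bag-of-words representation mentioned above turns each abstract into a vector of word counts that a classifier can consume. The sketch below shows only that feature-extraction step under simple assumptions; the tokenizer, the vocabulary, and the example sentence are hypothetical, and no Support Vector Machine is trained here.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text.

    Tokens are lowercase alphanumeric runs; word order is discarded,
    which is the defining property of a bag-of-words feature vector.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and sentence, for illustration only.
vocabulary = ["residue", "binding", "interface"]
sentence = "Residue Arg45 is in the binding interface; binding is lost on mutation."
print(bag_of_words(sentence, vocabulary))
```

Vectors of this kind, one per abstract, would then serve as the inputs on which a classifier is trained to separate relevant from irrelevant abstracts.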
Prediction of physical protein protein interactions
NASA Astrophysics Data System (ADS)
Szilágyi, András; Grimm, Vera; Arakaki, Adrián K.; Skolnick, Jeffrey
2005-06-01
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
Pan, Li; Aguilar, Hillary Andaluz; Wang, Linna; Iliuk, Anton; Tao, W Andy
2016-11-30
Glycoproteins have vast structural diversity that plays an important role in many biological processes and have great potential as disease biomarkers. Here, we report a novel functionalized reverse phase protein array (RPPA), termed polymer-based reverse phase glycoprotein array (polyGPA), to capture and profile glycoproteomes specifically, and validate glycoproteins. Nitrocellulose membrane functionalized with globular hydroxyaminodendrimers was used to covalently capture preoxidized glycans on glycoproteins from complex protein samples such as biofluids. The captured glycoproteins were subsequently detected using the same validated antibodies as in RPPA. We demonstrated the outstanding specificity, sensitivity, and quantitative capabilities of polyGPA by capturing and detecting purified as well as endogenous α-1-acid glycoprotein (AGP) in human plasma. We further applied quantitative N-glycoproteomics and the strategy to validate a panel of glycoproteins identified as potential biomarkers for bladder cancer by analyzing urine glycoproteins from bladder cancer patients or matched healthy individuals.
From protein structure to function via single crystal optical spectroscopy
Ronda, Luca; Bruno, Stefano; Bettati, Stefano; Storici, Paola; Mozzarelli, Andrea
2015-01-01
The more than 100,000 protein structures determined by X-ray crystallography provide a wealth of information for the characterization of biological processes at the molecular level. However, several crystallographic “artifacts,” including conformational selection, crystallization conditions and radiation damage, may affect the quality and the interpretation of the electron density maps, thus limiting the relevance of structure determinations. Moreover, for most of these structures, no functional data have been obtained in the crystalline state, thus posing serious questions on their validity in inferring protein mechanisms. In order to solve these issues, spectroscopic methods have been applied for the determination of equilibrium and kinetic properties of proteins in the crystalline state. These methods are UV-vis spectrophotometry, spectrofluorimetry, IR, EPR, Raman, and resonance Raman spectroscopy. Some of these approaches have been implemented with on-line instruments at X-ray synchrotron beamlines. Here, we provide an overview of investigations predominantly carried out in our laboratory by single crystal polarized absorption UV-vis microspectrophotometry, the most applied technique for the functional characterization of proteins in the crystalline state. Studies on hemoglobins, pyridoxal 5′-phosphate dependent enzymes and green fluorescent protein in the crystalline state have addressed key biological issues, leading to either straightforward structure-function correlations or limitations to structure-based mechanisms. PMID:25988179
Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences
NASA Astrophysics Data System (ADS)
Adhikari, Aashish N.; Freed, Karl F.; Sosnick, Tobin R.
2013-07-01
We demonstrate the ability of simultaneously determining a protein’s folding pathway and structure using a properly formulated model without prior knowledge of the native structure. Our model employs a natural coordinate system for describing proteins and a search strategy inspired by the observation that real proteins fold in a sequential fashion by incrementally stabilizing nativelike substructures or “foldons.” Comparable folding pathways and structures are obtained for the twelve proteins recently studied using atomistic molecular dynamics simulations [K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, Science 334, 517 (2011)], with our calculations running several orders of magnitude faster. We find that nativelike propensities in the unfolded state do not necessarily determine the order of structure formation, a departure from a major conclusion of the molecular dynamics study. Instead, our results support a more expansive view wherein intrinsic local structural propensities may be enhanced or overridden in the folding process by environmental context. The success of our search strategy validates it as an expedient mechanism for folding both in silico and in vivo.
Systematic Validation of Protein Force Fields against Experimental Data
Eastwood, Michael P.; Dror, Ron O.; Shaw, David E.
2012-01-01
Molecular dynamics simulations provide a vehicle for capturing the structures, motions, and interactions of biological macromolecules in full atomic detail. The accuracy of such simulations, however, is critically dependent on the force field—the mathematical model used to approximate the atomic-level forces acting on the simulated molecular system. Here we present a systematic and extensive evaluation of eight different protein force fields based on comparisons of experimental data with molecular dynamics simulations that reach a previously inaccessible timescale. First, through extensive comparisons with experimental NMR data, we examined the force fields' abilities to describe the structure and fluctuations of folded proteins. Second, we quantified potential biases towards different secondary structure types by comparing experimental and simulation data for small peptides that preferentially populate either helical or sheet-like structures. Third, we tested the force fields' abilities to fold two small proteins—one α-helical, the other with β-sheet structure. The results suggest that force fields have improved over time, and that the most recent versions, while not perfect, provide an accurate description of many structural and dynamical properties of proteins. PMID:22384157
Engineering Proteins for Thermostability with iRDP Web Server
Ghanate, Avinash; Ramasamy, Sureshkumar; Suresh, C. G.
2015-01-01
Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However, these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising the iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements. PMID:26436543
Engineering Proteins for Thermostability with iRDP Web Server.
Panigrahi, Priyabrata; Sule, Manas; Ghanate, Avinash; Ramasamy, Sureshkumar; Suresh, C G
2015-01-01
Engineering protein molecules with desired structure and biological functions has been an elusive goal. Development of industrially viable proteins with improved properties such as stability, catalytic activity and altered specificity by modifying the structure of an existing protein has widely been targeted through rational protein engineering. Although a range of factors contributing to thermal stability have been identified and widely researched, the in silico implementation of these as strategies directed towards enhancement of protein stability has not yet been explored extensively. A wide range of structural analysis tools is currently available for in silico protein engineering. However, these tools concentrate on only a limited number of factors or individual protein structures, resulting in cumbersome and time-consuming analysis. The iRDP web server presented here provides a unified platform comprising the iCAPS, iStability and iMutants modules. Each module addresses different facets of effective rational engineering of proteins aiming towards enhanced stability. While iCAPS aids in selection of target protein based on factors contributing to structural stability, iStability uniquely offers in silico implementation of known thermostabilization strategies in proteins for identification and stability prediction of potential stabilizing mutation sites. iMutants aims to assess mutants based on changes in local interaction network and degree of residue conservation at the mutation sites. Each module was validated using an extensively diverse dataset. The server is freely accessible at http://irdp.ncl.res.in and has no login requirements.
Identification of DNA-Binding Proteins Using Structural, Electrostatic and Evolutionary Features
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-01-01
DNA-binding proteins (DBPs) often take part in various crucial processes of the cell's life cycle. Therefore, the identification and characterization of these proteins are of great importance. We present here a random forests classifier for identifying DBPs among proteins with known three-dimensional structures. First, clusters of evolutionarily conserved regions (patches) on the protein's surface are detected using the PatchFinder algorithm; previous studies showed that these regions are typically the proteins' functionally important regions. Next, we train a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein including its dipole moment. Using 10-fold cross validation on a dataset of 138 DNA-binding proteins and 110 proteins which do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of previously published methods. Furthermore, when we tested 5 different methods on 11 new DBPs which did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA. PMID:19233205
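As a reading aid for the sensitivity and specificity figures quoted in this abstract, the following minimal Python sketch shows how the two quantities are conventionally computed from confusion-matrix counts. The function names and the counts are hypothetical and chosen only for illustration; they are not the per-class results of the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true binders that the classifier labels as binders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-binders that the classifier labels as non-binders."""
    return tn / (tn + fp)


# Illustrative counts only (assumed, not taken from the paper).
toy_tp, toy_fn = 45, 5   # binders predicted correctly / missed
toy_tn, toy_fp = 36, 4   # non-binders predicted correctly / mislabelled

toy_sensitivity = sensitivity(toy_tp, toy_fn)
toy_specificity = specificity(toy_tn, toy_fp)
```

In a 10-fold cross-validation setting, such counts would typically be accumulated over the held-out folds before the ratios are taken; whether the authors pooled or averaged per fold is not stated in the abstract.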
Kuzu, Guray; Keskin, Ozlem; Nussinov, Ruth; Gursoy, Attila
2016-10-01
The structures of protein assemblies are important for elucidating cellular processes at the molecular level. Three-dimensional electron microscopy (3DEM) is a powerful method to identify the structures of assemblies, especially those that are challenging to study by crystallography. Here, a new approach, PRISM-EM, is reported to computationally generate plausible structural models using a procedure that combines crystallographic structures and density maps obtained from 3DEM. The predictions are validated against seven available structurally different crystallographic complexes. The models display mean deviations in the backbone of <5 Å. PRISM-EM was further tested on different benchmark sets; the accuracy was evaluated with respect to the structure of the complex, and the correlation with EM density maps and interface predictions were evaluated and compared with those obtained using other methods. PRISM-EM was then used to predict the structure of the ternary complex of the HIV-1 envelope glycoprotein trimer, the ligand CD4 and the neutralizing protein m36.
Strategies for carbohydrate model building, refinement and validation.
Agirre, Jon
2017-02-01
Sugars are the most stereochemically intricate family of biomolecules and present substantial challenges to anyone trying to understand their nomenclature, reactions or branched structures. Current crystallographic programs provide an abstraction layer allowing inexpert structural biologists to build complete protein or nucleic acid model components automatically either from scratch or with little manual intervention. This is, however, still not generally true for sugars. The need for carbohydrate-specific building and validation tools has been highlighted a number of times in the past, concomitantly with the introduction of a new generation of experimental methods that have been ramping up the production of protein-sugar complexes and glycoproteins for the past decade. While some incipient advances have been made to address these demands, correctly modelling and refining carbohydrates remains a challenge. This article will address many of the typical difficulties that a structural biologist may face when dealing with carbohydrates, with an emphasis on problem solving in the resolution range where X-ray crystallography and cryo-electron microscopy are expected to overlap in the next decade.
Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun
2017-11-27
Molecular docking is widely applied to computer-aided drug design and has become relatively mature in recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, the structure which has a good statistical performance of distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both t test and Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Food and Drug Administration-approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
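The abstract does not say how the score threshold is read off the ROC curve. One common choice, used here purely as an assumption, is the cut-off that maximizes Youden's J statistic (true positive rate minus false positive rate). The sketch below is not the ProSelection implementation; the function name and the scores are invented for illustration, and it assumes that a higher docking score indicates a more likely active.

```python
def youden_threshold(active_scores, inactive_scores):
    """Return (threshold, J) for the cut-off maximizing J = TPR - FPR.

    A compound is called 'active' when its score is >= threshold.
    Ties are resolved in favour of the lowest qualifying threshold.
    """
    candidates = sorted(set(active_scores) | set(inactive_scores))
    best_t, best_j = None, -1.0
    for t in candidates:
        tpr = sum(s >= t for s in active_scores) / len(active_scores)
        fpr = sum(s >= t for s in inactive_scores) / len(inactive_scores)
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Made-up docking scores, for illustration only.
actives = [7.2, 6.8, 8.1, 5.2, 7.5, 6.5]
inactives = [4.1, 5.0, 6.0, 3.8, 4.7]

threshold, j_stat = youden_threshold(actives, inactives)
```

Scanning only the observed scores is sufficient because the ROC curve changes only at those values.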
Random close packing in protein cores
NASA Astrophysics Data System (ADS)
Ohern, Corey
Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ~ 0.75, a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of 'extended atom' models, rather than the more physically accurate 'explicit hydrogen' model. The validity of using the explicit hydrogen model is proved by its ability to predict the side chain dihedral angle distributions observed in proteins. We employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high resolution protein structures. We find that these protein cores have ϕ ~ 0.55, which is comparable to random close-packing of non-spherical particles. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations and design of new functional proteins. We gratefully acknowledge the support of the Raymond and Beverly Sackler Institute for Biological, Physical, and Engineering Sciences, National Library of Medicine training grant T15LM00705628 (J.C.G.), and National Science Foundation DMR-1307712 (L.R.).
MolProbity: More and better reference data for improved all-atom structure validation.
Williams, Christopher J; Headd, Jeffrey J; Moriarty, Nigel W; Prisant, Michael G; Videau, Lizbeth L; Deis, Lindsay N; Verma, Vishal; Keedy, Daniel A; Hintze, Bradley J; Chen, Vincent B; Jain, Swati; Lewis, Steven M; Arendall, W Bryan; Snoeyink, Jack; Adams, Paul D; Lovell, Simon C; Richardson, Jane S; Richardson, David C
2018-01-01
This paper describes the current update on macromolecular model validation services that are provided at the MolProbity website, emphasizing changes and additions since the previous review in 2010. There have been many infrastructure improvements, including rewrite of previous Java utilities to now use existing or newly written Python utilities in the open-source CCTBX portion of the Phenix software system. This improves long-term maintainability and enhances the thorough integration of MolProbity-style validation within Phenix. There is now a complete MolProbity mirror site at http://molprobity.manchester.ac.uk. GitHub serves our open-source code, reference datasets, and the resulting multi-dimensional distributions that define most validation criteria. Coordinate output after Asn/Gln/His "flip" correction is now more idealized, since the post-refinement step has apparently often been skipped in the past. Two distinct sets of heavy-atom-to-hydrogen distances and accompanying van der Waals radii have been researched and improved in accuracy, one for the electron-cloud-center positions suitable for X-ray crystallography and one for nuclear positions. New validations include messages at input about problem-causing format irregularities, updates of Ramachandran and rotamer criteria from the million quality-filtered residues in a new reference dataset, the CaBLAM Cα-CO virtual-angle analysis of backbone and secondary structure for cryoEM or low-resolution X-ray, and flagging of the very rare cis-nonProline and twisted peptides which have recently been greatly overused. Due to wide application of MolProbity validation and corrections by the research community, in Phenix, and at the worldwide Protein Data Bank, newly deposited structures have continued to improve greatly as measured by MolProbity's unique all-atom clashscore. © 2017 The Protein Society.
Lapkouski, Mikalai; Hofbauerova, Katerina; Sovova, Zofie; Ettrichova, Olga; González-Pérez, Sergio; Dulebo, Alexander; Kaftan, David; Kuta Smatanova, Ivana; Revuelta, Jose L.; Arellano, Juan B.; Carey, Jannette; Ettrich, Rüdiger
2012-01-01
Raman microscopy permits structural analysis of protein crystals in situ in hanging drops, allowing for comparison with Raman measurements in solution. Nevertheless, the two methods sometimes reveal subtle differences in structure that are often ascribed to the water layer surrounding the protein. The novel method of drop-coating deposition Raman spectroscopy (DCDR) exploits an intermediate phase that, although nominally “dry,” has been shown to preserve protein structural features present in solution. The potential of this new approach to bridge the structural gap between proteins in solution and in crystals is explored here with extrinsic protein PsbP of photosystem II from Spinacia oleracea. In the high-resolution (1.98 Å) x-ray crystal structure of PsbP reported here, several segments of the protein chain are present but unresolved. Analysis of the three kinds of Raman spectra of PsbP suggests that most of the subtle differences can indeed be attributed to the water envelope, which is shown here to have a similar Raman intensity in glassy and crystal states. Using molecular dynamics simulations cross-validated by Raman solution data, two unresolved segments of the PsbP crystal structure were modeled as loops, and the amino terminus was inferred to contain an additional beta segment. The complete PsbP structure was compared with that of the PsbP-like protein CyanoP, which plays a more peripheral role in photosystem II function. The comparison suggests possible interaction surfaces of PsbP with higher-plant photosystem II. This work provides the first complete structural picture of this key protein, and it represents the first systematic comparison of Raman data from solution, glassy, and crystalline states of a protein. PMID:23071614
PDBe: Protein Data Bank in Europe
Velankar, S.; Alhroub, Y.; Best, C.; Caboche, S.; Conroy, M. J.; Dana, J. M.; Fernandez Montecelo, M. A.; van Ginkel, G.; Golovin, A.; Gore, S. P.; Gutmanas, A.; Haslam, P.; Hendrickx, P. M. S.; Heuson, E.; Hirshberg, M.; John, M.; Lagerstedt, I.; Mir, S.; Newman, L. E.; Oldfield, T. J.; Patwardhan, A.; Rinaldi, L.; Sahni, G.; Sanz-García, E.; Sen, S.; Slowley, R.; Suarez-Uruena, A.; Swaminathan, G. J.; Symmons, M. F.; Vranken, W. F.; Wainwright, M.; Kleywegt, G. J.
2012-01-01
The Protein Data Bank in Europe (PDBe; pdbe.org) is a partner in the Worldwide PDB organization (wwPDB; wwpdb.org) and as such actively involved in managing the single global archive of biomacromolecular structure data, the PDB. In addition, PDBe develops tools, services and resources to make structure-related data more accessible to the biomedical community. Here we describe recently developed, extended or improved services, including an animated structure-presentation widget (PDBportfolio), a widget to graphically display the coverage of any UniProt sequence in the PDB (UniPDB), chemistry- and taxonomy-based PDB-archive browsers (PDBeXplore), and a tool for interactive visualization of NMR structures, corresponding experimental data as well as validation and analysis results (Vivaldi). PMID:22110033
De novo inference of protein function from coarse-grained dynamics.
Bhadra, Pratiti; Pal, Debnath
2014-10-01
Inference of molecular function of proteins is the fundamental task in the quest for understanding cellular processes. The task is getting increasingly difficult with thousands of new proteins discovered each day. The difficulty arises primarily due to the lack of a high-throughput experimental technique for assessing protein molecular function, a lacuna that computational approaches are trying hard to fill. The latter, too, face a major bottleneck in the absence of clear evidence based on evolutionary information. Here we propose a de novo approach to annotate protein molecular function through structural dynamics match for a pair of segments from two dissimilar proteins, which may share even <10% sequence identity. To screen these matches, corresponding 1 µs coarse-grained (CG) molecular dynamics trajectories were used to compute normalized root-mean-square-fluctuation graphs and select mobile segments, which were, thereafter, matched for all pairs using unweighted three-dimensional autocorrelation vectors. Our in-house custom-built forcefield (FF), extensively validated against dynamics information obtained from experimental nuclear magnetic resonance data, was specifically used to generate the CG dynamics trajectories. The test for correspondence of dynamics-signature of protein segments and function revealed 87% true positive rate and 93.5% true negative rate, on a dataset of 60 experimentally validated proteins, including moonlighting proteins and those with novel functional motifs. A random test against 315 unique fold/function proteins for a negative test gave >99% true recall. A blind prediction on a novel protein appears consistent with additional evidence retrieved therein. This is the first proof-of-principle of generalized use of structural dynamics for inferring protein molecular function leveraging our custom-made CG FF, useful to all. © 2014 Wiley Periodicals, Inc.
Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites
NASA Astrophysics Data System (ADS)
Valasatava, Yana; Andreini, Claudia; Rosato, Antonio
2015-03-01
Metalloproteins account for a substantial fraction of all proteins. They incorporate metal atoms, which are required for their structure and/or function. Here we describe a new computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity. These sites are extracted from the MetalPDB database of minimal functional sites (MFSs) in metal-binding biological macromolecules. Structural similarity is measured by the scoring function of the available MetalS2 program. Hierarchical clustering was used to organize MFSs into clusters, for each of which a representative MFS was identified. The comparison of all representative MFSs provided a thorough structure-based classification of the sites analyzed. As examples, the application of the proposed computational protocol to all heme-binding proteins and zinc-binding proteins of known structure highlighted the existence of structural subtypes, validated known evolutionary links and shed new light on the occurrence of similar sites in systems at different evolutionary distances. The present approach thus makes available an innovative viewpoint on metalloproteins, where the functionally crucial metal sites effectively lead the discovery of structural and functional relationships in a largely protein-independent manner.
Screening a fragment cocktail library using ultrafiltration
Shibata, Sayaka; Zhang, Zhongsheng; Korotkov, Konstantin V.; Delarosa, Jaclyn; Napuli, Alberto; Kelley, Angela M.; Mueller, Natasha; Ross, Jennifer; Zucker, Frank H.; Buckner, Frederick S.; Merritt, Ethan A.; Verlinde, Christophe L. M. J.; Van Voorhis, Wesley C.; Hol, Wim G. J.; Fan, Erkang
2011-01-01
Ultrafiltration provides a generic method to discover ligands for protein drug targets with millimolar to micromolar Kd, the typical range of fragment-based drug discovery. This method was tailored to a 96-well format, and cocktails of fragment-sized molecules, with molecular masses between 150 and 300 Da, were screened against medical structural genomics target proteins. The validity of the method was confirmed through competitive binding assays in the presence of ligands known to bind the target proteins. PMID:21750879
NASA Astrophysics Data System (ADS)
Fernández, Ariel
2013-08-01
A significant episteric ("around a solid") distortion of the hydrogen-bond structure of water is promoted by solutes with nanoscale surface detail and physico-chemical complexity, such as soluble natural proteins. These structural distortions defy analysis because the discrete nature of the solvent at the interface is not upheld by the continuous laws of electrostatics. This work derives and validates an electrostatic equation that governs the episteric distortions of the hydrogen-bond matrix. The equation correlates distortions from bulk-like structural patterns with anomalous polarization components that do not align with the electrostatic field of the solute. The result implies that the interfacial energy stored in the orthogonal polarization correlates with the distortion of the water hydrogen-bond network. The result is validated vis-à-vis experimental data on protein interfacial thermodynamics and is interpreted in terms of the interaction energy between the electrostatic field of the solute and the dipole moment induced by the anomalous polarization of interfacial water. Finally, we consider solutes capable of changing their interface through conformational transitions and introduce a principle of minimal episteric distortion (MED) of the water matrix. We assess the importance of the MED principle in the context of protein folding, concluding that the native fold may be identified topologically with the conformation that minimizes the interfacial tension or disruption of the water matrix.
Biswas, Ria; Bagchi, Angshuman
2017-09-11
The tumour necrosis factor (TNF) receptor-associated factor (TRAF) family of proteins having E3 ligase activity are the key molecules involved in cellular immune response pathways. TRAF6 is a unique member of the TRAF superfamily differing from other members of the family, owing to its specific interactions with molecules outside the TNF receptor superfamily. The C-terminal domain of TRAF proteins contains the catalytic residues and is known to be involved in self-oligomerization, forming a mushroom-shaped trimeric structure, which is the functional form of the protein. Although the monomeric crystal structure of the TRAF6 C-terminal domain has already been determined, the trimeric structure is still not available. We here applied computational structural modelling and molecular dynamics simulation studies to get insights into the molecular interactions involved in determining the trimeric structure of the TRAF6 C-terminal domain. The non-availability of the trimeric structure of the TRAF6 C-terminal domain prevented the elucidation of the molecular mechanism of many different biological processes. Our results suggest that the trimer complex is transient in nature. The amino acid residues Lys340 and Glu345 in the coiled coil domain in the C-terminus of TRAF6 play a critical role in trimer structure formation. This structural modelling study may therefore be utilized to obtain the experimentally validated trimeric structure of this important protein.
Recommendations of the wwPDB NMR Validation Task Force
Montelione, Gaetano T.; Nilges, Michael; Bax, Ad; Güntert, Peter; Herrmann, Torsten; Richardson, Jane S.; Schwieters, Charles; Vranken, Wim F.; Vuister, Geerten W.; Wishart, David S.; Berman, Helen M.; Kleywegt, Gerard J.; Markley, John L.
2013-01-01
As methods for analysis of biomolecular structure and dynamics using nuclear magnetic resonance spectroscopy (NMR) continue to advance, the resulting 3D structures, chemical shifts, and other NMR data are broadly impacting biology, chemistry, and medicine. Structure model assessment is a critical area of NMR methods development, and is an essential component of the process of making these structures accessible and useful to the wider scientific community. For these reasons, the Worldwide Protein Data Bank (wwPDB) has convened an NMR Validation Task Force (NMR-VTF) to work with the wwPDB partners in developing metrics and policies for biomolecular NMR data harvesting, structure representation, and structure quality assessment. This paper summarizes the recommendations of the NMR-VTF, and lays the groundwork for future work in developing standards and metrics for biomolecular NMR structure quality assessment. PMID:24010715
Madaoui, Hocine; Guerois, Raphaël
2008-01-01
Protein surfaces are under significant selection pressure to maintain interactions with their partners throughout evolution. Capturing how selection pressure acts at the interfaces of protein–protein complexes is a fundamental issue with high interest for the structural prediction of macromolecular assemblies. We tackled this issue under the assumption that, throughout evolution, mutations should minimally disrupt the physicochemical compatibility between specific clusters of interacting residues. This constraint drove the development of the so-called Surface COmplementarity Trace in Complex History score (SCOTCH), which was found to discriminate with high efficiency the structure of biological complexes. SCOTCH performances were assessed not only with respect to other evolution-based approaches, such as conservation and coevolution analyses, but also with respect to statistically based scoring methods. Validated on a set of 129 complexes of known structure exhibiting both permanent and transient intermolecular interactions, SCOTCH appears as a robust strategy to guide the prediction of protein–protein complex structures. Of particular interest, it also provides a basic framework to efficiently track how protein surfaces could evolve while keeping their partners in contact. PMID:18511568
Modelling dynamics in protein crystal structures by ensemble refinement
Burnley, B Tom; Afonine, Pavel V; Adams, Paul D; Gros, Piet
2012-01-01
Single-structure models derived from X-ray data do not adequately account for the inherent, functionally important dynamics of protein molecules. We generated ensembles of structures by time-averaged refinement, where local molecular vibrations were sampled by molecular-dynamics (MD) simulation whilst global disorder was partitioned into an underlying overall translation–libration–screw (TLS) model. Modeling of 20 protein datasets at 1.1–3.1 Å resolution reduced cross-validated Rfree values by 0.3–4.9%, indicating that ensemble models fit the X-ray data better than single structures. The ensembles revealed that, while most proteins display a well-ordered core, some proteins exhibit a ‘molten core’ likely supporting functionally important dynamics in ligand binding, enzyme activity and protomer assembly. Order–disorder changes in HIV protease indicate a mechanism of entropy compensation for ordering the catalytic residues upon ligand binding by disordering specific core residues. Thus, ensemble refinement extracts dynamical details from the X-ray data that allow a more comprehensive understanding of structure–dynamics–function relationships. DOI: http://dx.doi.org/10.7554/eLife.00311.001 PMID:23251785
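For readers unfamiliar with the cross-validated statistic mentioned in this abstract, the conventional crystallographic definition of Rfree is given below; the abstract itself does not spell it out. The symbols are notational assumptions: T is the test set of reflections withheld from refinement, F_obs and F_calc are the observed and model-derived structure-factor amplitudes, and k is a scale factor.

```latex
R_{\mathrm{free}} =
  \frac{\sum_{h \in T} \bigl|\, |F_{\mathrm{obs}}(h)| - k\,|F_{\mathrm{calc}}(h)| \,\bigr|}
       {\sum_{h \in T} |F_{\mathrm{obs}}(h)|}
```

Because the reflections in T play no part in refinement, a lower Rfree indicates that the model predicts data it was not fitted to, which is why the abstract cites the reduction as evidence that the ensembles fit the X-ray data better.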
Data Mining of Macromolecular Structures.
van Beusekom, Bart; Perrakis, Anastassis; Joosten, Robbie P
2016-01-01
The use of macromolecular structures is widespread for a variety of applications, from teaching protein structure principles all the way to ligand optimization in drug development. Applying data mining techniques on these experimentally determined structures requires a highly uniform, standardized structural data source. The Protein Data Bank (PDB) has evolved over the years toward becoming the standard resource for macromolecular structures. However, the process of selecting the data most suitable for specific applications is still very much based on personal preferences and understanding of the experimental techniques used to obtain these models. In this chapter, we will first explain the challenges with data standardization, annotation, and uniformity in the PDB entries determined by X-ray crystallography. We then discuss the specific effect that crystallographic data quality and model optimization methods have on structural models and how validation tools can be used to make informed choices. We also discuss specific advantages of using the PDB_REDO databank as a resource for structural data. Finally, we will provide guidelines on how to select the most suitable protein structure models for detailed analysis and how to select a set of structure models suitable for data mining.
The archiving and dissemination of biological structure data.
Berman, Helen M; Burley, Stephen K; Kleywegt, Gerard J; Markley, John L; Nakamura, Haruki; Velankar, Sameer
2016-10-01
The global Protein Data Bank (PDB) was the first open-access digital archive in biology. The history and evolution of the PDB are described, together with the ways in which molecular structural biology data and information are collected, curated, validated, archived, and disseminated by the members of the Worldwide Protein Data Bank organization (wwPDB; http://wwpdb.org). Particular emphasis is placed on the role of community in establishing the standards and policies by which the PDB archive is managed day-to-day. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Matching multiple rigid domain decompositions of proteins
Flynn, Emily; Streinu, Ileana
2017-01-01
We describe efficient methods for consistently coloring and visualizing collections of rigid cluster decompositions obtained from variations of a protein structure, and lay the foundation for more complex setups that may involve different computational and experimental methods. The focus here is on three biological applications: the conceptually simpler problems of visualizing results of dilution and mutation analyses, and the more complex task of matching decompositions of multiple NMR models of the same protein. Implemented into the KINARI web server application, the improved visualization techniques give useful information about protein folding cores, help examining the effect of mutations on protein flexibility and function, and provide insights into the structural motions of PDB proteins solved with solution NMR. These tools have been developed with the goal of improving and validating rigidity analysis as a credible coarse-grained model capturing essential information about a protein’s slow motions near the native state. PMID:28141528
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.
Nimrod, Guy; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir
2009-04-10
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
Outcome of the First wwPDB/CCDC/D3R Ligand Validation Workshop.
Adams, Paul D; Aertgeerts, Kathleen; Bauer, Cary; Bell, Jeffrey A; Berman, Helen M; Bhat, Talapady N; Blaney, Jeff M; Bolton, Evan; Bricogne, Gerard; Brown, David; Burley, Stephen K; Case, David A; Clark, Kirk L; Darden, Tom; Emsley, Paul; Feher, Victoria A; Feng, Zukang; Groom, Colin R; Harris, Seth F; Hendle, Jorg; Holder, Thomas; Joachimiak, Andrzej; Kleywegt, Gerard J; Krojer, Tobias; Marcotrigiano, Joseph; Mark, Alan E; Markley, John L; Miller, Matthew; Minor, Wladek; Montelione, Gaetano T; Murshudov, Garib; Nakagawa, Atsushi; Nakamura, Haruki; Nicholls, Anthony; Nicklaus, Marc; Nolte, Robert T; Padyana, Anil K; Peishoff, Catherine E; Pieniazek, Susan; Read, Randy J; Shao, Chenghua; Sheriff, Steven; Smart, Oliver; Soisson, Stephen; Spurlino, John; Stouch, Terry; Svobodova, Radka; Tempel, Wolfram; Terwilliger, Thomas C; Tronrud, Dale; Velankar, Sameer; Ward, Suzanna C; Warren, Gregory L; Westbrook, John D; Williams, Pamela; Yang, Huanwang; Young, Jasmine
2016-04-05
Crystallographic studies of ligands bound to biological macromolecules (proteins and nucleic acids) represent an important source of information concerning drug-target interactions, providing atomic level insights into the physical chemistry of complex formation between macromolecules and ligands. Of the more than 115,000 entries extant in the Protein Data Bank (PDB) archive, ∼75% include at least one non-polymeric ligand. Ligand geometrical and stereochemical quality, the suitability of ligand models for in silico drug discovery and design, and the goodness-of-fit of ligand models to electron-density maps vary widely across the archive. We describe the proceedings and conclusions from the first Worldwide PDB/Cambridge Crystallographic Data Center/Drug Design Data Resource (wwPDB/CCDC/D3R) Ligand Validation Workshop held at the Research Collaboratory for Structural Bioinformatics at Rutgers University on July 30-31, 2015. Experts in protein crystallography from academe and industry came together with non-profit and for-profit software providers for crystallography and with experts in computational chemistry and data archiving to discuss and make recommendations on best practices, as framed by a series of questions central to structural studies of macromolecule-ligand complexes. What data concerning bound ligands should be archived in the PDB? How should the ligands be best represented? How should structural models of macromolecule-ligand complexes be validated? What supplementary information should accompany publications of structural studies of biological macromolecules? Consensus recommendations on best practices developed in response to each of these questions are provided, together with some details regarding implementation. Important issues addressed but not resolved at the workshop are also enumerated. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Canino, Lawrence S.; Shen, Tongye; McCammon, J. Andrew
2002-12-01
We extend the self-consistent pair contact probability method to the evaluation of the partition function for a protein complex at thermodynamic equilibrium. Specifically, we adapt the method for multichain models and introduce a parametrization for amino acid-specific pairwise interactions. This method is similar to the Gaussian network model but allows for the adjusting of the strengths of native state contacts. The method is first validated on a high resolution x-ray crystal structure of bovine Pancreatic Phospholipase A2 by comparing calculated B-factors with reported values. We then examine binding-induced changes in flexibility in protein-protein complexes, comparing computed results with those obtained from x-ray crystal structures and molecular dynamics simulations. In particular, we focus on the mouse acetylcholinesterase:fasciculin II and the human α-thrombin:thrombomodulin complexes.
NASA Astrophysics Data System (ADS)
Kaushik, Aman C.; Kumar, Sanjay; Wei, Dong Q.; Sahi, Shakti
2018-02-01
GPR142 (G protein receptor 142) is a novel orphan GPCR (G protein-coupled receptor) belonging to Class A of the GPCR family and expressed in the beta cells of the pancreas. In this study, we report structure-based virtual screening to identify hit compounds that can be developed as leads for potential agonists. The results were validated through induced-fit docking, pharmacophore modeling and systems biology approaches. Since there is no solved crystal structure of GPR142, we attempted to predict the 3D structure, followed by validation and then identification of the active site using threading and ab initio methods. Structure-based virtual screening was also performed against a total of 1171519 compounds from different libraries, and only the top 20 hit compounds were screened and analyzed. Moreover, the biochemical pathway of the GPR142 complex with screened compound2 was designed and compared with experimental data. Interestingly, compound2 showed an increase in insulin production via the Gq-mediated signaling pathway, suggesting a possible role for novel GPR142 agonists in therapy against type 2 diabetes.
A New Method for Determining Structure Ensemble: Application to a RNA Binding Di-Domain Protein.
Liu, Wei; Zhang, Jingfeng; Fan, Jing-Song; Tria, Giancarlo; Grüber, Gerhard; Yang, Daiwen
2016-05-10
Structure ensemble determination is the basis of understanding the structure-function relationship of a multidomain protein with weak domain-domain interactions. Paramagnetic relaxation enhancement has been proven a powerful tool in the study of structure ensembles, but there exist a number of challenges such as spin-label flexibility, domain dynamics, and overfitting. Here we propose a new (to our knowledge) method to describe structure ensembles using a minimal number of conformers. In this method, individual domains are considered rigid; the position of each spin-label conformer and the structure of each protein conformer are defined by three and six orthogonal parameters, respectively. First, the spin-label ensemble is determined by optimizing the positions and populations of spin-label conformers against intradomain paramagnetic relaxation enhancements with a genetic algorithm. Subsequently, the protein structure ensemble is optimized using a more efficient genetic algorithm-based approach and an overfitting indicator, both of which were established in this work. The method was validated using a reference ensemble with a set of conformers whose populations and structures are known. This method was also applied to study the structure ensemble of the tandem di-domain of a poly (U) binding protein. The determined ensemble was supported by small-angle x-ray scattering and nuclear magnetic resonance relaxation data. The ensemble obtained suggests an induced fit mechanism for recognition of target RNA by the protein. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E
2018-05-16
Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium; however, the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is because T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct, with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) could be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.
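The percentages reported in this abstract follow directly from the stated counts. As a quick arithmetic check, the sketch below recomputes each one; the dictionary labels are descriptive names introduced here for illustration, while the counts are those given in the abstract.

```python
def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# (part, whole) pairs as stated in the abstract; labels are illustrative.
counts = {
    "high-confidence models / whole proteome": (780, 978),
    "matching annotations / annotated proteins compared": (525, 605),
    "matching annotations / whole proteome": (525, 978),
    "newly assigned functions / unannotated modeled proteins": (167, 175),
}

for label, (part, whole) in counts.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```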
Balu, Rajkamal; Knott, Robert; Cowieson, Nathan P.; Elvin, Christopher M.; Hill, Anita J.; Choudhury, Namita R.; Dutta, Naba K.
2015-01-01
Rec1-resilin is the first recombinant resilin-mimetic protein polymer, synthesized from exon-1 of the Drosophila melanogaster gene CG15920 that has demonstrated unusual multi-stimuli responsiveness in aqueous solution. Crosslinked hydrogels of Rec1-resilin have also displayed remarkable mechanical properties including near-perfect rubber-like elasticity. The structural basis of these extraordinary properties is not clearly understood. Here we combine a computational and experimental investigation to examine structural ensembles of Rec1-resilin in aqueous solution. The structure of Rec1-resilin in aqueous solutions is investigated experimentally using circular dichroism (CD) spectroscopy and small angle X-ray scattering (SAXS). Both bench-top and synchrotron SAXS are employed to extract structural data sets of Rec1-resilin and to confirm their validity. Computational approaches have been applied to these experimental data sets in order to extract quantitative information about structural ensembles including radius of gyration, pair-distance distribution function, and the fractal dimension. The present work confirms that Rec1-resilin is an intrinsically disordered protein (IDP) that displays equilibrium structural qualities between those of a structured globular protein and a denatured protein. The ensemble optimization method (EOM) analysis reveals a single conformational population with partial compactness. This work provides new insight into the structural ensembles of Rec1-resilin in solution. PMID:26042819
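Radius of gyration is one of the ensemble quantities named above. A minimal sketch of its geometric definition for a single conformer follows, assuming equally weighted points (for example, C-alpha positions). This is an illustration only: the values in the study are derived from scattering data, and the function name and coordinates below are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted points from their centroid."""
    n = len(coords)
    # Centroid of the point set.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centroid.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


# Hypothetical coordinates: two points placed symmetrically about the origin.
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

For an ensemble, one would typically compute this per conformer and inspect the resulting distribution rather than a single value.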
Dewhurst, Henry M; Choudhury, Shilpa; Torres, Matthew P
2015-08-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)--a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits--conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit-N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. 
xMDFF: molecular dynamics flexible fitting of low-resolution X-ray structures.
McGreevy, Ryan; Singharoy, Abhishek; Li, Qufei; Zhang, Jingfen; Xu, Dong; Perozo, Eduardo; Schulten, Klaus
2014-09-01
X-ray crystallography remains the most dominant method for solving atomic structures. However, for relatively large systems, the availability of only medium-to-low-resolution diffraction data often limits the determination of all-atom details. A new molecular dynamics flexible fitting (MDFF)-based approach, xMDFF, for determining structures from such low-resolution crystallographic data is reported. xMDFF employs a real-space refinement scheme that flexibly fits atomic models into an iteratively updating electron-density map. It addresses significant large-scale deformations of the initial model to fit the low-resolution density, as tested with synthetic low-resolution maps of D-ribose-binding protein. xMDFF has been successfully applied to re-refine six low-resolution protein structures of varying sizes that had already been submitted to the Protein Data Bank. Finally, via systematic refinement of a series of data from 3.6 to 7 Å resolution, xMDFF refinements together with electrophysiology experiments were used to validate the first all-atom structure of the voltage-sensing protein Ci-VSP.
Objective identification of residue ranges for the superposition of protein structures
2011-01-01
Background: The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remains commonplace. Results: We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain as possible, such that increasing their number further would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions: The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures. PMID:21592348
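The distance variance matrix mentioned in this abstract can be computed without superimposing the models, since inter-residue distances do not depend on orientation. The sketch below illustrates the general idea only and is not the CYRANGE implementation. It assumes one representative point per residue and uses the population variance; both are choices made here for illustration, and the coordinates are hypothetical.

```python
import math
from statistics import pvariance


def distance_variance_matrix(models):
    """Variance, across models, of the distance between each residue pair.

    models: list of conformers, each a list of (x, y, z) tuples with one
    point per residue, in the same residue order for every conformer.
    """
    n = len(models[0])
    dvm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance between residues i and j in every model.
            distances = [math.dist(model[i], model[j]) for model in models]
            dvm[i][j] = dvm[j][i] = pvariance(distances)
    return dvm


# Hypothetical two-model bundle: residues 0 and 1 keep their relative
# position, while residue 2 moves between the models.
bundle = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
for row in distance_variance_matrix(bundle):
    print(row)
```

Pairs with low variance move together as a rigid unit, which is the kind of signal a clustering step can use to group residues into domains.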
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
A General Method for Targeted Quantitative Cross-Linking Mass Spectrometry.
Chavez, Juan D; Eng, Jimmy K; Schweppe, Devin K; Cilia, Michelle; Rivera, Keith; Zhong, Xuefei; Wu, Xia; Allen, Terrence; Khurgel, Moshe; Kumar, Akhilesh; Lampropoulos, Athanasios; Larsson, Mårten; Maity, Shuvadeep; Morozov, Yaroslav; Pathmasiri, Wimal; Perez-Neut, Mathew; Pineyro-Ruiz, Coriness; Polina, Elizabeth; Post, Stephanie; Rider, Mark; Tokmina-Roszyk, Dorota; Tyson, Katherine; Vieira Parrine Sant'Ana, Debora; Bruce, James E
2016-01-01
Chemical cross-linking mass spectrometry (XL-MS) provides protein structural information by identifying covalently linked proximal amino acid residues on protein surfaces. The information gained by this technique is complementary to other structural biology methods such as x-ray crystallography, NMR and cryo-electron microscopy[1]. The extension of traditional quantitative proteomics methods with chemical cross-linking can provide information on the structural dynamics of protein structures and protein complexes. The identification and quantitation of cross-linked peptides remains challenging for the general community, requiring specialized expertise ultimately limiting more widespread adoption of the technique. We describe a general method for targeted quantitative mass spectrometric analysis of cross-linked peptide pairs. We report the adaptation of the widely used, open source software package Skyline, for the analysis of quantitative XL-MS data as a means for data analysis and sharing of methods. We demonstrate the utility and robustness of the method with a cross-laboratory study and present data that is supported by and validates previously published data on quantified cross-linked peptide pairs. This advance provides an easy to use resource so that any lab with access to a LC-MS system capable of performing targeted quantitative analysis can quickly and accurately measure dynamic changes in protein structure and protein interactions.
Electrostatics of cysteine residues in proteins: parameterization and validation of a simple model.
Salsbury, Freddie R; Poole, Leslie B; Fetrow, Jacquelyn S
2012-11-01
One of the most popular and simple models for the calculation of pK(a)s from a protein structure is the semi-macroscopic electrostatic model MEAD. This model requires empirical parameters for each residue to calculate pK(a)s. Analysis of current, widely used empirical parameters for cysteine residues showed that they did not reproduce expected cysteine pK(a)s; thus, we set out to identify parameters consistent with the CHARMM27 force field that capture both the behavior of typical cysteines in proteins and the behavior of cysteines which have perturbed pK(a)s. The new parameters were validated in three ways: (1) calculation across a large set of typical cysteines in proteins (where the calculations are expected to reproduce expected ensemble behavior); (2) calculation across a set of perturbed cysteines in proteins (where the calculations are expected to reproduce the shifted ensemble behavior); and (3) comparison to experimentally determined pK(a) values (where the calculation should reproduce the pK(a) within experimental error). Both the general behavior of cysteines in proteins and the perturbed pK(a) in some proteins can be predicted reasonably well using the newly determined empirical parameters within the MEAD model for protein electrostatics. This study provides the first general analysis of the electrostatics of cysteines in proteins, with specific attention paid to capturing both the behavior of typical cysteines in a protein and the behavior of cysteines whose pK(a) should be shifted, and validation of force field parameters for cysteine residues. Copyright © 2012 Wiley Periodicals, Inc.
Xia, Jie; Hsieh, Jui-Hua; Hu, Huabin; Wu, Song; Wang, Xiang Simon
2017-06-26
Structure-based virtual screening (SBVS) has become an indispensable technique for hit identification at the early stage of drug discovery. However, the accuracy of current scoring functions is not high enough to confer success to every target and thus remains to be improved. Previously, we had developed binary pose filters (PFs) using knowledge derived from the protein-ligand interface of a single X-ray structure of a specific target. This novel approach had been validated as an effective way to improve ligand enrichment. Continuing from it, in the present work we attempted to incorporate knowledge collected from diverse protein-ligand interfaces of multiple crystal structures of the same target to build PF ensembles (PFEs). Toward this end, we first constructed a comprehensive data set to meet the requirements of ensemble modeling and validation. This set contains 10 diverse targets, 118 well-prepared X-ray structures of protein-ligand complexes, and large benchmarking actives/decoys sets. Notably, we designed a unique workflow of two-layer classifiers based on the concept of ensemble learning and applied it to the construction of PFEs for all of the targets. Through extensive benchmarking studies, we demonstrated that (1) coupling PFE with Chemgauss4 significantly improves the early enrichment of Chemgauss4 itself and (2) PFEs show greater consistency in boosting early enrichment and larger overall enrichment than our prior PFs. In addition, we analyzed the pairwise topological similarities among cognate ligands used to construct PFEs and found that it is the higher chemical diversity of the cognate ligands that leads to the improved performance of PFEs. Taken together, the results so far prove that the incorporation of knowledge from diverse protein-ligand interfaces by ensemble modeling is able to enhance the screening competence of SBVS scoring functions.
Defining and predicting structurally conserved regions in protein superfamilies
Huang, Ivan K.; Grishin, Nick V.
2013-01-01
Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. 
Contact: 91huangi@gmail.com or grishin@chop.swmed.edu. Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:23193223
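Accuracy and the Matthews correlation coefficient (MCC), the two figures of merit quoted in this abstract, are both functions of the binary confusion matrix. A minimal sketch of the standard definitions follows. The counts in the example are hypothetical and are not the data behind the reported 0.755 and 0.476; returning 0.0 for a zero denominator is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when any marginal total is zero; 0.0 is used by convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for illustration only.
tp, tn, fp, fn = 40, 35, 15, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_cc(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC stays near zero for a classifier that ignores the minority class, which makes it a useful complement when SCR and non-SCR positions are unevenly represented.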
Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S
2017-01-01
Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Nagpal, Suhani; Tiwari, Satyam; Mapa, Koyeli; Thukral, Lipi
2015-01-01
Many proteins with complex topologies require molecular chaperones to achieve their unique three-dimensional folded structure. The E. coli chaperone GroEL binds a large number of unfolded and partially folded proteins to facilitate proper folding and prevent misfolding and aggregation. Although the major structural components of GroEL are well defined, scaffolds of the non-native substrates that determine chaperone-mediated folding have been difficult to recognize. Here we performed all-atom and replica-exchange molecular dynamics simulations to dissect the non-native ensemble of an obligate GroEL folder, DapA. Thermodynamic analyses of unfolding simulations revealed populated intermediates with distinct structural characteristics. We found that surface-exposed hydrophobic patches are significantly increased, contributed primarily by native and non-native β-sheet elements. We validate the structural properties of these conformers using experimental data, including circular dichroism (CD), 1-anilinonaphthalene-8-sulfonic acid (ANS) binding measurements and previously reported hydrogen-deuterium exchange coupled to mass spectrometry (HDX-MS). Further, we constructed network graphs to elucidate long-range intra-protein connectivity of native and intermediate topologies, demonstrating regions that serve as central "hubs". Overall, our results imply that genomic variations (or mutations) in the distinct regions of protein structures might disrupt these topological signatures, disabling chaperone-mediated folding and leading to formation of aggregates.
Duffy, Fergal J; O'Donovan, Darragh; Devocelle, Marc; Moran, Niamh; O'Connell, David J; Shields, Denis C
2015-03-23
Protein-protein and protein-peptide interactions are responsible for the vast majority of biological functions in vivo, but targeting these interactions with small molecules has historically been difficult. What is required are efficient combined computational and experimental screening methods to choose among a number of potential protein interfaces worthy of targeting lead macrocyclic compounds for further investigation. To achieve this, we have generated combinatorial 3D virtual libraries of short disulfide-bonded peptides and compared them to pharmacophore models of important protein-protein and protein-peptide structures, including short linear motifs (SLiMs), protein-binding peptides, and turn structures at protein-protein interfaces, built from 3D models available in the Protein Data Bank. We prepared a total of 372 reference pharmacophores, which were matched against 108,659 multiconformer cyclic peptides. After normalization to exclude nonspecific cyclic peptides, the top hits notably are enriched for mimetics of turn structures, including a turn at the interaction surface of human α thrombin, and also feature several protein-binding peptides. The top cyclic peptide hits also cover the critical "hot spot" interaction sites predicted from the interaction crystal structure. We have validated our method by testing cyclic peptides predicted to inhibit thrombin, a key protein in the blood coagulation pathway of important therapeutic interest, identifying a cyclic peptide inhibitor with lead-like activity. We conclude that protein interfaces most readily targetable by cyclic peptides and related macrocyclic drugs may be identified computationally among a set of candidate interfaces, accelerating the choice of interfaces against which lead compounds may be screened.
Structure of the immature HIV-1 capsid in intact virus particles at 8.8 Å resolution
NASA Astrophysics Data System (ADS)
Schur, Florian K. M.; Hagen, Wim J. H.; Rumlová, Michaela; Ruml, Tomáš; Müller, Barbara; Kräusslich, Hans-Georg; Briggs, John A. G.
2015-01-01
Human immunodeficiency virus type 1 (HIV-1) assembly proceeds in two stages. First, the 55 kilodalton viral Gag polyprotein assembles into a hexameric protein lattice at the plasma membrane of the infected cell, inducing budding and release of an immature particle. Second, Gag is cleaved by the viral protease, leading to internal rearrangement of the virus into the mature, infectious form. Immature and mature HIV-1 particles are heterogeneous in size and morphology, preventing high-resolution analysis of their protein arrangement in situ by conventional structural biology methods. Here we apply cryo-electron tomography and sub-tomogram averaging methods to resolve the structure of the capsid lattice within intact immature HIV-1 particles at subnanometre resolution, allowing unambiguous positioning of all α-helices. The resulting model reveals tertiary and quaternary structural interactions that mediate HIV-1 assembly. Strikingly, these interactions differ from those predicted by the current model based on in vitro-assembled arrays of Gag-derived proteins from Mason-Pfizer monkey virus. To validate this difference, we solve the structure of the capsid lattice within intact immature Mason-Pfizer monkey virus particles. Comparison with the immature HIV-1 structure reveals that retroviral capsid proteins, while having conserved tertiary structures, adopt different quaternary arrangements during virus assembly. The approach demonstrated here should be applicable to determine structures of other proteins at subnanometre resolution within heterogeneous environments.
Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.
Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin
2016-06-07
RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity); statistical and structural analysis of protein-RNA complexes showed that both features were powerful for identifying RNA-binding protein residues. Using these two features and other structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind.
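As a rough illustration of how an accuracy near 0.87 can coexist with a noticeably lower F-score, the sketch below computes both metrics from confusion-matrix counts. The counts are hypothetical and chosen only to show the effect of class imbalance; they are not taken from the RBP68 test set, and the function names are assumptions made for this example.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all residues that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions supplied")
    return (tp + tn) / total


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: binding residues are the rare positive class.
tp, fp, fn, tn = 60, 20, 40, 880
acc = accuracy(tp, tn, fp, fn)
f1 = f_score(tp, fp, fn)
print(f"ACC = {acc:.3f}, F-score = {f1:.3f}")
```

Because non-binding residues dominate, the many true negatives lift the accuracy while the F-score, which ignores true negatives, stays moderate.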
Extracting physicochemical features to predict protein secondary structure.
Huang, Yin-Fu; Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features: conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features improves protein secondary structure prediction.
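For readers unfamiliar with the Q3 measure quoted above, the following minimal sketch computes it as the percentage of residues whose predicted three-state label (H, E or C) matches the reference assignment. The function name and the example strings are hypothetical and are not taken from the paper or from CB513.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example (H = helix, E = strand, C = coil).
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
q3 = q3_score(predicted, observed)
print(q3)
```

Q3 treats every residue equally, which is why segment-based measures such as SOV are usually reported alongside it.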
Extracting Physicochemical Features to Predict Protein Secondary Structure
Chen, Shu-Ying
2013-01-01
We propose a protein secondary structure prediction method based on position-specific scoring matrix (PSSM) profiles and four physicochemical features: conformation parameters, net charges, hydrophobicity, and side chain mass. First, the SVM with the optimal window size and the optimal parameters of the kernel function is found. Then, we train the SVM using the PSSM profiles generated from PSI-BLAST and the physicochemical features extracted from the CB513 data set. Finally, we use the filter to refine the predicted results from the trained SVM. For the performance measures of our method, Q3 reaches 79.52, SOV94 reaches 86.10, and SOV99 reaches 74.60; all the measures are higher than those of the SVMpsi method and the SVMfreq method. This validates that considering these physicochemical features improves protein secondary structure prediction. PMID:23766688
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity
Lee, Hui Sun; Im, Wonpil
2013-01-01
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands' positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement in prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
Abascal-Palacios, Guillermo; Schindler, Christina; Rojas, Adriana L; Bonifacino, Juan S.; Hierro, Aitor
2016-01-01
The Golgi-Associated Retrograde Protein (GARP) is a tethering complex involved in the fusion of endosome-derived transport vesicles to the trans-Golgi network through interaction with components of the Syntaxin 6/Syntaxin 16/Vti1a/VAMP4 SNARE complex. The mechanisms by which GARP and other tethering factors engage the SNARE fusion machinery are poorly understood. Herein we report the structural basis for the interaction of the human Ang2 subunit of GARP with Syntaxin 6 and the closely related Syntaxin 10. The crystal structure of Syntaxin 6 Habc domain in complex with a peptide from the N terminus of Ang2 shows a novel binding mode in which a di-tyrosine motif of Ang2 interacts with a highly conserved groove in Syntaxin 6. Structure-based mutational analyses validate the crystal structure and support the phylogenetic conservation of this interaction. The same binding determinants are found in other tethering proteins and syntaxins, suggesting a general interaction mechanism. PMID:23932592
An overview of tools for the validation of protein NMR structures.
Vuister, Geerten W; Fogh, Rasmus H; Hendrickx, Pieter M S; Doreleijers, Jurgen F; Gutmanas, Aleksandras
2014-04-01
Biomolecular structures at atomic resolution present a valuable resource for the understanding of biology. NMR spectroscopy accounts for 11% of all structures in the PDB repository. In response to serious problems with the accuracy of some of the NMR-derived structures and in order to facilitate proper analysis of the experimental models, a number of program suites are available. We discuss nine of these tools in this review: PROCHECK-NMR, PSVS, GLM-RMSD, CING, Molprobity, Vivaldi, ResProx, NMR constraints analyzer and QMEAN. We evaluate these programs for their ability to assess the structural quality, restraints and their violations, chemical shifts, peaks and the handling of multi-model NMR ensembles. We document both the input required by the programs and output they generate. To discuss their relative merits we have applied the tools to two representative examples from the PDB: a small, globular monomeric protein (Staphylococcal nuclease from S. aureus, PDB entry 2kq3) and a small, symmetric homodimeric protein (a region of human myosin-X, PDB entry 2lw9).
Structural reducibility of multilayer networks
NASA Astrophysics Data System (ADS)
de Domenico, Manlio; Nicosia, Vincenzo; Arenas, Alexandre; Latora, Vito
2015-04-01
Many complex systems can be represented as networks consisting of distinct types of interactions, which can be categorized as links belonging to different layers. For example, a good description of the full protein-protein interactome requires, for some organisms, up to seven distinct network layers, accounting for different genetic and physical interactions, each containing thousands of protein-protein relationships. A fundamental open question is then how many layers are indeed necessary to accurately represent the structure of a multilayered complex system. Here we introduce a method based on quantum theory to reduce the number of layers to a minimum while maximizing the distinguishability between the multilayer network and the corresponding aggregated graph. We validate our approach on synthetic benchmarks and we show that the number of informative layers in some real multilayer networks of protein-genetic interactions, social, economical and transportation systems can be reduced by up to 75%.
Guidelines to reach high-quality purified recombinant proteins.
Oliveira, Carla; Domingues, Lucília
2018-01-01
The final goal in recombinant protein production is to obtain high-quality pure protein samples. Indeed, the successful downstream application of a recombinant protein depends on its quality. Besides production, which is conditioned by the host, the quality of a recombinant protein product relies mainly on the purification procedure. Thus, the purification strategy must be carefully designed from the molecular level. On the other hand, the quality control of a protein sample must be performed to ensure its purity, homogeneity and structural conformity, in order to validate the recombinant production and purification process. Therefore, this review aims at providing succinct information on the rational purification design of recombinant proteins produced in Escherichia coli, specifically the tagging purification, as well as on accessible tools for evaluating and optimizing protein quality. The classical techniques for structural protein characterization-denaturing protein gel electrophoresis (SDS-PAGE), size exclusion chromatography (SEC), dynamic light scattering (DLS) and circular dichroism (CD)-are revisited with focus on the protein and their main advantages and disadvantages. Furthermore, methods for determining protein concentration and protein storage are also presented. The guidelines compiled herein will aid preparing pure, soluble and homogeneous functional recombinant proteins from the very beginning of the molecular cloning design.
Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe
2010-11-26
Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. 
This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Molecular Dynamics Analysis of Lysozyme Protein in Ethanol-Water Mixed Solvent Environment
NASA Astrophysics Data System (ADS)
Ochije, Henry Ikechukwu
The effect of protein-solvent interactions on protein structure is widely studied using both experimental and computational techniques. Despite such extensive studies, the molecular-level interaction of proteins with even some simple solvents is still not fully understood. This work uses detailed molecular dynamics simulations to study the solvent effect on the lysozyme protein, using water, alcohol and different concentrations of water-alcohol mixtures as solvents. The lysozyme structure in water, alcohol and alcohol-water mixtures (0-12% alcohol) was studied using the GROMACS molecular dynamics simulation code. Compared with the water environment, the lysozyme structure showed marked changes in solvents with increasing alcohol concentration. In particular, significant changes were observed in the protein secondary structure involving alpha helices. The influence of alcohol on lysozyme was investigated by studying thermodynamic and structural properties. With increasing ethanol concentration we observed a systematic increase in total energy, enthalpy, root mean square deviation (RMSD), and radius of gyration. These quantities were described as functions of alcohol concentration using a polynomial interpolation approach. Using the resulting polynomial equation, we could determine the above quantities for any intermediate alcohol percentage. To validate this approach, we selected an intermediate ethanol percentage and carried out a full MD simulation. The results from the MD simulation were in reasonably good agreement with those obtained using the polynomial approach. Hence, the polynomial method proposed here eliminates the need for computationally intensive full MD analysis for concentrations within the range (0-12%) studied in this work.
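The polynomial approach described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the polynomial form or any numerical values, so Lagrange interpolation is used here as one standard way to build an interpolating polynomial, and the data points are placeholders rather than simulation results.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs, ys) at x."""
    if len(xs) != len(ys) or len(set(xs)) != len(xs):
        raise ValueError("need equally many, distinct x values and y values")
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # Lagrange basis factor
        total += term
    return total


# Placeholder data only (NOT from the thesis): ethanol percentage and a
# made-up property such as radius of gyration in nm.
ethanol_pct = [0.0, 4.0, 8.0, 12.0]
radius_gyr = [1.400, 1.448, 1.512, 1.592]

# Estimate the property at an intermediate concentration (6% ethanol).
estimate = lagrange_interpolate(ethanol_pct, radius_gyr, 6.0)
print(f"{estimate:.3f}")
```

As in the abstract, an estimate of this kind would still need to be checked against a full simulation at the chosen concentration, and it should not be extrapolated outside the sampled range.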
Protein classification using probabilistic chain graphs and the Gene Ontology structure.
Carroll, Steven; Pavlovic, Vladimir
2006-08-01
Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. C/C++/Perl implementation is available from authors upon request.
Chandrasekaran, Srinivas Niranj; Das, Jhuma; Dokholyan, Nikolay V.; Carter, Charles W.
2016-01-01
PATH rapidly computes a path and a transition state between crystal structures by minimizing the Onsager-Machlup action. It requires input parameters whose range of values can generate different transition-state structures that cannot be uniquely compared with those generated by other methods. We outline modifications to estimate these input parameters to circumvent these difficulties and validate the PATH transition states by showing consistency between transition-states derived by different algorithms for unrelated protein systems. Although functional protein conformational change trajectories are to a degree stochastic, they nonetheless pass through a well-defined transition state whose detailed structural properties can rapidly be identified using PATH. PMID:26958584
Merkley, Eric D; Rysavy, Steven; Kahraman, Abdullah; Hafen, Ryan P; Daggett, Valerie; Adkins, Joshua N
2014-06-01
Integrative structural biology attempts to model the structures of protein complexes that are challenging or intractable by classical structural methods (due to size, dynamics, or heterogeneity) by combining computational structural modeling with data from experimental methods. One such experimental method is chemical crosslinking mass spectrometry (XL-MS), in which protein complexes are crosslinked and characterized using liquid chromatography-mass spectrometry to pinpoint specific amino acid residues in close structural proximity. The commonly used lysine-reactive N-hydroxysuccinimide ester reagents disuccinimidylsuberate (DSS) and bis(sulfosuccinimidyl)suberate (BS(3)) have a linker arm that is 11.4 Å long when fully extended, allowing Cα (alpha carbon of protein backbone) atoms of crosslinked lysine residues to be up to ∼24 Å apart. However, XL-MS studies on proteins of known structure frequently report crosslinks that exceed this distance. Typically, a tolerance of ∼3 Å is added to the theoretical maximum to account for this observation, with limited justification for the chosen value. We used the Dynameomics database, a repository of high-quality molecular dynamics simulations of 807 proteins representative of diverse protein folds, to investigate the relationship between lysine-lysine distances in experimental starting structures and in simulation ensembles. We conclude that for DSS/BS(3), a distance constraint of 26-30 Å between Cα atoms is appropriate. This analysis provides a theoretical basis for the widespread practice of adding a tolerance to the crosslinker length when comparing XL-MS results to structures or in modeling. We also discuss the comparison of XL-MS results to MD simulations and known structures as a means to test and validate experimental XL-MS methods. © 2014 The Protein Society.
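A distance constraint of this kind can be applied to a structure with a few lines of code. The sketch below is a minimal example, assuming Cα coordinates are already available as (x, y, z) tuples in Å; the coordinates and function name are hypothetical, and the 30 Å default is simply the upper end of the 26-30 Å range recommended in the abstract.

```python
import math

# Upper end of the 26-30 Å Cα-Cα range suggested for DSS/BS(3); using it as
# the default cutoff is a choice made for this example.
MAX_CA_DISTANCE = 30.0


def crosslink_satisfied(ca_a, ca_b, cutoff=MAX_CA_DISTANCE):
    """Return True if two lysine Cα atoms lie within the crosslink cutoff."""
    return math.dist(ca_a, ca_b) <= cutoff


# Hypothetical Cα coordinates (Å) for two candidate lysine pairs.
pair_close = ((0.0, 0.0, 0.0), (10.0, 20.0, 10.0))  # about 24.5 Å apart
pair_far = ((0.0, 0.0, 0.0), (20.0, 20.0, 20.0))    # about 34.6 Å apart

print(crosslink_satisfied(*pair_close))
print(crosslink_satisfied(*pair_far))
```

Passing a stricter cutoff such as 26.0 makes the same function usable for the lower end of the recommended range.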
Gianni, Stefano; Jemth, Per
2014-07-01
The only experimental strategy to address the structure of folding transition states, the so-called Φ value analysis, relies on the synergy between site directed mutagenesis and the measurement of reaction kinetics. Despite its importance, the Φ value analysis has been often criticized and its power to pinpoint structural information has been questioned. In this hypothesis, we demonstrate that comparing the Φ values between proteins not only allows highlighting the robustness of folding pathways but also provides per se a strong validation of the method. © 2014 International Union of Biochemistry and Molecular Biology.
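For orientation, the Φ value of a mutation is conventionally defined as the ratio of the mutation-induced change in transition-state stability to the change in native-state stability, both measured relative to the denatured state. The notation below is the standard textbook form and is not quoted from the abstract.

```latex
\Phi = \frac{\Delta\Delta G_{\ddagger-\mathrm{D}}}{\Delta\Delta G_{\mathrm{N}-\mathrm{D}}}
```

A value near 1 is usually read as native-like structure around the mutated residue in the transition state, and a value near 0 as denatured-like structure.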
Paris, Guillaume; Kraszewski, Sebastian; Ramseyer, Christophe; Enescu, Mironel
2012-11-01
The role of the 17 disulfide (S-S) bridges in preserving the native conformation of human serum albumin (HSA) is investigated by performing classical molecular dynamics (MD) simulations on protein structures with intact and, respectively, reduced S-S bridges. The thermal unfolding simulations predict a clear destabilization of the protein secondary structure upon reduction of the S-S bridges as well as a significant distortion of the tertiary structure that is revealed by the changes in the protein native contacts fraction. The effect of the S-S bridges reduction on the protein compactness was tested by calculating Gibbs free energy profiles with respect to the protein gyration radius. The theoretical results obtained using the OPLS-AA and the AMBER ff03 force fields are in agreement with the available experimental data. Beyond the validation of the simulation method, the results here reported provide new insights into the mechanism of the protein reductive/oxidative unfolding/folding processes. It is predicted that in the native conformation of the protein, the thiol (-SH) groups belonging to the same reduced S-S bridge are located in potential wells that maintain them in contact. The -SH pairs can be dispatched by specific conformational transitions of the peptide chain located in the neighborhood of the cysteine residues. Copyright © 2012 Wiley Periodicals, Inc.
Zhou, Peng; Wang, Congcong; Tian, Feifei; Ren, Yanrong; Yang, Chao; Huang, Jian
2013-01-01
Quantitative structure-activity relationship (QSAR), a regression modeling methodology that quantitatively establishes a statistical correlation between structural features and apparent behavior for a series of congeneric molecules, has been widely used to evaluate the activity, toxicity and properties of various small-molecule compounds such as drugs, toxicants and surfactants. However, it is surprising that such a useful technique has only very limited applications to biomacromolecules, although solved atomic-resolution 3D structures of proteins, nucleic acids and their complexes have accumulated rapidly in the past decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482-491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to or even better than those of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, while BioQSAR possesses certain additional features compared to the traditional methods, such as adaptability, interpretability, deep-validation and high-efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, if their X-ray crystal structures, NMR conformation assemblies or computationally modeled structures are available.
Protein-Protein Interface Predictions by Data-Driven Methods: A Review
Xue, Li C; Dobbs, Drena; Bonvin, Alexandre M.J.J.; Honavar, Vasant
2015-01-01
Reliably pinpointing which specific amino acid residues form the interface(s) between a protein and its binding partner(s) is critical for understanding the structural and physicochemical determinants of protein recognition and binding affinity, and has wide applications in modeling and validating protein interactions predicted by high-throughput methods, in engineering proteins, and in prioritizing drug targets. Here, we review the basic concepts, principles and recent advances in computational approaches to the analysis and prediction of protein-protein interfaces. We point out caveats for objectively evaluating interface predictors, and discuss various applications of data-driven interface predictors for improving energy model-driven protein-protein docking. Finally, we stress the importance of exploiting binding partner information in reliably predicting interfaces and highlight recent advances in this emerging direction. PMID:26460190
Panjikar, Santosh; Parthasarathy, Venkataraman; Lamzin, Victor S; Weiss, Manfred S; Tucker, Paul A
2005-04-01
The EMBL-Hamburg Automated Crystal Structure Determination Platform is a system that combines a number of existing macromolecular crystallographic computer programs and several decision-makers into a software pipeline for automated and efficient crystal structure determination. The pipeline can be invoked as soon as X-ray data from derivatized protein crystals have been collected and processed. It is controlled by a web-based graphical user interface for data and parameter input, and for monitoring the progress of structure determination. A large number of possible structure-solution paths are encoded in the system and the optimal path is selected by the decision-makers as the structure solution evolves. The processes have been optimized for speed so that the pipeline can be used effectively for validating the X-ray experiment at a synchrotron beamline.
Global Low Frequency Protein Motions in Long-Range Allosteric Signaling
NASA Astrophysics Data System (ADS)
McLeish, Tom; Rogers, Thomas; Townsend, Philip; Burnell, David; Pohl, Ehmke; Wilson, Mark; Cann, Martin; Richards, Shane; Jones, Matthew
2015-03-01
We present a foundational theory for how allostery can occur as a function of low frequency dynamics without a change in protein structure. Elastic inhomogeneities allow entropic "signalling at a distance." Remarkably, many globular proteins display just this class of elastic structure, in particular those that support allosteric binding of substrates (long-range co-operative effects between the binding sites of small molecules). Through multi-scale modelling of global normal modes we demonstrate negative co-operativity between the two cAMP ligands without change to the mean structure. Crucially, the value of the co-operativity is itself controlled by the interactions around a set of third allosteric "control sites." The theory makes key experimental predictions, validated by analysis of variant proteins by a combination of structural biology and isothermal calorimetry. A quantitative description of allostery as a free energy landscape revealed a protein "design space" that identified the key inter- and intramolecular regulatory parameters that frame CRP/FNR family allostery. Furthermore, by analyzing naturally occurring CAP variants from diverse species, we demonstrate an evolutionary selection pressure to conserve residues crucial for allosteric control. The methodology establishes the means to engineer allosteric mechanisms that are driven by low frequency dynamics.
Diverse, high-quality test set for the validation of protein-ligand docking performance.
Hartshorn, Michael J; Verdonk, Marcel L; Chessari, Gianni; Brewerton, Suzanne C; Mooij, Wijnand T M; Mortenson, Paul N; Murray, Christopher W
2007-02-22
A procedure for analyzing and classifying publicly available crystal structures has been developed. It has been used to identify high-resolution protein-ligand complexes that can be assessed by reconstructing the electron density for the ligand using the deposited structure factors. The complexes have been clustered according to the protein sequences, and clusters have been discarded if they do not represent proteins thought to be of direct interest to the pharmaceutical or agrochemical industry. Rules have been used to exclude complexes containing non-drug-like ligands. One complex from each cluster has been selected where a structure of sufficient quality was available. The final Astex diverse set contains 85 diverse, relevant protein-ligand complexes, which have been prepared in a format suitable for docking and are to be made freely available to the entire research community (http://www.ccdc.cam.ac.uk). The performance of the docking program GOLD against the new set is assessed using a variety of protocols. Relatively unbiased protocols give success rates of approximately 80% for redocking into native structures, but it is possible to get success rates of over 90% with some protocols.
Elson, D S; Jo, J A
2007-01-01
We report a side viewing fibre-based endoscope that is compatible with intravascular imaging and fluorescence lifetime imaging microscopy (FLIM). The instrument has been validated through testing with fluorescent dyes and collagen and elastin powders using the Laguerre expansion deconvolution technique to calculate the fluorescence lifetimes. The instrument has also been tested on freshly excised unstained animal vascular tissues. PMID:19503759
Rapid Sampling of Hydrogen Bond Networks for Computational Protein Design.
Maguire, Jack B; Boyken, Scott E; Baker, David; Kuhlman, Brian
2018-05-08
Hydrogen bond networks play a critical role in determining the stability and specificity of biomolecular complexes, and the ability to design such networks is important for engineering novel structures, interactions, and enzymes. One key feature of hydrogen bond networks that makes them difficult to rationally engineer is that they are highly cooperative and are not energetically favorable until the hydrogen bonding potential has been satisfied for all buried polar groups in the network. Existing computational methods for protein design are ill-equipped for creating these highly cooperative networks because they rely on energy functions and sampling strategies that are focused on pairwise interactions. To enable the design of complex hydrogen bond networks, we have developed a new sampling protocol in the molecular modeling program Rosetta that explicitly searches for sets of amino acid mutations that can form self-contained hydrogen bond networks. For a given set of designable residues, the protocol often identifies many alternative sets of mutations/networks, and we show that it can readily be applied to large sets of residues at protein-protein interfaces or in the interior of proteins. The protocol builds on a recently developed method in Rosetta for designing hydrogen bond networks that has been experimentally validated for small symmetric systems but was not extensible to many larger protein structures and complexes. The sampling protocol we describe here not only recapitulates previously validated designs with performance improvements but also yields viable hydrogen bond networks for cases where the previous method fails, such as the design of large, asymmetric interfaces relevant to engineering protein-based therapeutics.
Protein asparagine deamidation prediction based on structures with machine learning methods.
Jia, Lei; Sun, Yaxiong
2017-01-01
Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, and oxidation. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. The sequence-based prediction method simply flags the NG motif (asparagine followed by glycine) as liable to deamidation. It still dominates deamidation evaluation in most pharmaceutical settings because of its convenience. However, the simple sequence-based method is less accurate and often leads to over-engineering of a protein. We introduce structure-based prediction models by mining available experimental and structural data on deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-lives of Asn in pentapeptides, as well as 3D structure-based properties such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles, were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated and also tested with an external test data set. The random forest model showed high enrichment in ranking deamidated residues above non-deamidated residues while effectively eliminating false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discuss the metrics used to evaluate prediction performance on unbalanced data sets such as the deamidation case.
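The sequence-based rule described in this abstract is simple enough to sketch in a few lines. The Python below only illustrates NG-motif flagging and is not the authors' tool; the function name `find_ng_motifs` and the example sequence are hypothetical.

```python
def find_ng_motifs(sequence: str) -> list[int]:
    """Return 1-based positions of Asn (N) residues immediately followed by Gly (G)."""
    seq = sequence.upper()
    return [
        i + 1  # convert 0-based index to 1-based residue position
        for i in range(len(seq) - 1)
        if seq[i] == "N" and seq[i + 1] == "G"
    ]


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    print(find_ng_motifs("ANGKNGS"))
```

A scan like this flags every NG site regardless of structural context, which is the over-prediction that the structure-based models are meant to reduce.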
Bordner, Andrew J; Gorin, Andrey A
2008-05-12
Protein-protein interactions are ubiquitous and essential for all cellular processes. High-resolution X-ray crystallographic structures of protein complexes can reveal the details of their function and provide a basis for many computational and experimental approaches. Differentiation between biological and non-biological contacts and reconstruction of the intact complex is a challenging computational problem. A successful solution can provide additional insights into the fundamental principles of biological recognition and reduce errors in many algorithms and databases utilizing interaction information extracted from the Protein Data Bank (PDB). We have developed a method for identifying protein complexes in the PDB X-ray structures by a four-step procedure: (1) comprehensively collecting all protein-protein interfaces; (2) clustering similar protein-protein interfaces together; (3) estimating the probability that each cluster is relevant based on a diverse set of properties; and (4) combining these scores for each PDB entry in order to predict the complex structure. The resulting clusters of biologically relevant interfaces provide a reliable catalog of evolutionarily conserved protein-protein interactions. These interfaces, as well as the predicted protein complexes, are available from the Protein Interface Server (PInS) website (see Availability and requirements section). Our method demonstrates an almost two-fold reduction of the annotation error rate as evaluated on a large benchmark set of complexes validated from the literature. We also estimate relative contributions of each interface property to the accurate discrimination of biologically relevant interfaces and discuss possible directions for further improving the prediction method.
CheckMyMetal: a macromolecular metal-binding validation tool
Porebski, Przemyslaw J.
2017-01-01
Metals are essential in many biological processes, and metal ions are modeled in roughly 40% of the macromolecular structures in the Protein Data Bank (PDB). However, a significant fraction of these structures contain poorly modeled metal-binding sites. CheckMyMetal (CMM) is an easy-to-use metal-binding site validation server for macromolecules that is freely available at http://csgid.org/csgid/metal_sites. The CMM server can detect incorrect metal assignments as well as geometrical and other irregularities in the metal-binding sites. Guidelines for metal-site modeling and validation in macromolecules are illustrated by several practical examples grouped by the type of metal. These examples show CMM users (and crystallographers in general) problems they may encounter during the modeling of a specific metal ion. PMID:28291757
qPIPSA: Relating enzymatic kinetic parameters and interaction fields
Gabdoulline, Razif R; Stein, Matthias; Wade, Rebecca C
2007-01-01
Background: The simulation of metabolic networks in quantitative systems biology requires the assignment of enzymatic kinetic parameters. Experimentally determined values are often not available and therefore computational methods to estimate these parameters are needed. It is possible to use the three-dimensional structure of an enzyme to perform simulations of a reaction and derive kinetic parameters. However, this is computationally demanding and requires detailed knowledge of the enzyme mechanism. We have therefore sought to develop a general, simple and computationally efficient procedure to relate protein structural information to enzymatic kinetic parameters that allows consistency between the kinetic and structural information to be checked and estimation of kinetic constants for structurally and mechanistically similar enzymes. Results: We describe qPIPSA: quantitative Protein Interaction Property Similarity Analysis. In this analysis, molecular interaction fields, for example, electrostatic potentials, are computed from the enzyme structures. Differences in molecular interaction fields between enzymes are then related to the ratios of their kinetic parameters. This procedure can be used to estimate unknown kinetic parameters when enzyme structural information is available and kinetic parameters have been measured for related enzymes or were obtained under different conditions. The detailed interaction of the enzyme with substrate or cofactors is not modeled and is assumed to be similar for all the proteins compared. The protein structure modeling protocol employed ensures that differences between models reflect genuine differences between the protein sequences, rather than random fluctuations in protein structure. Conclusion: Provided that the experimental conditions and the protein structural models refer to the same protein state or conformation, correlations between interaction fields and kinetic parameters can be established for sets of related enzymes.
Outliers may arise due to variation in the importance of different contributions to the kinetic parameters, such as protein stability and conformational changes. The qPIPSA approach can assist in the validation as well as estimation of kinetic parameters, and provide insights into enzyme mechanism. PMID:17919319
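Comparing interaction fields between two enzymes needs some numerical measure of how alike two potentials are. One commonly used measure for potentials sampled on the same grid is the Hodgkin similarity index; treating it as the measure used in this work is an assumption, and the function name and toy values below are hypothetical.

```python
def hodgkin_index(field_a, field_b):
    """Hodgkin similarity index between two fields sampled on the same grid points.

    SI = 2 * sum(a_i * b_i) / (sum(a_i^2) + sum(b_i^2))
    Ranges from -1 (anti-correlated) through 0 to +1 (identical).
    """
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    dot_ab = sum(a * b for a, b in zip(field_a, field_b))
    norm = sum(a * a for a in field_a) + sum(b * b for b in field_b)
    if norm == 0:
        raise ValueError("both fields are zero everywhere")
    return 2.0 * dot_ab / norm


if __name__ == "__main__":
    # Toy potentials on a three-point grid (arbitrary units, illustration only).
    print(hodgkin_index([1.0, -0.5, 0.2], [0.9, -0.4, 0.3]))
```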
Xue, Yi; Skrynnikov, Nikolai R
2014-01-01
Currently, the best existing molecular dynamics (MD) force fields cannot accurately reproduce the global free-energy minimum which realizes the experimental protein structure. As a result, long MD trajectories tend to drift away from the starting coordinates (e.g., crystallographic structures). To address this problem, we have devised a new simulation strategy aimed at protein crystals. An MD simulation of protein crystal is essentially an ensemble simulation involving multiple protein molecules in a crystal unit cell (or a block of unit cells). To ensure that average protein coordinates remain correct during the simulation, we introduced crystallography-based restraints into the MD protocol. Because these restraints are aimed at the ensemble-average structure, they have only minimal impact on conformational dynamics of the individual protein molecules. So long as the average structure remains reasonable, the proteins move in a native-like fashion as dictated by the original force field. To validate this approach, we have used the data from solid-state NMR spectroscopy, which is the orthogonal experimental technique uniquely sensitive to protein local dynamics. The new method has been tested on the well-established model protein, ubiquitin. The ensemble-restrained MD simulations produced lower crystallographic R factors than conventional simulations; they also led to more accurate predictions for crystallographic temperature factors, solid-state chemical shifts, and backbone order parameters. The predictions for 15N R1 relaxation rates are at least as accurate as those obtained from conventional simulations. Taken together, these results suggest that the presented trajectories may be among the most realistic protein MD simulations ever reported. In this context, the ensemble restraints based on high-resolution crystallographic data can be viewed as protein-specific empirical corrections to the standard force fields. PMID:24452989
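The crystallographic R factor mentioned in this abstract measures the disagreement between observed and calculated structure-factor amplitudes. The sketch below uses the standard definition, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, and assumes the two sets of amplitudes are already on the same scale; the function name and numbers are illustrative and do not come from the paper.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor for amplitudes that are already on a common scale."""
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed reflection")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up amplitudes for two reflections.
    print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

For a simulation, the calculated amplitudes would come from the trajectory-averaged electron density; that step is outside the scope of this sketch.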
Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank
2013-02-01
Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable untapped potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so-called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, an HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Joseph, Benesh; Sikora, Arthur; Bordignon, Enrica; Jeschke, Gunnar; Cafiso, David S; Prisner, Thomas F
2015-05-18
Membrane proteins may be influenced by the environment, and they may be unstable in detergents or fail to crystallize. As a result, approaches to characterize structures in a native environment are highly desirable. Here, we report a novel general strategy for precise distance measurements on outer membrane proteins in whole Escherichia coli cells and isolated outer membranes. The cobalamin transporter BtuB was overexpressed and spin-labeled in whole cells and outer membranes and interspin distances were measured to a spin-labeled cobalamin using pulse EPR spectroscopy. A comparative analysis of the data reveals a similar interspin distance between whole cells, outer membranes, and synthetic vesicles. This approach provides an elegant way to study conformational changes or protein-protein/ligand interactions at surface-exposed sites of membrane protein complexes in whole cells and native membranes, and provides a method to validate outer membrane protein structures in their native environment. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Lou, Chenguang; Martos-Maldonado, Manuel C.; Madsen, Charlotte S.; Thomsen, Rasmus P.; Midtgaard, Søren Roi; Christensen, Niels Johan; Kjems, Jørgen; Thulstrup, Peter W.; Wengel, Jesper; Jensen, Knud J.
2016-07-01
Peptide-based structures can be designed to yield artificial proteins with specific folding patterns and functions. Template-based assembly of peptide units is one design option, but the use of two orthogonal self-assembly principles, oligonucleotide triple helix and coiled coil protein domain formation, has never been realized for de novo protein design. Here, we show the applicability of peptide-oligonucleotide conjugates for self-assembly of higher-ordered protein-like structures. The resulting nano-assemblies were characterized by ultraviolet-melting, gel electrophoresis, circular dichroism (CD) spectroscopy, small-angle X-ray scattering and transmission electron microscopy. These studies revealed the formation of the desired triple helix and coiled coil domains at low concentrations, while a dimer of trimers dominated at high concentration. CD spectroscopy showed an extraordinarily high degree of α-helicity for the peptide moieties in the assemblies. The results validate the use of orthogonal self-assembly principles as a paradigm for de novo protein design.
Development and Fit-for-Purpose Validation of a Soluble Human Programmed Death-1 Protein Assay.
Ni, Yan G; Yuan, Xiling; Newitt, John A; Peterson, Jon E; Gleason, Carol R; Haulenbeek, Jonathan; Santockyte, Rasa; Lafont, Virginie; Marsilio, Frank; Neely, Robert J; DeSilva, Binodh; Piccoli, Steven P
2015-07-01
Programmed death-1 (PD-1) protein is a co-inhibitory receptor which negatively regulates immune cell activation and permits tumors to evade normal immune defense. Anti-PD-1 antibodies have been shown to restore immune cell activation and effector function, an exciting breakthrough in cancer immunotherapy. Recent reports have documented a soluble form of PD-1 (sPD-1) in the circulation of normal and disease state individuals. A clinical assay to quantify sPD-1 would contribute to the understanding of sPD-1 function and facilitate the development of anti-PD-1 drugs. Here, we report the development and validation of a sPD-1 protein assay. The assay validation followed the framework for full validation of a biotherapeutic pharmacokinetic assay. A purified recombinant human PD-1 protein was characterized extensively and was identified as the assay reference material which mimics the endogenous analyte in structure and function. The lower limit of quantitation (LLOQ) was determined to be 100 pg/mL, with a dynamic range spanning three logs to 10,000 pg/mL. The intra- and inter-assay imprecision were ≤15%, and the assay bias (percent deviation) was ≤10%. Potential matrix effects were investigated in sera from both normal healthy volunteers and selected cancer patients. Bulk-prepared frozen standards and pre-coated Streptavidin plates were used in the assay to ensure consistency in assay performance over time. This assay appears to specifically measure total sPD-1 protein since the human anti-PD-1 antibody, nivolumab, and the endogenous ligands of PD-1 protein, PDL-1 and PDL-2, do not interfere with the assay.
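The imprecision and bias figures quoted in this abstract correspond to two routine calculations on replicate measurements of a sample with known nominal concentration: percent coefficient of variation (%CV) and percent deviation from nominal. The sketch below uses the reported bounds (15% and 10%) as illustrative thresholds; the function names and replicate values are hypothetical, and treating those bounds as acceptance limits is an assumption.

```python
import statistics


def percent_cv(measurements):
    """Imprecision: sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(measurements) / statistics.mean(measurements)


def percent_bias(measurements, nominal):
    """Bias: deviation of the mean from the nominal value, as a percentage."""
    return 100.0 * (statistics.mean(measurements) - nominal) / nominal


def within_limits(measurements, nominal, max_cv=15.0, max_bias=10.0):
    """True if both imprecision and absolute bias fall within the given limits."""
    return (
        percent_cv(measurements) <= max_cv
        and abs(percent_bias(measurements, nominal)) <= max_bias
    )


if __name__ == "__main__":
    # Made-up replicate readings (pg/mL) for a sample with nominal value 100 pg/mL.
    replicates = [95.0, 100.0, 105.0]
    print(percent_cv(replicates), percent_bias(replicates, 100.0))
    print(within_limits(replicates, 100.0))
```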
Gouran, Hossein; Chakraborty, Sandeep; Rao, Basuthkar J; Asgeirsson, Bjarni; Dandekar, Abhaya
2014-01-01
Duplication of genes is one of the preferred ways for natural selection to add advantageous functionality to the genome without having to reinvent the wheel with respect to catalytic efficiency and protein stability. The duplicated secretory virulence factors of Xylella fastidiosa (LesA, LesB and LesC), implicated in Pierce's disease of grape and citrus variegated chlorosis of citrus species, epitomize the positive selection pressures exerted on advantageous genes in such pathogens. A deeper insight into the evolution of these lipases/esterases is essential to develop resistance mechanisms in transgenic plants. Directed evolution, an attempt to accelerate the evolutionary steps in the laboratory, is inherently simple when targeted for loss of function. A bigger challenge is to specify mutations that endow a new function, such as a lost functionality in a duplicated gene. Previously, we have proposed a method for enumerating candidates for mutations intended to transfer the functionality of one protein into another related protein based on the spatial and electrostatic properties of the active site residues (DECAAF). In the current work, we present in vivo validation of DECAAF by inducing tributyrin hydrolysis in LesB based on the active site similarity to LesA. The structures of these proteins have been modeled using RaptorX based on the closely related LipA protein from Xanthomonas oryzae. These mutations replicate the spatial and electrostatic conformation of LesA in the modeled structure of the mutant LesB as well, providing in silico validation before proceeding to the laborious in vivo work. Such focused mutations allow one to dissect the relevance of the duplicated genes in finer detail as compared to gene knockouts, since they do not interfere with other moonlighting functions, protein expression levels or protein-protein interaction. PMID:25717364
Validating clustering of molecular dynamics simulations using polymer models.
Phillips, Joshua L; Colvin, Michael E; Newsam, Shawn
2011-11-14
Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. 
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers.
Validating clustering of molecular dynamics simulations using polymer models
2011-01-01
Background: Molecular dynamics (MD) simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results: We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions: We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results.
To our knowledge, our framework is the first to utilize model polymers to rigorously test the utility of clustering algorithms for studying biopolymers. PMID:22082218
Ma, Dejian; Tillman, Tommy S; Tang, Pei; Meirovitch, Eva; Eckenhoff, Roderic; Carnini, Anna; Xu, Yan
2008-10-28
Structural studies of polytopic membrane proteins are often hampered by the vagaries of these proteins in membrane mimetic environments and by the difficulties in handling them with conventional techniques. Designing and creating water-soluble analogues with preserved native structures offer an attractive alternative. We report here solution NMR studies of WSK3, a water-soluble analogue of the potassium channel KcsA. The WSK3 NMR structure (PDB ID code 2K1E) resembles the KcsA crystal structures, validating the approach. By more stringent comparison criteria, however, the introduction of several charged residues aimed at improving water solubility seems to have led to the possible formations of a few salt bridges and hydrogen bonds not present in the native structure, resulting in slight differences in the structure of WSK3 relative to KcsA. NMR dynamics measurements show that WSK3 is highly flexible in the absence of a lipid environment. Reduced spectral density mapping and model-free analyses reveal dynamic characteristics consistent with an isotropically tumbling tetramer experiencing slow (nanosecond) motions with unusually low local ordering. An altered hydrogen-bond network near the selectivity filter and the pore helix, and the intrinsically dynamic nature of the selectivity filter, support the notion that this region is crucial for slow inactivation. Our results have implications not only for the design of water-soluble analogues of membrane proteins but also for our understanding of the basic determinants of intrinsic protein structure and dynamics.
NASA Astrophysics Data System (ADS)
Volpert, Marianna; Mangum, Jonathan E.; Jamsai, Duangporn; D'Sylva, Rebecca; O'Bryan, Moira K.; McIntyre, Peter
2014-02-01
While the Cysteine-Rich Secretory Proteins (CRISPs) have been broadly proposed as regulators of reproduction and immunity, physiological roles have yet to be established for individual members of this family. Past efforts to investigate their functions have been limited by the difficulty of purifying correctly folded CRISPs from bacterial expression systems, which yield low quantities of correctly folded protein containing the eight disulfide bonds that define the CRISP family. Here we report the expression and purification of native, glycosylated CRISP3 from human and mouse, expressed in HEK 293 cells and isolated using ion exchange and size exclusion chromatography. Functional authenticity was verified by substrate-affinity, native glycosylation characteristics and quaternary structure (monomer in solution). Validated protein was used in comparative structure/function studies to characterise sites and patterns of N-glycosylation in CRISP3, revealing interesting inter-species differences.
Chiappori, Federica; Mattiazzi, Luca; Milanesi, Luciano; Merelli, Ivan
2016-03-02
Phosphorylation is one of the most important post-translational modifications (PTM) employed by cells to regulate several cellular processes. Studying the effects of phosphorylation on protein structures makes it possible to investigate the modulation mechanisms of several proteins including chaperones, like the small HSPs, which display different multimeric structures according to the phosphorylation of a few serine residues. In this context, the proposed study is aimed at finding a method to correlate different PTM patterns (in particular phosphorylations at the monomer interface of multimeric complexes) with the dynamic behaviour of the complex, using physicochemical parameters derived from molecular dynamics simulations on the nanosecond timescale. We have developed a methodology relying on computing nine physicochemical parameters, derived from the analysis of short MD simulations, and combined with N identifiers that characterize the PTMs of the analysed protein. The nine general parameters were validated on three proteins with known post-translationally modified and unmodified conformations. Then, we applied this approach to the case study of αB-Crystallin, a chaperone whose multimeric state (up to 40 units) is supposed to be controlled by phosphorylation of Ser45 and Ser59. Phosphorylation of serines at the dimer interface induces the release of hexamers, the active state of αB-Crystallin. 30 ns of MD simulation were obtained for each possible combination of dimer phosphorylation state, and average values of structural, dynamic, energetic and functional features were calculated on the equilibrated portion of the trajectories. Principal Component Analysis was applied to the parameters and the first five Principal Components, which summed up to 84% of the total variance, were finally considered.
The validation of this approach on multimeric proteins, whose structures were known in both modified and unmodified forms, allowed us to propose a new approach that can be used to predict the impact of PTM patterns in multi-modified proteins using data collected from short molecular dynamics simulations. Analysis of the αB-Crystallin case study clusters all-P dimers together with all-P hexamers and the no-P dimer with the no-P hexamer, and the results suggest a strong influence of Ser59 phosphorylation on chain B.
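Keeping the first few principal components that together explain a target share of the variance, as in the "first five components, 84%" statement above, comes down to a cumulative sum over the PCA eigenvalues. The sketch below shows that selection step only. The function name and eigenvalues are made up for illustration, and whether the authors picked five components by a threshold or by inspection is not stated.

```python
def components_for_variance(eigenvalues, target_fraction):
    """Smallest number of leading components whose variance share reaches the target.

    `eigenvalues` are the PCA eigenvalues (variance per component), in any order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target_fraction:
            return count
    return len(ordered)


if __name__ == "__main__":
    # Hypothetical eigenvalues; not taken from the study.
    print(components_for_variance([4.0, 3.0, 2.0, 1.0], 0.84))
```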
Epa, V. Chandana; Dolezal, Olan; Doughty, Larissa; Xiao, Xiaowen; Jost, Christian; Plückthun, Andreas; Adams, Timothy E.
2013-01-01
Designed Ankyrin Repeat Proteins are a class of novel binding proteins that can be selected and evolved to bind to targets with high affinity and specificity. We are interested in the DARPin H10-2-G3, which has been evolved to bind with very high affinity to the human epidermal growth factor receptor 2 (HER2). HER2 is found to be over-expressed in 30% of breast cancers, and is the target for the FDA-approved therapeutic monoclonal antibodies trastuzumab and pertuzumab and small molecule tyrosine kinase inhibitors. Here, we use computational macromolecular docking, coupled with several interface metrics such as shape complementarity, interaction energy, and electrostatic complementarity, to model the structure of the complex between the DARPin H10-2-G3 and HER2. We analyzed the interface between the two proteins and then validated the structural model by showing that selected HER2 point mutations at the putative interface with H10-2-G3 reduce the affinity of binding up to 100-fold without affecting the binding of trastuzumab. Comparisons made with a subsequently solved X-ray crystal structure of the complex yielded a backbone atom root mean square deviation of 0.84–1.14 Ångstroms. The study presented here demonstrates the capability of the computational techniques of structural bioinformatics in generating useful structural models of protein-protein interactions. PMID:23527120
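The backbone root mean square deviation used to compare the docked model with the crystal structure can be computed directly once the two structures are superposed and their atoms are paired. The sketch below assumes both of those steps have already been done and omits optimal superposition (for example by the Kabsch algorithm); the function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates, in input units.

    Assumes the structures are already superposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Two toy atoms, each displaced by 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, reference))
```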
Lee, Woonghee; Kim, Jin Hae; Westler, William M; Markley, John L
2011-06-15
PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY (13C-edited and/or 15N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/.
Nagpal, Suhani; Tiwari, Satyam; Mapa, Koyeli; Thukral, Lipi
2015-01-01
Many proteins with complex topologies require molecular chaperones to achieve their unique three-dimensional folded structure. The E. coli chaperone GroEL binds a large number of unfolded and partially folded proteins to facilitate proper folding and prevent misfolding and aggregation. Although the major structural components of GroEL are well defined, scaffolds of the non-native substrates that determine chaperone-mediated folding have been difficult to recognize. Here we performed all-atom and replica-exchange molecular dynamics simulations to dissect the non-native ensemble of an obligate GroEL folder, DapA. Thermodynamic analyses of unfolding simulations revealed populated intermediates with distinct structural characteristics. We found that surface-exposed hydrophobic patches are significantly increased, with the main contributions coming from native and non-native β-sheet elements. We validate the structural properties of these conformers using experimental data, including circular dichroism (CD), 1-anilinonaphthalene-8-sulfonic acid (ANS) binding measurements and previously reported hydrogen-deuterium exchange coupled to mass spectrometry (HDX-MS). Further, we constructed network graphs to elucidate long-range intra-protein connectivity of native and intermediate topologies, demonstrating regions that serve as central “hubs”. Overall, our results imply that genomic variations (or mutations) in the distinct regions of protein structures might disrupt these topological signatures, disabling chaperone-mediated folding and leading to the formation of aggregates. PMID:26394388
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins
NASA Astrophysics Data System (ADS)
Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil
2014-03-01
Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
The PYRIN domain: A member of the death domain-fold superfamily
Fairbrother, Wayne J.; Gordon, Nathaniel C.; Humke, Eric W.; O'Rourke, Karen M.; Starovasnik, Melissa A.; Yin, Jian-Ping; Dixit, Vishva M.
2001-01-01
PYRIN domains were identified recently as putative protein–protein interaction domains at the N-termini of several proteins thought to function in apoptotic and inflammatory signaling pathways. The ∼95 residue PYRIN domains have no statistically significant sequence homology to proteins with known three-dimensional structure. Using secondary structure prediction and potential-based fold recognition methods, however, the PYRIN domain is predicted to be a member of the six-helix bundle death domain-fold superfamily that includes death domains (DDs), death effector domains (DEDs), and caspase recruitment domains (CARDs). Members of the death domain-fold superfamily are well established mediators of protein–protein interactions found in many proteins involved in apoptosis and inflammation, indicating further that the PYRIN domains serve a similar function. An homology model of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1, a member of the Apaf-1/Ced-4 family of proteins, was constructed using the three-dimensional structures of the FADD and p75 neurotrophin receptor DDs, and of the Apaf-1 and caspase-9 CARDs, as templates. Validation of the model using a variety of computational techniques indicates that the fold prediction is consistent with the sequence. Comparison of a circular dichroism spectrum of the PYRIN domain of CARD7/DEFCAP/NAC/NALP1 with spectra of several proteins known to adopt the death domain-fold provides experimental support for the structure prediction. PMID:11514682
NOXclass: prediction of protein-protein interaction types.
Zhu, Hongbo; Domingues, Francisco S; Sommer, Ingolf; Lengauer, Thomas
2006-01-19
Structural models determined by X-ray crystallography play a central role in understanding protein-protein interactions at the molecular level. Interpretation of these models requires the distinction between non-specific crystal packing contacts and biologically relevant interactions. This has been investigated previously and classification approaches have been proposed. However, less attention has been devoted to distinguishing different types of biological interactions. These interactions are classified as obligate and non-obligate according to the effect of the complex formation on the stability of the protomers. So far no automatic classification methods for distinguishing obligate, non-obligate and crystal packing interactions have been made available. Six interface properties have been investigated on a dataset of 243 protein interactions. The six properties have been combined using a support vector machine algorithm, resulting in NOXclass, a classifier for distinguishing obligate, non-obligate and crystal packing interactions. We achieve an accuracy of 91.8% for the classification of these three types of interactions using a leave-one-out cross-validation procedure. NOXclass allows the interpretation and analysis of protein quaternary structures. In particular, it generates testable hypotheses regarding the nature of protein-protein interactions, when experimental results are not available. We expect this server will benefit the users of protein structural models, as well as protein crystallographers and NMR spectroscopists. A web server based on the method and the datasets used in this study are available at http://noxclass.bioinf.mpi-inf.mpg.de/.
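The leave-one-out cross-validation procedure mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: a simple nearest-centroid classifier is swapped in for the support vector machine used by NOXclass, the two-feature toy data are made up, and the function names are hypothetical rather than part of the NOXclass software.

```python
from math import dist


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid lies closest to x.

    Stand-in classifier (NOT the SVM used by NOXclass).
    """
    sums, counts = {}, {}
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up interface features for nine hypothetical complexes (three per class).
X = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
    [20.0, 0.0], [21.0, 0.0], [20.0, 1.0],
]
y = ["crystal"] * 3 + ["non-obligate"] * 3 + ["obligate"] * 3

loo_accuracy = leave_one_out_accuracy(X, y)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy is an estimate of performance on unseen interfaces, which is the role the 91.8% figure plays in the abstract.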
Singh, Raghvendra Pratap; Singh, Ram Nageena; Srivastava, Manish K; Srivastava, Alok Kumar; Kumar, Sudheer; Dubey, Ramesh Chandra; Sharma, Arun Kumar
2012-01-01
Methylobacteria are ubiquitous in the biosphere and are capable of growing on C1 compounds such as formate, formaldehyde, methanol and methylamine, as well as on a wide range of multi-carbon growth substrates such as C2, C3 and C4 compounds, owing to the methylotrophic enzyme methanol dehydrogenase (MDH). MDH performs these functions with the help of a key protein, mxaF. Unfortunately, detailed structural analysis and homology modeling of mxaF remain undefined. Hence, the objective of this research is the characterization and three-dimensional modeling of the mxaF protein from three different methylotrophs using the I-TASSER server. The predicted models were further optimized and validated with the Profile 3D, ERRAT, Verify3D and PROCHECK servers. The predicted and best-evaluated models have been successfully deposited in the PMDB database with PMDB IDs PM0077505, PM0077506 and PM0077507. Active site identification revealed 11, 13 and 14 putative functional site residues in the respective models. These residues may play a major role in protein-protein and protein-cofactor interactions. This study provides ab initio, detailed information for understanding the structure, mechanism of action and regulation of the mxaF protein. PMID:23275704
Massari, Serena; Goracci, Laura; Desantis, Jenny; Tabarrini, Oriana
2016-09-08
The limited therapeutic options against the influenza virus (flu) and increasing challenges in drug resistance make the search for next-generation agents imperative. In this context, heterotrimeric viral PA/PB1/PB2 RNA-dependent RNA polymerase is an attractive target for a challenging but strategic protein-protein interaction (PPI) inhibition approach. Since 2012, the inhibition of the polymerase PA-PB1 subunit interface has become an active field of research following the publication of PA-PB1 crystal structures. In this Perspective, we briefly discuss the validity of flu polymerase as a drug target and its inhibition through a PPI inhibition strategy, including a comprehensive analysis of available PA-PB1 structures. An overview of all of the reported PA-PB1 complex formation inhibitors is provided, and approaches used for identification of the inhibitors, the hit-to-lead studies, and the emerged structure-activity relationship are described. In addition to highlighting the strengths and weaknesses of all of the PA-PB1 heterodimerization inhibitors, we analyze their hypothesized binding modes and alignment with a pharmacophore model that we have developed.
Dias, David M.; Ciulli, Alessio
2014-01-01
Nuclear magnetic resonance (NMR) spectroscopy is a pivotal method for structure-based and fragment-based lead discovery because it is one of the most robust techniques to provide information on protein structure, dynamics and interaction at an atomic level in solution. Nowadays, in most ligand screening cascades, NMR-based methods are applied to identify and structurally validate small molecule binding. These can be high-throughput and are often used synergistically with other biophysical assays. Here, we describe the current state of the art in the portfolio of available NMR-based experiments that are used to aid early-stage lead discovery. We then focus on multi-protein complexes as targets and on how NMR spectroscopy allows the study of interactions within the high molecular weight assemblies that make up a vast fraction of the yet untargeted proteome. Finally, we give our perspective on how currently available methods could build an improved strategy for drug discovery against such challenging targets. PMID:25175337
Improve the prediction of RNA-binding residues using structural neighbours.
Li, Quan; Cao, Zanxia; Liu, Haiyan
2010-03-01
The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server was developed for predicting RNA binding residues in a protein sequence (or structure), which is available at http://mcgill.3322.org/RNA/.
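The performance figures quoted in the abstract above (overall correct rate, specificity, fraction of binding residues recovered, and MCC) all derive from one binary confusion matrix. The sketch below shows the standard formulas. The counts are made up for illustration and are not the paper's data, and the function name is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding
    tn/fp: non-binding residues predicted as non-binding / binding
    """
    total = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,        # overall correct rate
        "sensitivity": tp / (tp + fn),        # fraction of binding residues found
        "specificity": tn / (tn + fp),        # fraction of non-binding residues kept
        # Matthews correlation coefficient; defined as 0 when a margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up counts for a hypothetical set of 100 residues.
metrics = binary_metrics(tp=20, tn=70, fp=5, fn=5)
```

MCC is reported alongside accuracy because binding residues are a minority class: a predictor can reach a high overall correct rate while still missing many binding residues, and MCC penalises that imbalance.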
Karageorgos, Ioannis; Gallagher, Elyssia S; Galvin, Connor; Gallagher, D Travis; Hudgens, Jeffrey W
2017-11-01
Monoclonal antibody pharmaceuticals are the fastest-growing class of therapeutics, with a wide range of clinical applications. To assure their safety, these protein drugs must demonstrate highly consistent purity and stability. Key to these objectives is higher order structure measurements validated by calibration to reference materials. We describe preparation, characterization, and crystal structure of the Fab fragment prepared from the NIST Reference Antibody RM 8671 (NISTmAb). NISTmAb is a humanized IgG1κ antibody, produced in murine cell culture and purified by standard biopharmaceutical production methods, developed at the National Institute of Standards and Technology (NIST) to serve as a reference material. The Fab fragment was derived from NISTmAb through papain cleavage followed by protein A based purification. The purified Fab fragment was characterized by SDS-PAGE, capillary gel electrophoresis, multi-angle light scattering, size exclusion chromatography, mass spectrometry, and X-ray crystallography. The crystal structure at 0.2 nm resolution includes four independent Fab molecules with complete light chains and heavy chains through Cys 223, enabling assessment of conformational variability and providing a well-characterized reference structure for research and engineering applications. This nonproprietary, publicly available reference material of known higher-order structure can support metrology in biopharmaceutical applications, and it is a suitable platform for validation of molecular modeling studies. Published by Elsevier Ltd.
Merkley, Eric D; Rysavy, Steven; Kahraman, Abdullah; Hafen, Ryan P; Daggett, Valerie; Adkins, Joshua N
2014-01-01
Integrative structural biology attempts to model the structures of protein complexes that are challenging or intractable by classical structural methods (due to size, dynamics, or heterogeneity) by combining computational structural modeling with data from experimental methods. One such experimental method is chemical crosslinking mass spectrometry (XL-MS), in which protein complexes are crosslinked and characterized using liquid chromatography-mass spectrometry to pinpoint specific amino acid residues in close structural proximity. The commonly used lysine-reactive N-hydroxysuccinimide ester reagents disuccinimidylsuberate (DSS) and bis(sulfosuccinimidyl)suberate (BS3) have a linker arm that is 11.4 Å long when fully extended, allowing Cα (alpha carbon of protein backbone) atoms of crosslinked lysine residues to be up to ∼24 Å apart. However, XL-MS studies on proteins of known structure frequently report crosslinks that exceed this distance. Typically, a tolerance of ∼3 Å is added to the theoretical maximum to account for this observation, with limited justification for the chosen value. We used the Dynameomics database, a repository of high-quality molecular dynamics simulations of 807 proteins representative of diverse protein folds, to investigate the relationship between lysine–lysine distances in experimental starting structures and in simulation ensembles. We conclude that for DSS/BS3, a distance constraint of 26–30 Å between Cα atoms is appropriate. This analysis provides a theoretical basis for the widespread practice of adding a tolerance to the crosslinker length when comparing XL-MS results to structures or in modeling. We also discuss the comparison of XL-MS results to MD simulations and known structures as a means to test and validate experimental XL-MS methods. PMID:24639379
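As an illustration of how the Cα-Cα distance constraint discussed in the abstract above might be applied to a structure, the sketch below flags crosslinked lysine pairs whose Cα atoms are further apart than a chosen cutoff. The default of 30 Å is taken from the upper end of the 26–30 Å range quoted in the abstract. The residue labels, coordinates and function name are made up for the example.

```python
from math import dist


def check_crosslinks(ca_coords, crosslinks, max_ca_distance=30.0):
    """Compare each crosslinked residue pair against a C-alpha distance cutoff.

    ca_coords: mapping of residue label -> (x, y, z) C-alpha position in angstroms
    crosslinks: iterable of (residue_a, residue_b) pairs observed by XL-MS
    Returns a list of (residue_a, residue_b, distance, satisfied) tuples.
    """
    results = []
    for res_a, res_b in crosslinks:
        d = dist(ca_coords[res_a], ca_coords[res_b])
        results.append((res_a, res_b, d, d <= max_ca_distance))
    return results


# Hypothetical C-alpha coordinates (angstroms) for three lysines.
ca_coords = {
    "K10": (0.0, 0.0, 0.0),
    "K25": (12.0, 16.0, 0.0),
    "K80": (0.0, 0.0, 35.0),
}
crosslinks = [("K10", "K25"), ("K10", "K80")]

report = check_crosslinks(ca_coords, crosslinks)
violations = [(a, b) for a, b, _, ok in report if not ok]
```

A pair that fails the check in a single static structure is not necessarily a wrong crosslink. The abstract's point is that conformational dynamics can bring such residues within range, which is why the cutoff includes a tolerance beyond the fully extended linker length.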
Basu, Sankar
2017-12-07
The complementarity plot (CP) is an established validation tool for protein structures, applicable to both globular proteins (folding) as well as protein-protein complexes (binding). It computes the shape and electrostatic complementarities (S_m, E_m) for amino acid side-chains buried within the protein interior or interface and plots them in a two-dimensional plot having knowledge-based probabilistic quality estimates for the residues as well as for the whole structure. The current report essentially presents an upgraded version of the plot with the implementation of the advanced multi-dielectric functionality (as in Delphi version 6.2 or higher) in the computation of electrostatic complementarity to make the validation tool physico-chemically more realistic. The two methods (single- and multi-dielectric) agree decently in their resultant E_m values, and hence, provisions for both methods have been kept in the software suite. So to speak, the global electrostatic balance within a well-folded protein and/or a well-packed interface seems only marginally perturbed by the choice of different internal dielectric values. However, both from theoretical as well as practical grounds, the more advanced multi-dielectric version of the plot is certainly recommended for potentially producing more reliable results. The report also presents a new methodology and a variant plot, namely CPdock, based on the same principles of complementarity specifically designed to be used in the docking of proteins. The efficacy of the method to discriminate between good and bad docked protein complexes has been tested on a recent state-of-the-art docking benchmark. The results unambiguously indicate that CPdock can indeed be effective in the initial screening phase of a docking scoring pipeline before going into more sophisticated and computationally expensive scoring functions. CPdock has been made available at https://github.com/nemo8130/CPdock.
Graphical Abstract: An example showing the efficacy of CPdock to be used in the initial screening phase of a protein-protein docking scoring pipeline.
Kepp, Kasper P
2015-10-01
Fast and accurate computation of protein stability is increasingly important for e.g. protein engineering and protein misfolding diseases, but no consensus methods exist for important proteins such as globins, and performance may depend on the type of structural input given. This paper reports benchmarking of six protein stability calculators (POPMUSIC 2.1, I-Mutant 2.0, I-Mutant 3.0, CUPSAT, SDM, and mCSM) against 134 experimental stability changes for mutations of sperm-whale myoglobin. Six different high-resolution structures were used to test structure sensitivity that may impair protein calculations. The trend accuracy of the methods decreased as I-Mutant 2.0 (R=0.64-0.65), SDM (R=0.57-0.60), POPMUSIC2.1 (R=0.54-0.57), I-Mutant 3.0 (R=0.53-0.55), mCSM (R=0.35-0.47), and CUPSAT (R=0.25-0.48). The mean signed errors increased as SDM
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
2014-01-01
Background: The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure-based drug discovery hinges severely on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data-driven homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals. Results: The Bhageerath-H web server was validated on 75 CASP10 targets, which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5 Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion: The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245
Buried chloride stereochemistry in the Protein Data Bank
2014-01-01
Background: Although the chloride anion is involved in fundamental biological processes, its interactions with proteins are poorly understood. In particular, we lack a systematic survey of its coordination spheres. Results: The analysis of a non-redundant set (pairwise sequence identity < 30%) of 1739 high resolution (<2 Å) crystal structures that contain at least one chloride anion shows that the first coordination spheres of the chlorides are essentially constituted by hydrogen bond donors. Amongst the positively charged side-chains, arginine interacts with chlorides much more frequently than lysine. Although the most common coordination number is 4, the coordination stereochemistry is closer to the expected geometry when the coordination number is 5, suggesting that this is the coordination number towards which the chlorides tend when they interact with proteins. Conclusions: The results of these analyses are useful in interpreting, describing, and validating new protein crystal structures that contain chloride anions. PMID:25928393
Buried chloride stereochemistry in the Protein Data Bank.
Carugo, Oliviero
2014-09-23
Although the chloride anion is involved in fundamental biological processes, its interactions with proteins are poorly understood. In particular, we lack a systematic survey of its coordination spheres. The analysis of a non-redundant set (pairwise sequence identity < 30%) of 1739 high resolution (<2 Å) crystal structures that contain at least one chloride anion shows that the first coordination spheres of the chlorides are essentially constituted by hydrogen bond donors. Amongst the positively charged side-chains, arginine interacts with chlorides much more frequently than lysine. Although the most common coordination number is 4, the coordination stereochemistry is closer to the expected geometry when the coordination number is 5, suggesting that this is the coordination number towards which the chlorides tend when they interact with proteins. The results of these analyses are useful in interpreting, describing, and validating new protein crystal structures that contain chloride anions.
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of the Clusco software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. Clusco is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and clustering of the comparison results with standard methods: K-means Clustering or Hierarchical Agglomerative Clustering. The application was highly optimized and written in C/C++, including the code for parallel execution on CPU and GPU, which resulted in a significant speedup over similar clustering and scoring computation programs.
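To make the all-versus-all comparison described in the abstract above concrete, here is a minimal sketch of a similarity matrix built with one of the listed measures, dRMSD (distance RMSD). It assumes the common definition in which intra-model pairwise distances are compared, so no superposition is needed. It is not Clusco's implementation, normalisation conventions can differ between programs, and the three-atom toy models are invented.

```python
from itertools import combinations
from math import dist, sqrt


def drmsd(model_a, model_b):
    """Distance RMSD between two models with the same atoms in the same order.

    Compares every intra-model pairwise distance, so the result is
    independent of how either model is translated or rotated.
    """
    pairs = list(combinations(range(len(model_a)), 2))
    squared = sum(
        (dist(model_a[i], model_a[j]) - dist(model_b[i], model_b[j])) ** 2
        for i, j in pairs
    )
    return sqrt(squared / len(pairs))


def all_vs_all(models):
    """Fill a symmetric matrix with the dRMSD of every pair of models."""
    n = len(models)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        matrix[a][b] = matrix[b][a] = drmsd(models[a], models[b])
    return matrix


# Three toy "models" of a three-atom chain (coordinates in angstroms).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]    # same shape, translated
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]  # bond lengths doubled

matrix = all_vs_all([reference, shifted, stretched])
```

The number of model pairs grows quadratically with the number of models, and each dRMSD is itself quadratic in the number of atoms, which is why the abstract describes this matrix as a bottleneck worth parallelising.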
DockTrina: docking triangular protein trimers.
Popov, Petr; Ritchie, David W; Grudinin, Sergei
2014-01-01
In spite of the abundance of oligomeric proteins within a cell, the structural characterization of protein-protein interactions is still a challenging task. In particular, many of these interactions involve heteromeric complexes, which are relatively difficult to determine experimentally. Hence there is growing interest in using computational techniques to model such complexes. However, assembling large heteromeric complexes computationally is a highly combinatorial problem. Nonetheless the problem can be simplified greatly by considering interactions between protein trimers. After dimers and monomers, triangular trimers (i.e. trimers with pair-wise contacts between all three pairs of proteins) are the most frequently observed quaternary structural motifs according to the three-dimensional (3D) complex database. This article presents DockTrina, a novel protein docking method for modeling the 3D structures of nonsymmetrical triangular trimers. The method takes as input pair-wise contact predictions from a rigid body docking program. It then scans and scores all possible combinations of pairs of monomers using a very fast root mean square deviation test. Finally, it ranks the predictions using a scoring function which combines triples of pair-wise contact terms and a geometric clash penalty term. The overall approach takes less than 2 min per complex on a modern desktop computer. The method is tested and validated using a benchmark set of 220 bound and seven unbound protein trimer structures. DockTrina will be made available at http://nano-d.inrialpes.fr/software/docktrina. Copyright © 2013 Wiley Periodicals, Inc.
Physics-based method to validate and repair flaws in protein structures
Martin, Osvaldo A.; Arnautova, Yelena A.; Icazatti, Alejandro A.; Scheraga, Harold A.; Vila, Jorge A.
2013-01-01
A method that makes use of information provided by the combination of 13Cα and 13Cβ chemical shifts, computed at the density functional level of theory, enables one to (i) validate, at the residue level, conformations of proteins and detect backbone or side-chain flaws by taking into account an ensemble average of chemical shifts over all of the conformations used to represent a protein, with a sensitivity of ∼90%; and (ii) provide a set of (χ1/χ2) torsional angles that leads to optimal agreement between the observed and computed 13Cα and 13Cβ chemical shifts. The method has been incorporated into the CheShift-2 protein validation Web server. To test the reliability of the provided set of (χ1/χ2) torsional angles, the side chains of all reported conformations of five NMR-determined protein models were refined by a simple routine, without using NOE-based distance restraints. The refinement of each of these five proteins leads to optimal agreement between the observed and computed 13Cα and 13Cβ chemical shifts for ∼94% of the flaws, on average, without introducing a significantly large number of violations of the NOE-based distance restraints for a distance range ≤ 0.5 Å, in which the largest number of distance violations occurs. The results of this work suggest that use of the provided set of (χ1/χ2) torsional angles together with other observables, such as NOEs, should lead to a fast and accurate refinement of the side-chain conformations of protein models. PMID:24082119
Physics-based method to validate and repair flaws in protein structures.
Martin, Osvaldo A; Arnautova, Yelena A; Icazatti, Alejandro A; Scheraga, Harold A; Vila, Jorge A
2013-10-15
A method that makes use of information provided by the combination of (13)C(α) and (13)C(β) chemical shifts, computed at the density functional level of theory, enables one to (i) validate, at the residue level, conformations of proteins and detect backbone or side-chain flaws by taking into account an ensemble average of chemical shifts over all of the conformations used to represent a protein, with a sensitivity of ∼90%; and (ii) provide a set of (χ1/χ2) torsional angles that leads to optimal agreement between the observed and computed (13)C(α) and (13)C(β) chemical shifts. The method has been incorporated into the CheShift-2 protein validation Web server. To test the reliability of the provided set of (χ1/χ2) torsional angles, the side chains of all reported conformations of five NMR-determined protein models were refined by a simple routine, without using NOE-based distance restraints. The refinement of each of these five proteins leads to optimal agreement between the observed and computed (13)C(α) and (13)C(β) chemical shifts for ∼94% of the flaws, on average, without introducing a significantly large number of violations of the NOE-based distance restraints for a distance range ≤ 0.5 Å, in which the largest number of distance violations occurs. The results of this work suggest that use of the provided set of (χ1/χ2) torsional angles together with other observables, such as NOEs, should lead to a fast and accurate refinement of the side-chain conformations of protein models.
Quality assessment of protein model-structures using evolutionary conservation.
Kalman, Matan; Ben-Tal, Nir
2010-05-15
Programs that evaluate the quality of a protein structural model are important both for validating the structure determination procedure and for guiding the model-building process. Such programs are based on properties of native structures that are generally not expected for faulty models. One such property, which is rarely used for automatic structure quality assessment, is the tendency for conserved residues to be located at the structural core and for variable residues to be located at the surface. We present ConQuass, a novel quality assessment program based on the consistency between the model structure and the protein's conservation pattern. We show that it can identify problematic structural models, and that the scores it assigns to the server models in CASP8 correlate with the similarity of the models to the native structure. We also show that when the conservation information is reliable, the method's performance is comparable and complementary to that of the other single-structure quality assessment methods that participated in CASP8 and that do not use additional structural information from homologs. A Perl implementation of the method, as well as the various Perl and R scripts used for the analysis, are available at http://bental.tau.ac.il/ConQuass/. Contact: nirb@tauex.tau.ac.il. Supplementary data are available at Bioinformatics online.
Lee, Woonghee; Stark, Jaime L; Markley, John L
2014-11-01
Peak-picking Of Noe Data Enabled by Restriction Of Shift Assignments-Client Server (PONDEROSA-C/S) builds on the original PONDEROSA software (Lee et al. in Bioinformatics 27:1727-1728. doi: 10.1093/bioinformatics/btr200, 2011) and includes improved features for structure calculation and refinement. PONDEROSA-C/S consists of three programs: Ponderosa Server, Ponderosa Client, and Ponderosa Analyzer. PONDEROSA-C/S takes as input the protein sequence, a list of assigned chemical shifts, and nuclear Overhauser data sets ((13)C- and/or (15)N-NOESY). The output is a set of assigned NOEs and 3D structural models for the protein. Ponderosa Analyzer supports the visualization, validation, and refinement of the results from Ponderosa Server. These tools enable semi-automated NMR-based structure determination of proteins in a rapid and robust fashion. We present examples showing the use of PONDEROSA-C/S in solving structures of four proteins: two that enable comparison with the original PONDEROSA package, and two from the Critical Assessment of automated Structure Determination by NMR (Rosato et al. in Nat Methods 6:625-626. doi: 10.1038/nmeth0909-625, 2009) competition. The software package can be downloaded freely in binary format from http://pine.nmrfam.wisc.edu/download_packages.html. Registered users of the National Magnetic Resonance Facility at Madison can submit jobs to the PONDEROSA-C/S server at http://ponderosa.nmrfam.wisc.edu, where instructions and tutorials can be found. Structures are normally returned within 1-2 days.
Structures of invisible, excited protein states by relaxation dispersion NMR spectroscopy
Vallurupalli, Pramodh; Hansen, D. Flemming; Kay, Lewis E.
2008-01-01
Molecular function is often predicated on excursions between ground states and higher energy conformers that can play important roles in ligand binding, molecular recognition, enzyme catalysis, and protein folding. The tools of structural biology enable a detailed characterization of ground state structure and dynamics; however, studies of excited state conformations are more difficult because they are of low population and may exist only transiently. Here we describe an approach based on relaxation dispersion NMR spectroscopy in which structures of invisible, excited states are obtained from chemical shifts and residual anisotropic magnetic interactions. To establish the utility of the approach, we studied an exchanging protein (Abp1p SH3 domain)–ligand (Ark1p peptide) system, in which the peptide is added in only small amounts so that the ligand-bound form is invisible. From a collection of 15N, 1HN, 13Cα, and 13CO chemical shifts, along with 1HN-15N, 1Hα-13Cα, and 1HN-13CO residual dipolar couplings and 13CO residual chemical shift anisotropies, all pertaining to the invisible, bound conformer, the structure of the bound state is determined. The structure so obtained is cross-validated by comparison with 1HN-15N residual dipolar couplings recorded in a second alignment medium. The methodology described opens up the possibility for detailed structural studies of invisible protein conformers at a level of detail that has heretofore been restricted to applications involving visible ground states of proteins. PMID:18701719
Characterization of the low-temperature properties of a simplified protein model
NASA Astrophysics Data System (ADS)
Hagmann, Johannes-Geert; Nakagawa, Naoko; Peyrard, Michel
2014-01-01
Prompted by results that showed that a simple protein model, the frustrated Gō model, appears to exhibit a transition reminiscent of the protein dynamical transition, we examine the validity of this model to describe the low-temperature properties of proteins. First, we examine equilibrium fluctuations. We calculate its incoherent neutron-scattering structure factor and show that it can be well described by a theory using the one-phonon approximation. By performing an inherent structure analysis, we assess the transitions among energy states at low temperatures. Then, we examine nonequilibrium fluctuations after a sudden cooling of the protein. We investigate the violation of the fluctuation-dissipation theorem in order to analyze the protein glass transition. We find that the effective temperature of the quenched protein deviates from the temperature of the thermostat; however, it relaxes towards the actual temperature with an Arrhenius behavior as the waiting time increases. These results of the equilibrium and nonequilibrium studies converge to the conclusion that the apparent dynamical transition of this coarse-grained model cannot be attributed to a glassy behavior.
Mei, Suyu
2018-05-04
Bacterial protein-protein interaction (PPI) networks are important for revealing the machinery of signal transduction and drug resistance within bacterial cells. The STRING database has collected a large number of bacterial pathogen PPI networks, but most of the data are of low quality and have not been experimentally or computationally validated, which restricts their further biomedical application. We exploit the experimental data via four solutions to enhance the quality of M. tuberculosis H37Rv (MTB) PPI networks in STRING. Computational results show that the experimental data derived jointly by two-hybrid and copurification approaches are the most reliable for training an L2-regularized logistic regression model for MTB PPI network validation. On the basis of the validated MTB PPI networks, we further study three problems via a breadth-first graph search algorithm: (1) discovery of MTB drug-resistance pathways through searching for the paths between known drug-target genes and drug-resistance genes, (2) choosing potential cotarget genes via searching for the critical genes located on multiple pathways, and (3) choosing essential drug-target genes via analysis of network degree distribution. In addition, we further combine the validated MTB PPI networks with human PPI networks to analyze the potential pharmacological risks of known and candidate drug-target genes from the point of view of system pharmacology. The evidence from protein structure alignment demonstrates that the drugs that act on MTB target genes could also adversely act on human signaling pathways.
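The path search described in problem (1) can be illustrated with a minimal breadth-first search sketch. This is not the authors' implementation: the function name `shortest_path`, the adjacency-dictionary representation, and the toy gene names are all assumptions made for illustration.

```python
from collections import deque


def shortest_path(graph, source, targets):
    """Breadth-first search from `source`.

    Returns the shortest path (as a list of nodes) to the first node
    found in `targets`, or None if no target is reachable.
    """
    targets = set(targets)
    parent = {source: None}          # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in targets and node != source:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical toy network; names are placeholders, not data from the study.
toy_ppi = {
    "targetA": ["g1", "g2"],
    "g1": ["targetA", "g3"],
    "g2": ["targetA", "resistB"],
    "g3": ["g1", "resistB"],
    "resistB": ["g2", "g3"],
}

print(shortest_path(toy_ppi, "targetA", ["resistB"]))
```

Because breadth-first search explores the network level by level, the first drug-resistance gene it reaches is connected to the drug-target gene by a path with the fewest interactions.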
Prytkova, Vera; Heyden, Matthias; Khago, Domarin; Freites, J Alfredo; Butts, Carter T; Martin, Rachel W; Tobias, Douglas J
2016-08-25
We present a novel multi-conformation Monte Carlo simulation method that enables the modeling of protein-protein interactions and aggregation in crowded protein solutions. This approach is relevant to a molecular-scale description of realistic biological environments, including the cytoplasm and the extracellular matrix, which are characterized by high concentrations of biomolecular solutes (e.g., 300-400 mg/mL for proteins and nucleic acids in the cytoplasm of Escherichia coli). Simulation of such environments necessitates the inclusion of a large number of protein molecules. Therefore, computationally inexpensive methods, such as rigid-body Brownian dynamics (BD) or Monte Carlo simulations, can be particularly useful. However, as we demonstrate herein, the rigid-body representation typically employed in simulations of many-protein systems gives rise to certain artifacts in protein-protein interactions. Our approach allows us to incorporate molecular flexibility in Monte Carlo simulations at low computational cost, thereby eliminating ambiguities arising from structure selection in rigid-body simulations. We benchmark and validate the methodology using simulations of hen egg white lysozyme in solution, a well-studied system for which extensive experimental data, including osmotic second virial coefficients, small-angle scattering structure factors, and multiple structures determined by X-ray and neutron crystallography and solution NMR, as well as rigid-body BD simulation results, are available for comparison.
Xu, Xianjin; Qiu, Liming; Yan, Chengfei; Ma, Zhiwei; Grinter, Sam Z; Zou, Xiaoqin
2017-03-01
Protein-protein interactions occur either through direct contacts between two binding partners or are mediated by structural waters. Both direct contacts and water-mediated interactions are crucial to the formation of a protein-protein complex. During the recent CAPRI rounds, a novel parallel searching strategy for predicting water-mediated interactions was introduced into our protein-protein docking method, MDockPP. Briefly, an FFT-based docking algorithm is employed in generating putative binding modes, and an iteratively derived statistical potential-based scoring function, ITScorePP, in conjunction with biological information is used to assess and rank the binding modes. Up to 10 binding modes are selected as the initial protein-protein complex structures for MD simulations in explicit solvent. Water molecules near the interface are clustered based on the snapshots extracted from independent equilibrated trajectories. Then, protein-ligand docking is employed for a parallel search for water molecules near the protein-protein interface. The water molecules generated by ligand docking and the clustered water molecules generated by MD simulations are merged and referred to as the predicted structural water molecules. Here, we report the performance of this protocol for CAPRI rounds 28-29 and 31-35 containing 20 valid docking targets and 11 scoring targets. In the docking experiments, we predicted correct binding modes for nine targets, including one high-accuracy, two medium-accuracy, and six acceptable predictions. Regarding the two targets for the prediction of water-mediated interactions, we achieved models ranked as "excellent" in accordance with the CAPRI evaluation criteria; one of these two targets is considered a difficult target for structural water prediction. Proteins 2017; 85:424-434. © 2016 Wiley Periodicals, Inc.
He, Yan; Estephan, Rima; Yang, Xiaomin; Vela, Adriana; Wang, Hsin; Bernard, Cédric; Stark, Ruth E.
2011-01-01
Liver fatty acid-binding protein (LFABP) is a 14-kDa cytosolic polypeptide, differing from other family members in number of ligand binding sites, diversity of bound ligands, and transfer of fatty acid(s) to membranes primarily via aqueous diffusion rather than direct collisional interactions. Distinct two-dimensional 1H-15N NMR signals indicative of slowly exchanging LFABP assemblies formed during stepwise ligand titration were exploited, without solving the protein-ligand complex structures, to yield the stoichiometries for the bound ligands, their locations within the protein binding cavity, the sequence of ligand occupation, and the corresponding protein structural accommodations. Chemical shifts were monitored for wild-type LFABP and a R122L/S124A mutant in which electrostatic interactions viewed as essential to fatty acid binding were removed. For wild-type LFABP the results compared favorably with previous tertiary structures of oleate-bound wild-type LFABP in crystals and in solution: there are two oleates, one U-shaped ligand that positions the long hydrophobic chain deep within the cavity and another extended structure with the hydrophobic chain facing the cavity and the carboxylate group lying close to the protein surface. The NMR titration validated a prior hypothesis that the first oleate to enter the cavity occupies the internal protein site. In contrast, 1H/15N chemical shift changes supported only one liganded oleate for R122L/S124A LFABP, at an intermediate location within the protein cavity. A rationale based on protein sequence and electrostatics was developed to explain the stoichiometry and binding site trends for LFABPs and to put these findings into context within the larger protein family. PMID:21226535
Di Scala, Coralie; Fantini, Jacques
2017-01-01
In eukaryotic cells, cholesterol is an important regulator of a broad range of membrane proteins, including receptors, transporters, and ion channels. Understanding how cholesterol interacts with membrane proteins is a difficult task because structural data of these proteins complexed with cholesterol are scarce. Here, we describe a dual approach based on in silico studies of protein-cholesterol interactions, combined with physico-chemical measurements of protein insertion into cholesterol-containing monolayers. Our algorithm is validated through careful analysis of the effect of key mutations within and outside the predicted cholesterol-binding site. Our method is illustrated by a complete analysis of cholesterol-binding to Alzheimer's β-amyloid peptide, a protein that penetrates the plasma membrane of brain cells through a cholesterol-dependent process.
The PDB_REDO server for macromolecular structure model optimization.
Joosten, Robbie P; Long, Fei; Murshudov, Garib N; Perrakis, Anastassis
2014-07-01
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395-1412]. The PDB_REDO procedure aims for 'constructive validation', aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB.
Timely deposition of macromolecular structures is necessary for peer review
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joosten, Robbie P.; Soueidan, Hayssam; Wessels, Lodewyk F. A.
2013-12-01
Deposition of crystallographic structures should be concurrent with or prior to manuscript submission for peer review, enabling validation and increasing reliability of the PDB. Most of the macromolecular structures in the Protein Data Bank (PDB), which are used daily by thousands of educators and scientists alike, are determined by X-ray crystallography. It was examined whether the crystallographic models and data were deposited to the PDB at the same time as the publications that describe them were submitted for peer review. This condition is necessary to ensure pre-publication validation and the quality of the PDB public archive. It was found that a significant proportion of PDB entries were submitted to the PDB after peer review of the corresponding publication started, and many were only submitted after peer review had ended. It is argued that clear description of journal policies and effective policing is important for pre-publication validation, which is key in ensuring the quality of the PDB and of peer-reviewed literature.
Wang, Yongcui; Chen, Shilong; Deng, Naiyang; Wang, Yong
2013-01-01
Computational inference of novel therapeutic values for existing drugs, i.e., drug repositioning, offers great prospects for faster, lower-risk drug development. Previous research has indicated that chemical structures, target proteins, and side-effects could provide rich information in drug similarity assessment and, further, disease similarity. However, each single data source is important in its own way, and data integration holds great promise for repositioning drugs more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), to integrate molecular structure, molecular activity, and phenotype data. Specifically, we characterize drugs by profiling them in chemical structure, target protein, and side-effect space, and define a kernel function to correlate drugs with diseases. Then we train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effects information are all predictive for drug-disease relationships. More experimentally observed drug-disease interactions can be revealed by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive both in accuracy and coverage. Follow-up database search and pathway analysis indicate that our new predictions are worthy of further experimental validation. In particular, several novel predictions are supported by clinical trials databases, which shows the significant prospects of PreDR in future drug treatment. In conclusion, our new method, PreDR, can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems. PMID:24244318
Bührmann, Mike; Wiedemann, Bianca M.; Müller, Matthias P.; Hardick, Julia; Ecke, Maria
2017-01-01
In protein kinase research, identifying and addressing small molecule binding sites other than the highly conserved ATP-pocket are of intense interest because this line of investigation extends our understanding of kinase function beyond the catalytic phosphotransfer. Such alternative binding sites may be involved in altering the activation state through subtle conformational changes, in controlling cellular enzyme localization, or in mediating and disrupting protein-protein interactions. Small organic molecules that target these less conserved regions might serve as tools for chemical biology research and to probe alternative strategies in targeting protein kinases in disease settings. Here, we present the structure-based design and synthesis of a focused library of 2-arylquinazoline derivatives to target the lipophilic C-terminal binding pocket in p38α MAPK, for which a clear biological function has yet to be identified. The interactions of the ligands with p38α MAPK were analyzed by SPR measurements and validated by protein X-ray crystallography. PMID:28892510
Similarity Measures for Protein Ensembles
Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper
2009-01-01
Analyses of similarities and changes in protein conformation can provide important information regarding protein function and evolution. Many scores, including the commonly used root mean square deviation, have therefore been developed to quantify the similarities of different protein conformations. However, instead of examining individual conformations it is in many cases more relevant to analyse ensembles of conformations that have been obtained either through experiments or from methods such as molecular dynamics simulations. We here present three approaches that can be used to compare conformational ensembles in the same way as the root mean square deviation is used to compare individual pairs of structures. The methods are based on the estimation of the probability distributions underlying the ensembles and subsequent comparison of these distributions. We first validate the methods using a synthetic example from molecular dynamics simulations. We then apply the algorithms to revisit the problem of ensemble averaging during structure determination of proteins, and find that an ensemble refinement method is able to recover the correct distribution of conformations better than standard single-molecule refinement. PMID:19145244
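The idea of comparing ensembles through their underlying probability distributions can be sketched as follows. The abstract does not say which comparison measure is used; the Jensen-Shannon divergence is assumed here as one common choice, and the function name and the binned toy distributions are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete
    probability distributions given as equal-length sequences."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical ensembles, each summarised as the fraction of conformations
# falling into three bins of some structural coordinate (e.g. a dihedral).
ensemble_a = [0.75, 0.25, 0.0]
ensemble_b = [0.25, 0.50, 0.25]

print(js_divergence(ensemble_a, ensemble_a))  # identical ensembles
print(js_divergence(ensemble_a, ensemble_b))  # partially overlapping ensembles
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no overlap), which gives a bounded, symmetric score that plays a role for ensembles similar to the one the root mean square deviation plays for pairs of structures.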
Duff, Anthony P.; Durand, Dominique; Gabel, Frank; Hendrickson, Wayne A.; Hura, Greg L.; Jacques, David A.; Kirby, Nigel M.; Kwan, Ann H.; Pérez, Javier; Pollack, Lois; Ryan, Timothy M.; Sali, Andrej; Schneidman-Duhovny, Dina; Vachette, Patrice; Westbrook, John
2017-01-01
In 2012, preliminary guidelines were published addressing sample quality, data acquisition and reduction, presentation of scattering data and validation, and modelling for biomolecular small-angle scattering (SAS) experiments. Biomolecular SAS has since continued to grow and authors have increasingly adopted the preliminary guidelines. In parallel, integrative/hybrid determination of biomolecular structures is a rapidly growing field that is expanding the scope of structural biology. For SAS to contribute maximally to this field, it is essential to ensure open access to the information required for evaluation of the quality of SAS samples and data, as well as the validity of SAS-based structural models. To this end, the preliminary guidelines for data presentation in a publication are reviewed and updated, and the deposition of data and associated models in a public archive is recommended. These guidelines and recommendations have been prepared in consultation with the members of the International Union of Crystallography (IUCr) Small-Angle Scattering and Journals Commissions, the Worldwide Protein Data Bank (wwPDB) Small-Angle Scattering Validation Task Force and additional experts in the field. PMID:28876235
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
GAP Final Technical Report 12-14-04
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrew J. Bordner, PhD, Senior Research Scientist
2004-12-14
The Genomics Annotation Platform (GAP) was designed to develop new tools for high throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, and to benchmark and apply those tools. Furthermore, this platform integrated the genomic scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well as in devising strategies for the rational design of therapeutics directed at the protein of interest. The GAP also provided valuable tools for biochemistry education and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to signing the first Molsoft agreement in the structural genomics annotation area with the University of Oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large scale functional annotation platform.
Membrane and Protein Interactions of the Pleckstrin Homology Domain Superfamily
Lenoir, Marc; Kufareva, Irina; Abagyan, Ruben; Overduin, Michael
2015-01-01
The human genome encodes about 285 proteins that contain at least one annotated pleckstrin homology (PH) domain. As the first phosphoinositide binding module domain to be discovered, the PH domain recruits diverse protein architectures to cellular membranes. PH domains constitute one of the largest protein superfamilies, and have diverged to regulate many different signaling proteins and modules such as Dbl homology (DH) and Tec homology (TH) domains. The ligands of approximately 70 PH domains have been validated by binding assays and complexed structures, allowing meaningful extrapolation across the entire superfamily. Here the Membrane Optimal Docking Area (MODA) program is used at a genome-wide level to identify all membrane docking PH structures and map their lipid-binding determinants. In addition to the linear sequence motifs which are employed for phosphoinositide recognition, the three dimensional structural features that allow peripheral membrane domains to approach and insert into the bilayer are pinpointed and can be predicted ab initio. The analysis shows that conserved structural surfaces distinguish which PH domains associate with membrane from those that do not. Moreover, the results indicate that lipid-binding PH domains can be classified into different functional subgroups based on the type of membrane insertion elements they project towards the bilayer. PMID:26512702
Rajesh, Durairaj; Muthukumar, Subramanian; Saibaba, Ganesan; Siva, Durairaj; Akbarsha, Mohammad Abdulkader; Gulyás, Balázs; Padmanabhan, Parasuraman; Archunan, Govindaraju
2016-01-01
Transportation of pheromones bound with carrier proteins belonging to the lipocalin superfamily is known to prolong chemo-signal communication between individuals belonging to the same species. Members of lipocalin family (MLF) proteins have three structurally conserved motifs for delivery of hydrophobic molecules to the specific recognizer. However, computational analyses are critically required to validate and emphasize the sequence and structural annotation of MLF. This study aimed to elucidate the evolution, structural documentation, stability and binding efficiency of estrus urinary lipocalin protein (EULP) with endogenous pheromones using in silico and fluorescence studies. The results revealed that: (i) evolutionary analysis suggests that EULP may have originated from fatty acid binding protein (FABP); (ii) dynamic simulation shows that EULP is highly stable, with a root mean square deviation (RMSD) below 0.45 Å; (iii) docking evaluation shows that EULP has higher binding energy with farnesol and 2-iso-butyl-3-methoxypyrazine (IBMP) than with 2-naphthol; and (iv) competitive binding and quenching assays revealed that purified EULP has good binding interaction with farnesol. Both in silico and experimental studies showed that EULP is an efficient binding partner to pheromones. The present study provides impetus to create a point mutation for increasing the longevity of EULP in order to develop a pheromone trap for rodent pest management. PMID:27782155
Protein collapse is encoded in the folded state architecture.
Samanta, Himadri S; Zhuravlev, Pavel I; Hinczewski, Michael; Hori, Naoto; Chakrabarti, Shaon; Thirumalai, D
2017-05-21
Folded states of single domain globular proteins are compact with high packing density. The radius of gyration, R_g, of both the folded and unfolded states increases as N^ν, where N is the number of amino acids in the protein. The values of the Flory exponent ν are, respectively, ≈⅓ and ≈0.6 in the folded and unfolded states, coinciding with those for homopolymers. However, the extent of compaction of the unfolded state of a protein under low denaturant concentration (collapsibility), conditions favoring the formation of the folded state, is unknown. We develop a theory that uses the contact map of proteins as input to quantitatively assess collapsibility of proteins. Although collapsibility is universal, the propensity to be compact depends on the protein architecture. Application of the theory to over two thousand proteins shows that collapsibility depends not only on N but also on the contact map reflecting the native structure. A major prediction of the theory is that β-sheet proteins are far more collapsible than structures dominated by α-helices. The theory and the accompanying simulations, validating the theoretical predictions, provide insights into the differing conclusions reached using different experimental probes assessing the extent of compaction of proteins. By calculating the criterion for collapsibility as a function of protein length we provide quantitative insights into the reasons why single domain proteins are small and the physical reasons for the origin of multi-domain proteins. Collapsibility of non-coding RNA molecules is similar to that of β-sheet protein structures, adding support to the "Compactness Selection Hypothesis".
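The scaling of the radius of gyration with chain length can be made concrete with a short sketch. It is not the theory of the paper: it only computes an unweighted radius of gyration (all atoms given equal mass, an assumption) and estimates the Flory exponent as the slope of a least-squares fit of log R_g against log N. The function names and the synthetic data are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    centre = [sum(c[i] for c in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((c[i] - centre[i]) ** 2 for i in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def flory_exponent(lengths, rg_values):
    """Slope of the least-squares line through (log N, log R_g)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(r) for r in rg_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data generated from R_g = 2.0 * N**(1/3); the prefactor 2.0
# is arbitrary and chosen only for illustration.
lengths = [50, 100, 200, 400]
rg_values = [2.0 * n ** (1.0 / 3.0) for n in lengths]

print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
print(flory_exponent(lengths, rg_values))
```

A fitted slope near 1/3 would correspond to the compact, folded-state scaling quoted in the abstract, and a slope near 0.6 to the expanded, unfolded-state scaling.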
Multiscale weighted colored graphs for protein flexibility and rigidity analysis
NASA Astrophysics Data System (ADS)
Bramer, David; Wei, Guo-Wei
2018-02-01
Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein residues but also accurately analyze the flexibility of all atoms in a protein. The MWCG model is validated over a number of protein test sets and compared with many standard methods. An extensive numerical study indicates that the proposed MWCG offers an accuracy of over 0.8 and thus provides perhaps the first reliable method for estimating protein flexibility and B-factors. It also simultaneously predicts all-atom flexibility in a molecule.
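The idea of representing rigidity by a distance-weighted centrality and scoring predictions by Pearson correlation can be sketched as follows. This is a simplified single-scale, single-subgraph version: the exponential kernel, the values of `eta` and `kappa`, and the function names are assumptions for illustration, and the abstract does not specify the kernels or the fitting procedure actually used in MWCG.

```python
import math

def rigidity_index(coords, eta=3.0, kappa=1.0):
    """Sum of distance-decaying kernel weights from each node to all other nodes."""
    out = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += math.exp(-((math.dist(a, b) / eta) ** kappa))
        out.append(total)
    return out

def flexibility_index(coords, eta=3.0, kappa=1.0):
    """Inverse rigidity; larger values suggest larger expected fluctuation."""
    return [1.0 / mu for mu in rigidity_index(coords, eta, kappa)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

In use, the flexibility values would be compared with experimental B-factors through `pearson`; a multiscale variant would repeat the sum for several kernel widths and interaction types before fitting.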
A Parametric Rosetta Energy Function Analysis with LK Peptides on SAM Surfaces.
Lubin, Joseph H; Pacella, Michael S; Gray, Jeffrey J
2018-05-08
Although structures have been determined for many soluble proteins and an increasing number of membrane proteins, experimental structure determination methods are limited for complexes of proteins and solid surfaces. An economical alternative or complement to experimental structure determination is molecular simulation. Rosetta is one software suite that models protein-surface interactions, but Rosetta is normally benchmarked on soluble proteins. For surface interactions, the validity of the energy function is uncertain because it is a combination of independent parameters from energy functions developed separately for solution proteins and mineral surfaces. Here, we assess the performance of the RosettaSurface algorithm and test the accuracy of its energy function by modeling the adsorption of leucine/lysine (LK)-repeat peptides on methyl- and carboxy-terminated self-assembled monolayers (SAMs). We investigated how RosettaSurface predictions for this system compare with the experimental results, which showed that on both surfaces, LK-α peptides folded into helices and LK-β peptides held extended structures. Utilizing this model system, we performed a parametric analysis of Rosetta's Talaris energy function and determined that adjusting solvation parameters offered improved predictive accuracy. Simultaneously increasing lysine carbon hydrophilicity and the hydrophobicity of the surface methyl head groups yielded computational predictions most closely matching the experimental results. De novo models still should be interpreted skeptically unless bolstered in an integrative approach with experimental data.
Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering.
Wall, Michael E
2018-03-01
Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structure to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.
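A minimal sketch of how a 2 × 2 × 2 supercell can be generated by translating unit-cell coordinates. It assumes an orthorhombic cell (orthogonal axes) and ignores symmetry operators; the function name and arguments are illustrative and are not taken from the paper.

```python
from itertools import product

def build_supercell(coords, cell, reps=(2, 2, 2)):
    """Replicate unit-cell coordinates along three orthogonal axes.

    coords: list of (x, y, z) positions inside one unit cell
    cell:   (a, b, c) edge lengths of the orthorhombic cell
    reps:   number of copies along each axis
    """
    a, b, c = cell
    out = []
    for i, j, k in product(range(reps[0]), range(reps[1]), range(reps[2])):
        out.extend((x + i * a, y + j * b, z + k * c) for x, y, z in coords)
    return out
```

The default replication yields eight copies of every atom, which is the layout the abstract describes.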
Plasmonic ruler on field-effect devices for kinase drug discovery applications.
Bhalla, Nikhil; Formisano, Nello; Miodek, Anna; Jain, Aditya; Di Lorenzo, Mirella; Pula, Giordano; Estrela, Pedro
2015-09-15
Protein kinases are cellular switches that mediate phosphorylation of proteins. Abnormal phosphorylation of proteins is associated with lethal diseases such as cancer. In the pharmaceutical industry, protein kinases have become an important class of drug targets. This study reports a versatile approach for the detection of protein phosphorylation. The change in charge of the myelin basic protein upon phosphorylation by the protein kinase C-alpha (PKC-α) in the presence of adenosine 5'-[γ-thio] triphosphate (ATP-S) was detected on gold metal-insulator-semiconductor (Au-MIS) capacitor structures. Gold nanoparticles (AuNPs) can then be attached to the thio-phosphorylated proteins, forming a Au-film/AuNP plasmonic couple. This was detected by a localized surface plasmon resonance (LSPR) technique alongside MIS capacitance. All reactions were validated using surface plasmon resonance technique and the interaction of AuNPs with the thio-phosphorylated proteins quantified by quartz crystal microbalance. The plasmonic coupling was also visualized by simulations using finite element analysis. The use of this approach in drug discovery applications was demonstrated by evaluating the response in the presence of a known inhibitor of PKC-α kinase. LSPR and MIS on a single platform act as a cross check mechanism for validating kinase activity and make the system robust to test novel inhibitors. Copyright © 2015 Elsevier B.V. All rights reserved.
Cornilescu, Gabriel; Lee, Byeong Ryong; Cornilescu, Claudia C; Wang, Guangshun; Peterkofsky, Alan; Clore, G Marius
2002-11-01
The solution structure of the complex between the cytoplasmic A domain (IIA(Mtl)) of the mannitol transporter II(Mannitol) and the histidine-containing phosphocarrier protein (HPr) of the Escherichia coli phosphotransferase system has been solved by NMR, including the use of conjoined rigid body/torsion angle dynamics, and residual dipolar couplings, coupled with cross-validation, to permit accurate orientation of the two proteins. A convex surface on HPr, formed by helices 1 and 2, interacts with a complementary concave depression on the surface of IIA(Mtl) formed by helix 3, portions of helices 2 and 4, and beta-strands 2 and 3. The majority of intermolecular contacts are hydrophobic, with a small number of electrostatic interactions at the periphery of the interface. The active site histidines, His-15 of HPr and His-65 of IIA(Mtl), are in close spatial proximity, and a pentacoordinate phosphoryl transition state can be readily accommodated with no change in protein-protein orientation and only minimal perturbations of the backbone immediately adjacent to the histidines. Comparison with two previously solved structures of complexes of HPr with partner proteins of the phosphotransferase system, the N-terminal domain of enzyme I (EIN) and enzyme IIA(Glucose) (IIA(Glc)), reveals a number of common features despite the fact that EIN, IIA(Glc), and IIA(Mtl) bear no structural resemblance to one another. Thus, entirely different underlying structural elements can form binding surfaces for HPr that are similar in terms of both shape and residue composition. These structural comparisons illustrate the roles of surface and residue complementarity, redundancy, incremental build-up of specificity and conformational side chain plasticity in the formation of transient specific protein-protein complexes in signal transduction pathways.
Bera, Krishnendu; Rani, Priyanka; Kishor, Gaurav; Agarwal, Shikha; Kumar, Antresh; Singh, Durg Vijay
2017-09-20
ATP-binding cassette (ABC) transporters play an extensive role in the translocation of diverse sets of biologically important molecules across membranes. EchinocandinB (an antifungal) and the EcdL protein of Aspergillus rugulosus are encoded by the same cluster of genes. Co-expression of EcdL and echinocandinB reflects tightly linked biological functions. EcdL belongs to the Multidrug Resistance associated Protein (MRP) subfamily of ABC transporters, which has an extra transmembrane domain zero (TMD0). The complete structure of an MRP subfamily member including the TMD0 domain is not known at atomic resolution. We hypothesized that the transport of echinocandinB is mediated by the EcdL protein. Hence, it is pertinent to know the topological arrangement of TMD0 relative to the other domains of the protein and its possible role in the transport of echinocandinB. The absence of an effective template for the TMD0 domain led us to model it with I-TASSER; the structure was further refined by multiple-template modelling using homologous templates of the remaining domains (TMD1, NBD1, TMD2, NBD2). The modelled structure has been validated for packing, folding and stereochemical properties. MD simulation for 0.1 μs has been carried out in a biphasic environment to refine the modelled protein. Non-redundant structures have been extracted by clustering of the MD trajectory. Structural alignment of the modelled structure gave Z-scores of 37.9, 31.6 and 31.5 and RMSDs of 2.4, 4.2 and 4.8 against the ABC transporters with PDB IDs 4F4C, 4M1M and 4M2T, respectively, reflecting the correctness of the structure. EchinocandinB has been docked to the modelled structure as well as to the clustered structures, which reveals interaction of echinocandinB with TMD0 and other TM helices in the translocation path built by the TMDs.
Bhoir, Siddhant; Shaik, Althaf; Thiruvenkatam, Vijay; Kirubakaran, Sivapriya
2018-03-19
Human Tousled-like kinases (TLKs) are highly conserved serine/threonine protein kinases responsible for cell proliferation, DNA repair, and genome surveillance. Their possible involvement in cancer via efficient DNA repair mechanisms has made them clinically relevant molecular targets for anticancer therapy. Innovative approaches in chemical biology have played a key role in validating the importance of kinases as molecular targets. However, the detailed understanding of the protein structure and the mechanisms of protein-drug interaction through biochemical and biophysical techniques demands a method for the production of an active protein of exceptional stability and purity on a large scale. We have designed a bacterial expression system to express and purify biologically active, wild-type Human Tousled-like Kinase 1B (hTLK1B) by co-expression with the protein phosphatase from bacteriophage λ. We have obtained remarkably high amounts of the soluble and homogeneously dephosphorylated form of biologically active hTLK1B with our unique, custom-built vector design strategy. The recombinant hTLK1B can be used for structural studies and may further facilitate the development of new TLK inhibitors for anti-cancer therapy using a structure-based drug design approach.
Prediction of pi-turns in proteins using PSI-BLAST profiles and secondary structure information.
Wang, Yan; Xue, Zhi-Dong; Shi, Xiao-Hong; Xu, Jin
2006-09-01
Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method has been introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach is performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 by a 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
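For readers unfamiliar with the reported metric, the following sketch shows how a Matthews correlation coefficient is computed from confusion-matrix counts and how data can be split into folds for cross-validation. The interleaved fold assignment and the function names are assumptions for illustration; the abstract does not say how the authors partitioned their 640 chains.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def k_fold_indices(n_items, k):
    """Yield (train, test) index lists; fold i takes every k-th item starting at i."""
    for i in range(k):
        test = list(range(i, n_items, k))
        held_out = set(test)
        train = [j for j in range(n_items) if j not in held_out]
        yield train, test
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it more informative than accuracy when turns are rare relative to non-turn residues.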
NASA Astrophysics Data System (ADS)
Miao, Xijiang; Mukhopadhyay, Rishi; Valafar, Homayoun
2008-10-01
Advances in NMR instrumentation and pulse sequence design have resulted in easier acquisition of Residual Dipolar Coupling (RDC) data. However, computational and theoretical analysis of this type of data has continued to challenge the international community of investigators because of their complexity and rich information content. Contemporary use of RDC data has required a priori assignment, which significantly increases the overall cost of structural analysis. This article introduces a novel algorithm that utilizes unassigned RDC data acquired from multiple alignment media (nD-RDC, n ⩾ 3) for simultaneous extraction of the relative order tensor matrices and reconstruction of the interacting vectors in space. Estimation of the relative order tensors and reconstruction of the interacting vectors can be invaluable in a number of endeavors. An example application has been presented where the reconstructed vectors have been used to quantify the fitness of a template protein structure to the unknown protein structure. This work has other important direct applications such as verification of the novelty of an unknown protein and validation of the accuracy of an available protein structure model in drug design. More importantly, the presented work has the potential to bridge the gap between experimental and computational methods of structure determination.
Bargiello, Thaddeus A; Oh, Seunghoon; Tang, Qingxiu; Bargiello, Nicholas K; Dowd, Terry L; Kwon, Taekyung
2018-01-01
Voltage is an important physiologic regulator of channels formed by the connexin gene family. Connexins are unique among ion channels in that both plasma membrane inserted hemichannels (undocked hemichannels) and intercellular channels (aggregates of which form gap junctions) have important physiological roles. The hemichannel is the fundamental unit of gap junction voltage-gating. Each hemichannel displays two distinct voltage-gating mechanisms that are primarily sensitive to a voltage gradient formed along the length of the channel pore (the transjunctional voltage) rather than sensitivity to the absolute membrane potential (V_m or V_i-o). These transjunctional voltage dependent processes have been termed V_j- or fast-gating and loop- or slow-gating. Understanding the mechanism of voltage-gating, defined as the sequence of voltage-driven transitions that connect open and closed states, first and foremost requires atomic resolution models of the end states. Although ion channels formed by connexins were among the first to be characterized structurally by electron microscopy and x-ray diffraction in the early 1980s, subsequent progress has been slow. Much of the current understanding of the structure-function relations of connexin channels is based on two crystal structures of Cx26 gap junction channels. Refinement of crystal structure by all-atom molecular dynamics and incorporation of charge changing protein modifications has resulted in an atomic model of the open state that arguably corresponds to the physiologic open state. Obtaining validated atomic models of voltage-dependent closed states is more challenging, as there are currently no methods to solve protein structure while a stable voltage gradient is applied across the length of an oriented channel.
It is widely believed that the best approach to solve the atomic structure of a voltage-gated closed ion channel is to apply different but complementary experimental and computational methods and to use the resulting information to derive a consensus atomic structure that is then subjected to rigorous validation. In this paper, we summarize our efforts to obtain and validate atomic models of the open and voltage-driven closed states of undocked connexin hemichannels. This article is part of a Special Issue entitled: Gap Junction Proteins edited by Jean Claude Herve. Copyright © 2017 Elsevier B.V. All rights reserved.
Tsai, Keng-Chang; Jian, Jhih-Wei; Yang, Ei-Wen; Hsu, Po-Chiang; Peng, Hung-Pin; Chen, Ching-Tai; Chen, Jun-Bo; Chang, Jeng-Yih; Hsu, Wen-Lian; Yang, An-Suei
2012-01-01
Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. 
The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date. PMID:22848404
Slattery, Scott D; Hahn, Klaus M
2014-12-01
Biosensors are valuable tools used to monitor many different protein behaviors in vivo. Demand for new biosensors is high, but their development and characterization can be difficult. During biosensor design, it is necessary to evaluate the effects of different biosensor structures on specificity, brightness, and fluorescence responses. By co-expressing the biosensor with upstream proteins that either stimulate or inhibit the activity reported by the biosensor, one can determine the difference between the biosensor's maximally activated and inactivated state, and examine response to specific proteins. We describe here a method for biosensor validation in a 96-well plate format using an automated microscope. This protocol produces dose-response curves, enables efficient examination of many parameters, and unlike cell suspension assays, allows visual inspection (e.g., for cell health and biosensor or regulator localization). Optimization of single-chain and dual-chain Rho GTPase biosensors is addressed, but the assay is applicable to any biosensor that can be expressed or otherwise loaded in adherent cells. The assay can also be used for purposes other than biosensor validation, using a well-characterized biosensor as a readout for effects of upstream molecules. Copyright © 2014 John Wiley & Sons, Inc.
Protocols for efficient simulations of long-time protein dynamics using coarse-grained CABS model.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2014-01-01
Coarse-grained (CG) modeling is a well-acknowledged simulation approach for getting insight into long-time scale protein folding events at reasonable computational cost. Depending on the design of a CG model, the simulation protocols vary from highly case-specific (requiring user-defined assumptions about the folding scenario) to more sophisticated blind prediction methods for which only a protein sequence is required. Here we describe the framework protocol for the simulations of long-term dynamics of globular proteins, with the use of the CABS CG protein model and sequence data. The simulations can start from a random or a selected (e.g., native) structure. The described protocol has been validated using experimental data for protein folding model systems; the prediction results agreed well with the experimental results.
A network of molecular switches controls the activation of the two-component response regulator NtrC
NASA Astrophysics Data System (ADS)
Vanatta, Dan K.; Shukla, Diwakar; Lawrenz, Morgan; Pande, Vijay S.
2015-06-01
Recent successes in simulating protein structure and folding dynamics have demonstrated the power of molecular dynamics to predict the long timescale behaviour of proteins. Here, we extend and improve these methods to predict molecular switches that characterize conformational change pathways between the active and inactive state of nitrogen regulatory protein C (NtrC). By employing unbiased Markov state model-based molecular dynamics simulations, we construct a dynamic picture of the activation pathways of this key bacterial signalling protein that is consistent with experimental observations and predicts new mutants that could be used for validation of the mechanism. Moreover, these results suggest a novel mechanistic paradigm for conformational switching.
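The core step of a Markov state model, estimating transition probabilities from discretized trajectories, can be sketched as below. The sketch assumes that each simulation frame has already been assigned an integer state label (for example by clustering), uses a simple row-normalized count estimator without enforcing detailed balance, and all names are hypothetical; it is not the authors' pipeline.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition counts between states separated by `lag` frames (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=1000):
    """Approximate equilibrium populations by repeatedly applying the transition matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi
```

Because many short trajectories can be pooled into one count matrix, this kind of model can describe slow conformational switching without any single simulation having to span the full transition.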
Expanding the Interactome of TES by Exploiting TES Modules with Different Subcellular Localizations.
Sala, Stefano; Van Troys, Marleen; Medves, Sandrine; Catillon, Marie; Timmerman, Evy; Staes, An; Schaffner-Reckinger, Elisabeth; Gevaert, Kris; Ampe, Christophe
2017-05-05
The multimodular nature of many eukaryotic proteins underlies their temporal or spatial engagement in a range of protein cocomplexes. Using the multimodule protein testin (TES), we here report a proteomics approach to increase insight in cocomplex diversity. The LIM-domain containing and tumor suppressor protein TES is present at different actin cytoskeleton adhesion structures in cells and influences cell migration, adhesion and spreading. TES module accessibility has been proposed to vary due to conformational switching and variants of TES lacking specific domains target to different subcellular locations. By applying iMixPro AP-MS ("intelligent Mixing of Proteomes"-affinity purification-mass spectrometry) to a set of tagged-TES modular variants, we identified proteins residing in module-specific cocomplexes. The obtained distinct module-specific interactomes combine to a global TES interactome that becomes more extensive and richer in information. Applying pathway analysis to the module interactomes revealed expected actin-related canonical pathways and also less expected pathways. We validated two new TES cocomplex partners: TGFB1I1 and a short form of the glucocorticoid receptor. TES and TGFB1I1 are shown to oppositely affect cell spreading providing biological validity for their copresence in complexes since they act in similar processes.
Polyphony: superposition independent methods for ensemble-based drug discovery.
Pitt, William R; Montalvão, Rinaldo W; Blundell, Tom L
2014-09-30
Structure-based drug design is an iterative process, following cycles of structural biology, computer-aided design, synthetic chemistry and bioassay. In favorable circumstances, this process can lead to the structures of hundreds of protein-ligand crystal structures. In addition, molecular dynamics simulations are increasingly being used to further explore the conformational landscape of these complexes. Currently, methods capable of the analysis of ensembles of crystal structures and MD trajectories are limited and usually rely upon least squares superposition of coordinates. Novel methodologies are described for the analysis of multiple structures of a protein. Statistical approaches that rely upon residue equivalence, but not superposition, are developed. Tasks that can be performed include the identification of hinge regions, allosteric conformational changes and transient binding sites. The approaches are tested on crystal structures of CDK2 and other CMGC protein kinases and a simulation of p38α. Known interaction - conformational change relationships are highlighted but also new ones are revealed. A transient but druggable allosteric pocket in CDK2 is predicted to occur under the CMGC insert. Furthermore, an evolutionarily-conserved conformational link from the location of this pocket, via the αEF-αF loop, to phosphorylation sites on the activation loop is discovered. New methodologies are described and validated for the superimposition independent conformational analysis of large collections of structures or simulation snapshots of the same protein. The methodologies are encoded in a Python package called Polyphony, which is released as open source to accompany this paper [http://wrpitt.bitbucket.org/polyphony/].
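One generic way to compare conformations using residue equivalence but no superposition is to compare internal distances, which do not change under rotation or translation. The sketch below is an illustration of that idea only; it is not Polyphony's API, and the function names and the per-residue score are assumptions.

```python
import math

def distance_matrix(ca_coords):
    """All pairwise distances between residue positions (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def per_residue_change(coords_a, coords_b):
    """Mean absolute change in the distances from each residue to all others.

    Assumes both structures list equivalent residues in the same order.
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(da)
    return [
        sum(abs(da[i][j] - db[i][j]) for j in range(n)) / (n - 1)
        for i in range(n)
    ]
```

Residues with large scores across an ensemble are candidates for hinge or allosteric regions, while a rigidly moved copy of the same conformation scores zero everywhere.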
Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry
2005-10-06
The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.
Higher Throughput Calorimetry: Opportunities, Approaches and Challenges
Recht, Michael I.; Coyle, Joseph E.; Bruce, Richard H.
2010-01-01
Higher throughput thermodynamic measurements can provide value in structure-based drug discovery during fragment screening, hit validation, and lead optimization. Enthalpy can be used to detect and characterize ligand binding, and changes that affect the interaction of protein and ligand can sometimes be detected more readily from changes in the enthalpy of binding than from the corresponding free-energy changes or from protein-ligand structures. Newer, higher throughput calorimeters are being incorporated into the drug discovery process. Improvements in titration calorimeters come from extensions of a mature technology and face limitations in scaling. Conversely, array calorimetry, an emerging technology, shows promise for substantial improvements in throughput and material utilization, but improved sensitivity is needed. PMID:20888754
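The relationship between the quantities mentioned in this abstract follows from standard thermodynamics: ΔG° = RT ln(K_d) for a 1 M standard state, and ΔG = ΔH − TΔS. The sketch below applies these two identities; the function names and the example values are illustrative and do not come from the paper.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Standard binding free energy in kJ/mol from a dissociation constant (1 M standard state)."""
    return GAS_CONSTANT * temperature * math.log(kd_molar)

def entropic_term(delta_g, delta_h):
    """Return -T*dS in kJ/mol, obtained by rearranging dG = dH - T*dS."""
    return delta_g - delta_h
```

Two ligands with nearly equal affinity can differ substantially in ΔH when the entropic term compensates, which is why a calorimetric enthalpy can expose changes in the interaction that the free energy alone hides.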
2014-01-01
Background: Identification of ligand-protein binding interactions is a critical step in drug discovery. Experimental screening of large chemical libraries, in spite of its specific role and importance in drug discovery, suffers from the disadvantages of being random, time-consuming and expensive. To accelerate the process, traditional structure- or ligand-based VLS approaches are combined with experimental high-throughput screening (HTS). Often a single protein or, at most, a protein family is considered. Large scale VLS benchmarking across diverse protein families is rarely done, and the reported success rate is very low. Here, we demonstrate the experimental HTS validation of a novel VLS approach, FINDSITEcomb, across a diverse set of medically-relevant proteins. Results: For eight different proteins belonging to different fold-classes and from diverse organisms, the top 1% of FINDSITEcomb’s VLS predictions were tested, and depending on the protein target, 4%-47% of the predicted ligands were shown to bind with μM or better affinities. In total, 47 small molecule binders were identified. Low nanomolar (nM) binders for dihydrofolate reductase and protein tyrosine phosphatases (PTPs) and micromolar binders for the other proteins were identified. Six novel molecules had cytotoxic activity (<10 μg/ml) against the HCT-116 colon carcinoma cell line and one novel molecule had potent antibacterial activity. Conclusions: We show that FINDSITEcomb is a promising new VLS approach that can assist drug discovery. PMID:24936211
Giollo, Manuel; Martin, Alberto J M; Walsh, Ian; Ferrari, Carlo; Tosatto, Silvio C E
2014-01-01
The rapid growth of un-annotated missense variants poses challenges requiring novel strategies for their interpretation. From the thermodynamic point of view, amino acid changes can lead to a change in the internal energy of a protein and induce structural rearrangements. This is of great relevance for the study of diseases and protein design, justifying the development of prediction methods for variant-induced stability changes. Here we propose NeEMO, a tool for the evaluation of stability changes using an effective representation of proteins based on residue interaction networks (RINs). RINs are used to extract useful features describing interactions of the mutant amino acid with its structural environment. Benchmarking shows NeEMO to be very effective, allowing reliable predictions in different parts of the protein such as β-strands and buried residues. Validation on a previously published independent dataset shows that NeEMO has a Pearson correlation coefficient of 0.77 and a standard error of 1 kcal/mol, outperforming nine recent methods. The NeEMO web server can be freely accessed from URL: http://protein.bio.unipd.it/neemo/. NeEMO offers an innovative and reliable tool for the annotation of amino acid changes. A key contribution is the use of RINs, which can be used for modeling proteins and their interactions effectively. Interestingly, the approach is very general, and can motivate the development of a new family of RIN-based protein structure analyzers. NeEMO may suggest innovative strategies for bioinformatics tools beyond protein stability prediction.
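To make the notion of a residue interaction network concrete, the sketch below builds a simple contact network and extracts two descriptors of a residue's structural environment. It is a deliberate simplification: the single-point-per-residue representation, the 8 Å cutoff, the chosen features and all names are assumptions, and the abstract does not describe how NeEMO actually constructs or uses its networks.

```python
import math

def residue_interaction_network(ca_coords, cutoff=8.0, min_seq_sep=1):
    """Adjacency map: residues are nodes, edges join residues within `cutoff` of each other."""
    n = len(ca_coords)
    edges = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges[i].add(j)
                edges[j].add(i)
    return edges

def node_features(edges, residue):
    """Degree and clustering coefficient of one residue in the network."""
    nbrs = edges[residue]
    k = len(nbrs)
    if k < 2:
        return k, 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in edges[a])
    return k, 2.0 * links / (k * (k - 1))
```

Features of this kind, computed at the mutated position, could then serve as inputs to a regression model that predicts the change in stability.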
Cleland, Timothy P.; Schroeter, Elena R.; Zamdborg, Leonid; Zheng, Wenxia; Lee, Ji Eun; Tran, John C.; Bern, Marshall; Duncan, Michael B.; Lebleu, Valerie S.; Ahlf, Dorothy R.; Thomas, Paul M.; Kalluri, Raghu; Kelleher, Neil L.; Schweitzer, Mary H.
2016-01-01
Structures similar to blood vessels in location, morphology, flexibility, and transparency have been recovered after demineralization of multiple dinosaur cortical bone fragments from multiple specimens, some of which are as old as 80 Ma. These structures were hypothesized to be either endogenous to the bone (i.e., of vascular origin) or the result of biofilm colonizing the empty osteonal network after degradation of original organic components. Here, we test the hypothesis that these structures are endogenous and thus retain proteins in common with extant archosaur blood vessels that can be detected with high-resolution mass spectrometry and confirmed by immunofluorescence. Two lines of evidence support this hypothesis. First, peptide sequencing of Brachylophosaurus canadensis blood vessel extracts is consistent with peptides comprising extant archosaurian blood vessels and is not consistent with a bacterial, cellular slime mold, or fungal origin. Second, proteins identified by mass spectrometry can be localized to the tissues using antibodies specific to these proteins, validating their identity. Data are available via ProteomeXchange with identifier PXD001738. PMID:26595531
CANDO and the infinite drug discovery frontier
Minie, Mark; Chopra, Gaurav; Sethi, Geetika; Horst, Jeremy; White, George; Roy, Ambrish; Hatti, Kaushik; Samudrala, Ram
2014-01-01
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses similarity of compound–proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds covering 48,278 protein structures mapping to 2030 indications based on basic science methodologies to predict and analyze protein structure, function, and interactions developed by us and others. Our signature comparison and ranking approach yielded benchmarking accuracies of 12–25% for 1439 indications with at least two approved compounds. We prospectively validated 49/82 ‘high value’ predictions from nine studies covering seven indications, with comparable or better activity to existing drugs, which serve as novel repurposed therapeutics. Our approach may be generalized to compounds beyond those approved by the FDA, and can also consider mutations in protein structures to enable personalization. Our platform provides a holistic multiscale modeling framework of complex atomic, molecular, and physiological systems with broader applications in medicine and engineering. PMID:24980786
Prediction of β-turns in proteins from multiple alignment using neural network
Kaur, Harpreet; Raghava, Gajendra Pal Singh
2003-01-01
A neural network-based method has been developed for the prediction of β-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used where the first-sequence structure network is trained with the multiple sequence alignment in the form of PSI-BLAST–generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Qpred, Qobs, and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published β-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach. PMID:12592033
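The accuracy measures named in the abstract can be sketched from per-residue confusion counts. This is a minimal illustration assuming the usual definitions: overall accuracy is the percentage of residues predicted correctly, Qpred is the percentage of predicted β-turn residues that are correct, Qobs is the percentage of observed β-turn residues that are recovered, and the Matthews correlation coefficient combines all four counts. The label vectors are hypothetical and the code is not taken from BetaTPred2.

```python
import math

def turn_metrics(observed, predicted):
    """Return (Qtotal, Qpred, Qobs, MCC) for binary per-residue labels (1 = beta-turn)."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    q_total = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    q_pred = 100.0 * tp / (tp + fp)   # share of predicted turns that are correct
    q_obs = 100.0 * tp / (tp + fn)    # share of observed turns that are found
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return q_total, q_pred, q_obs, mcc

# Hypothetical labels for an eight-residue stretch.
observed = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]
q_total, q_pred, q_obs, mcc = turn_metrics(observed, predicted)
print(f"Qtotal={q_total:.1f}% Qpred={q_pred:.1f}% Qobs={q_obs:.1f}% MCC={mcc:.2f}")
```

Because β-turn residues are a minority class, overall accuracy alone can look high for a predictor that rarely predicts turns, which is why Qpred, Qobs and the Matthews coefficient are reported alongside it.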
Retinal Ligand Mobility Explains Internal Hydration and Reconciles Active Rhodopsin Structures
Leioatts, Nicholas; Mertz, Blake; Martínez-Mayorga, Karina; Romo, Tod D.; Pitman, Michael C.; Feller, Scott E.; Grossfield, Alan; Brown, Michael F.
2014-01-01
Rhodopsin, the mammalian dim-light receptor, is one of the best-characterized G-protein-coupled receptors, a pharmaceutically important class of membrane proteins that has garnered a great deal of attention because of the recent availability of structural information. Yet the mechanism of rhodopsin activation is not fully understood. Here, we use microsecond-scale all-atom molecular dynamics simulations, validated by solid-state ²H nuclear magnetic resonance spectroscopy, to understand the transition between the dark and metarhodopsin I (Meta I) states. Our analysis of these simulations reveals striking differences in ligand flexibility between the two states. Retinal is much more dynamic in Meta I, adopting an elongated conformation similar to that seen in the recent active-like crystal structures. Surprisingly, this elongation corresponds to both a dramatic influx of bulk water into the hydrophobic core of the protein and a concerted transition in the highly conserved Trp265(6.48) residue. In addition, enhanced ligand flexibility upon light activation provides an explanation for the different retinal orientations observed in X-ray crystal structures of active rhodopsin. PMID:24328554
Kinact: a computational approach for predicting activating missense mutations in protein kinases.
Rodrigues, Carlos H M; Ascher, David B; Pires, Douglas E V
2018-05-21
Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain of function mutations leading to constitutive activation of protein kinases are known to be driver events of many cancers, the identification of these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show the combination of structural and sequence features significantly improved the overall accuracy compared to considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved a precision of 87% and 94% and Area Under ROC Curve of 0.89 and 0.92 on 10-fold cross-validation, and on blind tests, respectively, outperforming well established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
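The two performance measures reported for Kinact, precision and area under the ROC curve, can be sketched as follows. This is a generic illustration, not Kinact's code: it assumes each mutation has a binary label (1 = activating) and a classifier score, computes the AUC as the fraction of positive/negative pairs ranked correctly (ties count one half), and computes precision at a hypothetical score threshold of 0.5.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at(labels, scores, threshold=0.5):
    """Fraction of mutations scored at or above the threshold that are true positives."""
    called = [l for l, s in zip(labels, scores) if s >= threshold]
    return sum(called) / len(called)

# Hypothetical labels and scores for four mutations.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
prec = precision_at(labels, scores)
print(f"AUC = {auc:.2f}, precision = {prec:.2f}")
```

The AUC is threshold-free, whereas precision depends on where the decision cut-off is placed, so the two numbers answer different questions about the same classifier.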
Dias, David M; Ciulli, Alessio
2014-01-01
Nuclear magnetic resonance (NMR) spectroscopy is a pivotal method for structure-based and fragment-based lead discovery because it is one of the most robust techniques to provide information on protein structure, dynamics and interaction at an atomic level in solution. Nowadays, in most ligand screening cascades, NMR-based methods are applied to identify and structurally validate small molecule binding. These can be high-throughput and are often used synergistically with other biophysical assays. Here, we describe current state-of-the-art in the portfolio of available NMR-based experiments that are used to aid early-stage lead discovery. We then focus on multi-protein complexes as targets and how NMR spectroscopy allows studying of interactions within the high molecular weight assemblies that make up a vast fraction of the yet untargeted proteome. Finally, we give our perspective on how currently available methods could build an improved strategy for drug discovery against such challenging targets.
The PDB_REDO server for macromolecular structure model optimization
Joosten, Robbie P.; Long, Fei; Murshudov, Garib N.; Perrakis, Anastassis
2014-01-01
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395–1412]. The PDB_REDO procedure aims for ‘constructive validation’, aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB. PMID:25075342
Crystal structure of a designed, thermostable, heterotrimeric coiled coil.
Nautiyal, S.; Alber, T.
1999-01-01
Electrostatic interactions are often critical for determining the specificity of protein-protein complexes. To study the role of electrostatic interactions for assembly of helical bundles, we previously designed a thermostable, heterotrimeric coiled coil, ABC, in which charged residues were employed to drive preferential association of three distinct, 34-residue helices. To investigate the basis for heterotrimer specificity, we have used multiwavelength anomalous diffraction (MAD) analysis to determine the 1.8 Å resolution crystal structure of ABC. The structure shows that ABC forms a heterotrimeric coiled coil with the intended arrangement of parallel chains. Over half of the ion pairs engineered to restrict helix associations were apparent in the experimental electron density map. As seen in other trimeric coiled coils, ABC displays acute knobs-into-holes packing and a buried anion coordinated by core polar amino acids. These interactions validate the design strategy and illustrate how packing and polar contacts determine structural uniqueness. PMID:10210186
Sugumar, Ramya; Adithavarman, Abhinand Ponneri; Dakshinamoorthi, Anusha; David, Darling Chellathai; Ragunath, Padmavathi Kannan
2016-03-01
Pneumocystis jirovecii is a fungus that causes Pneumocystis pneumonia in HIV and other immunosuppressed patients. Treatment of Pneumocystis pneumonia with the currently available antifungals is challenging and associated with considerable adverse effects. There is a need to develop drugs against novel targets with minimal human toxicities. Histone Acetyl Transferase (HAT) Rtt109 is a potential therapeutic target in Pneumocystis jirovecii species. HAT is linked to transcription and is required to acetylate conserved lysine residues on histone proteins by transferring an acetyl group from acetyl CoA to form ε-N-acetyl lysine. Therefore, inhibitors of HAT can be useful therapeutic options in Pneumocystis pneumonia. The aim was to screen phytochemicals against (HAT) Rtt109 using bioinformatics tools. The tertiary structure of Pneumocystis jirovecii (HAT) Rtt109 was modeled by Homology Modeling. The ideal template for modeling was obtained by performing Psi BLAST of the protein sequence. Rtt109-AcCoA/Vps75 protein from Saccharomyces cerevisiae (PDB structure 3Q35) was chosen as the template. The target protein was modeled using Swiss Modeler and validated using Ramachandran plot and Errat 2. Comprehensive text mining was performed to identify phytochemical compounds with antipneumonia and fungicidal properties and these compounds were filtered based on Lipinski's Rule of 5. The chosen compounds were subjected to virtual screening against the target protein (HAT) Rtt109 using Molegro Virtual Docker 4.5. Osiris Property Explorer and Open Tox Server were used to predict ADME-T properties of the chosen phytochemicals. Tertiary structure model of HAT Rtt109 had a ProSA score of -6.57 and Errat 2 score of 87.34. Structure validation analysis by Ramachandran plot for the model revealed 97% of amino acids were in the favoured region.
Of all the phytochemicals subjected to virtual screening against the target protein (HAT) Rtt109, baicalin exhibited the highest binding affinity towards the target protein, as indicated by the Molegro score of 130.68, and formed 16 H-bonds. The ADME-T property prediction revealed that baicalin was non-mutagenic, non-tumorigenic and had a drug likeness score of 0.87. Baicalin has good binding with Rtt109 in Pneumocystis jirovecii and can be considered as a novel and valuable treatment option for Pneumocystis pneumonia patients after subjecting it to in vivo and in vitro studies.
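The Lipinski filtering step mentioned in the methods can be sketched as a simple descriptor check. The thresholds below are the commonly cited Rule of 5 limits (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The abstract does not say how many violations the authors tolerated, so `max_violations` is an assumed parameter, and the compounds and descriptor values are hypothetical placeholders rather than data from the study.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    name: str
    mol_weight: float   # Da
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 limits a compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed policy: accept compounds with at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations

# Hypothetical compounds (values are illustrative only).
library = [
    Descriptors("compound_A", 320.4, 2.1, 2, 5),
    Descriptors("compound_B", 612.7, 6.3, 7, 12),
]
kept = [d.name for d in library if passes_rule_of_five(d)]
print(kept)
```

In a screening workflow of the kind described, such a filter would run before docking so that only compounds with plausible oral drug-likeness are carried forward.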
Adithavarman, Abhinand Ponneri; Dakshinamoorthi, Anusha; David, Darling Chellathai; Ragunath, Padmavathi Kannan
2016-01-01
Introduction: Pneumocystis jirovecii is a fungus that causes Pneumocystis pneumonia in HIV and other immunosuppressed patients. Treatment of Pneumocystis pneumonia with the currently available antifungals is challenging and associated with considerable adverse effects. There is a need to develop drugs against novel targets with minimal human toxicities. Histone Acetyl Transferase (HAT) Rtt109 is a potential therapeutic target in Pneumocystis jirovecii species. HAT is linked to transcription and is required to acetylate conserved lysine residues on histone proteins by transferring an acetyl group from acetyl CoA to form ε-N-acetyl lysine. Therefore, inhibitors of HAT can be useful therapeutic options in Pneumocystis pneumonia. Aim: To screen phytochemicals against (HAT) Rtt109 using bioinformatics tools. Materials and Methods: The tertiary structure of Pneumocystis jirovecii (HAT) Rtt109 was modeled by Homology Modeling. The ideal template for modeling was obtained by performing Psi BLAST of the protein sequence. Rtt109-AcCoA/Vps75 protein from Saccharomyces cerevisiae (PDB structure 3Q35) was chosen as the template. The target protein was modeled using Swiss Modeler and validated using Ramachandran plot and Errat 2. Comprehensive text mining was performed to identify phytochemical compounds with antipneumonia and fungicidal properties and these compounds were filtered based on Lipinski’s Rule of 5. The chosen compounds were subjected to virtual screening against the target protein (HAT) Rtt109 using Molegro Virtual Docker 4.5. Osiris Property Explorer and Open Tox Server were used to predict ADME-T properties of the chosen phytochemicals. Results: Tertiary structure model of HAT Rtt109 had a ProSA score of -6.57 and Errat 2 score of 87.34. Structure validation analysis by Ramachandran plot for the model revealed 97% of amino acids were in the favoured region.
Of all the phytochemicals subjected to virtual screening against the target protein (HAT) Rtt109, baicalin exhibited the highest binding affinity towards the target protein, as indicated by the Molegro score of 130.68, and formed 16 H-bonds. The ADME-T property prediction revealed that baicalin was non-mutagenic, non-tumorigenic and had a drug likeness score of 0.87. Conclusion: Baicalin has good binding with Rtt109 in Pneumocystis jirovecii and can be considered as a novel and valuable treatment option for Pneumocystis pneumonia patients after subjecting it to in vivo and in vitro studies. PMID:27134887
Coarse Grained Model for Biological Simulations: Recent Refinements and Validation
Vicatos, Spyridon; Rychkova, Anna; Mukherjee, Shayantani; Warshel, Arieh
2014-01-01
Exploring the free energy landscape of proteins and modeling the corresponding functional aspects presents a major challenge for computer simulation approaches. This challenge is due to the complexity of the landscape and the enormous computer time needed for converging simulations. The use of various simplified coarse grained (CG) models offers an effective way of sampling the landscape, but most current models are not expected to give a reliable description of protein stability and functional aspects. The main problem is associated with insufficient focus on the electrostatic features of the model. In this respect our recent CG model offers a significant advantage, as it has been refined while focusing on its electrostatic free energy. Here we review the current state of our model, describing recent refinement, extensions and validation studies while focusing on demonstrating key applications. These include studies of protein stability, extending the model to include membranes, electrolytes and electrodes, as well as studies of voltage activated proteins, protein insertion through the translocon, the action of molecular motors and even the coupling of the stalled ribosome and the translocon. Our example illustrates the general potential of our approach in overcoming major challenges in studies of structure-function correlation in proteins and large macromolecular complexes. PMID:25050439
Nanostructure and molecular mechanics of spider dragline silk protein assemblies
Keten, Sinan; Buehler, Markus J.
2010-01-01
Spider silk is a self-assembling biopolymer that outperforms most known materials in terms of its mechanical performance, despite its underlying weak chemical bonding based on H-bonds. While experimental studies have shown that the molecular structure of silk proteins has a direct influence on the stiffness, toughness and failure strength of silk, no molecular-level analysis of the nanostructure and associated mechanical properties of silk assemblies has been reported. Here, we report atomic-level structures of MaSp1 and MaSp2 proteins from the Nephila clavipes spider dragline silk sequence, obtained using replica exchange molecular dynamics, and subject these structures to mechanical loading for a detailed nanomechanical analysis. The structural analysis reveals that poly-alanine regions in silk predominantly form distinct and orderly beta-sheet crystal domains, while disorderly regions are formed by glycine-rich repeats that consist of 3₁-helix type structures and beta-turns. Our structural predictions are validated against experimental data based on dihedral angle pair calculations presented in Ramachandran plots, alpha-carbon atomic distances, as well as secondary structure content. Mechanical shearing simulations on selected structures illustrate that the nanoscale behaviour of silk protein assemblies is controlled by the distinctly different secondary structure content and hydrogen bonding in the crystalline and semi-amorphous regions. Both structural and mechanical characterization results show excellent agreement with available experimental evidence. Our findings set the stage for extensive atomistic investigations of silk, which may contribute towards an improved understanding of the source of the strength and toughness of this biological superfibre. PMID:20519206
Lemieux, M Joanne
2007-01-01
The major facilitator superfamily (MFS) of transporters represents the largest family of secondary active transporters and has a diverse range of substrates. With structural information for four MFS transporters, we can see a strong structural commonality suggesting, as predicted, a common architecture for MFS transporters. The rate of crystal structure determination for MFS transporters is slow, making modeling of both prokaryotic and eukaryotic transporters more enticing. In this review, models of the eukaryotic transporters Glut1, G6PT, OCT1, OCT2 and Pho84, based on the crystal structures of the prokaryotic transporters GlpT and LacY, are discussed. The techniques used to generate the different models are compared. In addition, the validity of these models and the strategy of using prokaryotic crystal structures to model eukaryotic proteins are discussed. For comparison, E. coli GlpT was modeled based on the E. coli LacY structure and compared to the crystal structure of GlpT, demonstrating that experimental evidence is essential for accurate modeling of membrane proteins.
Computational structure analysis of biomacromolecule complexes by interface geometry.
Mahdavi, Sedigheh; Salehzadeh-Yazdi, Ali; Mohades, Ali; Masoudi-Nejad, Ali
2013-12-01
The ability to analyze and compare protein-nucleic acid and protein-protein interaction interfaces is critically important for understanding biological function and the essential processes occurring in cells. Since high-resolution three-dimensional (3D) structures of biomacromolecule complexes are available, computational characterization of interface geometry has become an important research topic in the field of molecular biology. In this study, the interfaces of a set of 180 protein-nucleic acid and protein-protein complexes are computed to understand the principles of their interactions. The weighted Voronoi diagram of the atoms and the Alpha complex have provided an accurate description of the interface atoms. Our method is implemented in the presence and absence of water molecules. A comparison among the three types of interaction interfaces shows that RNA-protein complexes have the largest interfaces. The results show a high correlation coefficient between our method and the PISA server in the presence and absence of water molecules in the Voronoi model and the traditional model based on solvent accessibility and the high validation parameters in comparison to the classical model.
2009-01-01
An important part of characterizing any protein molecule is to determine its size and shape. Sedimentation and gel filtration are hydrodynamic techniques that can be used for this medium resolution structural analysis. This review collects a number of simple calculations that are useful for thinking about protein structure at the nanometer level. Readers are reminded that the Perrin equation is generally not a valid approach to determine the shape of proteins. Instead, a simple guideline is presented, based on the measured sedimentation coefficient and a calculated maximum S, to estimate if a protein is globular or elongated. It is recalled that a gel filtration column fractionates proteins on the basis of their Stokes radius, not molecular weight. The molecular weight can be determined by combining gradient sedimentation and gel filtration, techniques available in most biochemistry laboratories, as originally proposed by Siegel and Monty. Finally, rotary shadowing and negative stain electron microscopy are powerful techniques for resolving the size and shape of single protein molecules and complexes at the nanometer level. A combination of hydrodynamics and electron microscopy is especially powerful. PMID:19495910
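The two calculations described above can be sketched from the standard Svedberg and Stokes relations. This is a minimal illustration, not code from the review: it assumes a solvent viscosity of 0.01 g/(cm·s) (roughly water at 20 °C), a solvent density of 1.0 g/cm³ and a typical protein partial specific volume of 0.73 cm³/g. The first function combines a sedimentation coefficient with a Stokes radius to estimate the molecular mass; the second computes the maximum S that a protein of a given mass would have if it were a compact, unhydrated sphere. The example protein values are hypothetical.

```python
import math

AVOGADRO = 6.022e23   # 1/mol
ETA = 0.01            # g/(cm*s), assumed solvent viscosity (about water at 20 C)
V_BAR = 0.73          # cm^3/g, assumed partial specific volume of protein
RHO = 1.0             # g/cm^3, assumed solvent density

def mass_from_s_and_rs(s_svedberg, stokes_radius_nm):
    """Molecular mass (Da) from sedimentation coefficient and Stokes radius."""
    s = s_svedberg * 1e-13          # Svedberg units -> seconds
    rs = stokes_radius_nm * 1e-7    # nm -> cm
    friction = 6 * math.pi * ETA * rs
    return s * AVOGADRO * friction / (1 - V_BAR * RHO)

def s_max(mass_da):
    """Sedimentation coefficient (S) of a smooth sphere with the given mass."""
    r_min = (3 * V_BAR * mass_da / (4 * math.pi * AVOGADRO)) ** (1 / 3)  # cm
    s = mass_da * (1 - V_BAR * RHO) / (AVOGADRO * 6 * math.pi * ETA * r_min)
    return s / 1e-13

# Hypothetical measurements for one protein.
measured_s = 4.3      # S
measured_rs = 3.5     # nm
mass = mass_from_s_and_rs(measured_s, measured_rs)
ratio = s_max(mass) / measured_s
print(f"M = {mass:.0f} Da, Smax/S = {ratio:.2f}")
```

A ratio of Smax to measured S close to 1 indicates a compact, globular molecule, and larger ratios indicate increasingly elongated shapes. The abstract does not state numeric cut-offs, so none are applied here.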
Kamath, Padmaja; Fernandez, Alberto; Giralt, Francesc; Rallo, Robert
2015-01-01
Nanoparticles are likely to interact in real-case application scenarios with mixtures of proteins and biomolecules that will adsorb onto their surface, forming the so-called protein corona. Information related to the composition of the protein corona and net cell association was collected from literature for a library of surface-modified gold and silver nanoparticles. For each protein in the corona, sequence information was extracted and used to calculate physicochemical properties and statistical descriptors. Data cleaning and preprocessing techniques including statistical analysis and feature selection methods were applied to remove highly correlated, redundant and non-significant features. A weighting technique was applied to construct specific signatures that represent the corona composition for each nanoparticle. Using this basic set of protein descriptors, a new Protein Corona Structure-Activity Relationship (PCSAR) that relates net cell association with the physicochemical descriptors of the proteins that form the corona was developed and validated. The features that resulted from the feature selection were in line with already published literature, and the computational model constructed on these features had good accuracy (R² (LOO) = 0.76 and R² (LMO, 25%) = 0.72) and stability, with the advantage that the fingerprints based on physicochemical descriptors were independent of the specific proteins that form the corona.
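The leave-one-out (LOO) figure quoted for the PCSAR model is a cross-validated R². A minimal sketch of that statistic follows, assuming a one-descriptor least-squares line in place of the authors' model, which the abstract does not specify, and the common definition of one minus the ratio of the prediction error sum of squares to the total sum of squares about the overall mean. The data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def r2_loo(xs, ys):
    """Leave-one-out cross-validated R^2 (assumed definition: 1 - PRESS / SS_total)."""
    xs, ys = list(xs), list(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and responses.
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [1.1, 2.9, 5.2, 6.8, 9.1]
print(f"R2(LOO) = {r2_loo(descriptor, response):.2f}")
```

Leave-many-out (LMO) validation follows the same pattern but withholds a larger fraction of the samples in each round, 25% in the abstract.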
MolProbity: all-atom contacts and structure validation for proteins and nucleic acids
Davis, Ian W.; Leaver-Fay, Andrew; Chen, Vincent B.; Block, Jeremy N.; Kapral, Gary J.; Wang, Xueyi; Murray, Laura W.; Arendall, W. Bryan; Snoeyink, Jack; Richardson, Jane S.; Richardson, David C.
2007-01-01
MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar. New analysis functions have been added for RNA, for interfaces, and for NMR ensembles. Additionally, both the web site and major component programs have been rewritten to improve speed, convenience, clarity and integration with other resources. MolProbity results are reported in multiple forms: as overall numeric scores, as lists or charts of local problems, as downloadable PDB and graphics files, and most notably as informative, manipulable 3D kinemage graphics shown online in the KiNG viewer. This service is available free to all users at http://molprobity.biochem.duke.edu. PMID:17452350
Comprehensive peptidomimetic libraries targeting protein-protein interactions.
Whitby, Landon R; Boger, Dale L
2012-10-16
Transient protein-protein interactions (PPIs) are essential components in cellular signaling pathways as well as in important processes such as viral infection, replication, and immune suppression. The unknown or uncharacterized PPIs involved in such interaction networks often represent compelling therapeutic targets for drug discovery. To date, however, the main strategies for discovery of small molecule modulators of PPIs are typically limited to structurally characterized targets. Recent developments in molecular scaffolds that mimic the side chain display of peptide secondary structures have yielded effective designs, but few screening libraries of such mimetics are available to interrogate PPI targets. We initiated a program to prepare a comprehensive small molecule library designed to mimic the three major recognition motifs that mediate PPIs (α-helix, β-turn, and β-strand). Three libraries would be built around templates designed to mimic each such secondary structure and substituted with all triplet combinations of groups representing the 20 natural amino acid side chains. When combined, the three libraries would contain a member capable of mimicking the key interaction and recognition residues of most targetable PPIs. In this Account, we summarize the results of the design, synthesis, and validation of an 8000 member α-helix mimetic library and a 4200 member β-turn mimetic library. We expect that the screening of these libraries will not only provide lead structures against α-helix- or β-turn-mediated protein-protein or peptide-receptor interactions, even if the nature of the interaction is unknown, but also yield key insights into the recognition motif (α-helix or β-turn) and identify the key residues mediating the interaction. 
Consistent with this expectation, the screening of the libraries against p53/MDM2 and HIV-1 gp41 (α-helix mimetic library) or the opioid receptors (β-turn mimetic library) led to the discovery of library members expected to mimic the known endogenous ligands. These efforts led to the discovery of high-affinity α-helix mimetics (Ki = 0.7 μM) against HIV-1 gp41 as well as high-affinity and selective β-turn mimetics (Ki = 80 nM) against the κ-opioid receptor. The results suggest that the use of such comprehensive libraries of peptide secondary structure mimetics, built around effective molecular scaffolds, constitutes a powerful method of interrogating PPIs. These structures provide small molecule modulators of PPI networks for therapeutic target validation, lead compound discovery, and the identification of modulators of biological processes for further study.
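The size of the α-helix mimetic library follows from the design rule stated in the abstract: a template substituted at three positions with all triplet combinations of 20 side-chain groups gives 20 × 20 × 20 = 8000 members. The short sketch below enumerates those triplets, using the one-letter amino acid codes as stand-ins for the actual substituent groups, which is an assumption made for illustration. The abstract does not explain how the 4200-member β-turn library size arises, so that figure is not reproduced here.

```python
from itertools import product

# One-letter codes for the 20 natural amino acids, used as placeholder labels.
SIDE_CHAINS = "ACDEFGHIKLMNPQRSTVWY"

# Every ordered triplet of side chains across three substitution positions.
triplets = list(product(SIDE_CHAINS, repeat=3))

print(len(SIDE_CHAINS), len(triplets))
print("".join(triplets[0]), "".join(triplets[-1]))
```

Ordered triplets are used because the three positions on a template are not interchangeable, so for example (F, W, L) and (L, W, F) count as different library members.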
Automated, high-throughput platform for protein solubility screening using a split-GFP system
Listwan, Pawel; Terwilliger, Thomas C.
2010-01-01
Overproduction of soluble and stable proteins for functional and structural studies is a major bottleneck for structural genomics programs and traditional biochemistry laboratories. Many high-payoff proteins that are important in various biological processes are “difficult to handle” as protein reagents in their native form. We have recently made several advances in enabling biochemical technologies for improving protein stability (http://www.lanl.gov/projects/gfp/), allowing stratagems for efficient protein domain trapping, solubility-improving mutations, and finding protein folding partners. In particular, split-GFP protein tags are a very powerful tool for detection of stable protein domains. Soluble, stable proteins tagged with the 15 amino acid GFP fragment (amino acids 216–228) can be detected in vivo and in vitro using the engineered GFP 1–10 “detector” fragment (amino acids 1–215). If the small tag is accessible, the detector fragment spontaneously binds, resulting in fluorescence. Here, we describe our current and ongoing efforts to move this process from the bench (manual sample manipulation) to an automated, high-throughput, liquid-handling platform. We discuss optimization and validation of bacterial culture growth, lysis protocols, protein extraction, and assays of soluble and insoluble protein in multiple 96-well plate format. The optimized liquid-handling protocol can be used for rapid determination of the optimal, compact domains from single ORFs, collections of ORFs, or cDNA libraries. PMID:19039681
EMDataBank unified data resource for 3DEM.
Lawson, Catherine L; Patwardhan, Ardan; Baker, Matthew L; Hryc, Corey; Garcia, Eduardo Sanz; Hudson, Brian P; Lagerstedt, Ingvar; Ludtke, Steven J; Pintilie, Grigore; Sala, Raul; Westbrook, John D; Berman, Helen M; Kleywegt, Gerard J; Chiu, Wah
2016-01-04
Three-dimensional Electron Microscopy (3DEM) has become a key experimental method in structural biology for a broad spectrum of biological specimens from molecules to cells. The EMDataBank project provides a unified portal for deposition, retrieval and analysis of 3DEM density maps, atomic models and associated metadata (emdatabank.org). We provide here an overview of the rapidly growing 3DEM structural data archives, which include maps in EM Data Bank and map-derived models in the Protein Data Bank. In addition, we describe progress and approaches toward development of validation protocols and methods, working with the scientific community, in order to create a validation pipeline for 3DEM data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Structural insights into simocyclinone as an antibiotic, effector ligand and substrate
Buttner, Mark J; Schäfer, Martin; Lawson, David M
2017-01-01
Simocyclinones are antibiotics produced by Streptomyces and Kitasatospora species that inhibit the validated drug target DNA gyrase in a unique way, and they are thus of therapeutic interest. Structural approaches have revealed their mode of action, the inducible-efflux mechanism in the producing organism, and given insight into one step in their biosynthesis. The crystal structures of simocyclinones bound to their target (gyrase), the transcriptional repressor SimR and the biosynthetic enzyme SimC7 reveal fascinating insight into how molecular recognition is achieved with these three unrelated proteins. PMID:29126195
Structural insights into simocyclinone as an antibiotic, effector ligand and substrate.
Buttner, Mark J; Schäfer, Martin; Lawson, David M; Maxwell, Anthony
2018-01-01
Simocyclinones are antibiotics produced by Streptomyces and Kitasatospora species that inhibit the validated drug target DNA gyrase in a unique way, and they are thus of therapeutic interest. Structural approaches have revealed their mode of action, the inducible-efflux mechanism in the producing organism, and given insight into one step in their biosynthesis. The crystal structures of simocyclinones bound to their target (gyrase), the transcriptional repressor SimR and the biosynthetic enzyme SimC7 reveal fascinating insight into how molecular recognition is achieved with these three unrelated proteins. © FEMS 2017.
Protein model discrimination using mutational sensitivity derived from deep sequencing.
Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan
2012-02-08
A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.
Lee, Woonghee; Kim, Jin Hae; Westler, William M.; Markley, John L.
2011-01-01
Summary: PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY (13C-edited and/or 15N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. Availability: The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/. Contact: whlee@nmrfam.wisc.edu; markley@nmrfam.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21511715
Wang, Xue; Zhao, Kun; Kirberger, Michael; Wong, Hing; Chen, Guantao; Yang, Jenny J
2010-01-01
Calcium binding in proteins exhibits a wide range of polygonal geometries that relate directly to an equally diverse set of biological functions. The binding process stabilizes protein structures and typically results in local conformational change and/or global restructuring of the backbone. Previously, we established the MUG program, which utilized multiple geometries in the Ca2+-binding pockets of holoproteins to identify such pockets, ignoring possible Ca2+-induced conformational change. In this article, we first report our progress in the analysis of Ca2+-induced conformational changes followed by improved prediction of Ca2+-binding sites in the large group of Ca2+-binding proteins that exhibit only localized conformational changes. The MUGSR algorithm was devised to incorporate side chain torsional rotation as a predictor. The output from MUGSR presents groups of residues where each group, typically containing two to five residues, is a potential binding pocket. MUGSR was applied to both X-ray apo structures and NMR holo structures, which did not use calcium distance constraints in structure calculations. Predicted pockets were validated by comparison with homologous holo structures. Defining a “correct hit” as a group of residues containing at least two true ligand residues, the sensitivity was at least 90%; whereas for a “correct hit” defined as a group of residues containing at least three true ligand residues, the sensitivity was at least 78%. These data suggest that Ca2+-binding pockets are at least partially prepositioned to chelate the ion in the apo form of the protein. PMID:20512971
Electrostatic effects in unfolded staphylococcal nuclease
Fitzkee, Nicholas C.; García-Moreno E, Bertrand
2008-01-01
Structure-based calculations of pKa values and electrostatic free energies of proteins assume that electrostatic effects in the unfolded state are negligible. In light of experimental evidence showing that this assumption is invalid for many proteins, and with increasing awareness that the unfolded state is more structured and compact than previously thought, a detailed examination of electrostatic effects in unfolded proteins is warranted. Here we address this issue with structure-based calculations of electrostatic interactions in unfolded staphylococcal nuclease. The approach involves the generation of ensembles of structures representing the unfolded state, and calculation of Coulomb energies to Boltzmann weight the unfolded state ensembles. Four different structural models of the unfolded state were tested. Experimental proton binding data measured with a variant of nuclease that is unfolded under native conditions were used to establish the validity of the calculations. These calculations suggest that weak Coulomb interactions are an unavoidable property of unfolded proteins. At neutral pH, the interactions are too weak to organize the unfolded state; however, at extreme pH values, where the protein has a significant net charge, the combined action of a large number of weak repulsive interactions can lead to the expansion of the unfolded state. The calculated pKa values of ionizable groups in the unfolded state are similar but not identical to the values in small peptides in water. These studies suggest that the accuracy of structure-based calculations of electrostatic contributions to stability cannot be improved unless electrostatic effects in the unfolded state are calculated explicitly. PMID:18227429
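The weighting step described in this abstract (Coulomb energies used to Boltzmann weight an ensemble) can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform dielectric constant of 80, the absence of ionic screening, the temperature, and the function names `coulomb_energy` and `boltzmann_weights` are all assumptions made for illustration.

```python
import numpy as np

K_B = 0.0019872041        # Boltzmann constant in kcal/(mol*K)
COULOMB_K = 332.0637      # Coulomb constant in kcal*Angstrom/(mol*e^2)

def coulomb_energy(coords, charges, dielectric=80.0):
    """Pairwise Coulomb energy (kcal/mol) of point charges in a uniform dielectric.

    Assumption: no ionic screening and a single dielectric constant.
    """
    coords = np.asarray(coords, dtype=float)
    q = np.asarray(charges, dtype=float)
    i, j = np.triu_indices(len(q), k=1)            # each pair counted once
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(COULOMB_K * np.sum(q[i] * q[j] / r) / dielectric)

def boltzmann_weights(energies, temperature=298.15):
    """Normalized Boltzmann weights for a set of conformer energies (kcal/mol)."""
    e = np.asarray(energies, dtype=float)
    beta = 1.0 / (K_B * temperature)
    w = np.exp(-beta * (e - e.min()))              # shift by the minimum for numerical stability
    return w / w.sum()

# Hypothetical two-conformer ensemble: the same two +1 charges, 5 or 20 Angstrom apart.
compact = coulomb_energy([[0, 0, 0], [5, 0, 0]], [1.0, 1.0])
expanded = coulomb_energy([[0, 0, 0], [20, 0, 0]], [1.0, 1.0])
weights = boltzmann_weights([compact, expanded])
```

In this toy case the expanded conformer has the lower repulsive energy and so receives the larger weight, which is the qualitative effect the abstract reports for highly charged unfolded states.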
Watching a signaling protein function in real time via 100-ps time-resolved Laue crystallography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schotte, Friedrich; Cho, Hyun Sun; Kaila, Ville R.I.
2012-11-06
To understand how signaling proteins function, it is necessary to know the time-ordered sequence of events that lead to the signaling state. We recently developed on the BioCARS 14-IDB beamline at the Advanced Photon Source the infrastructure required to characterize structural changes in protein crystals with near-atomic spatial resolution and 150-ps time resolution, and have used this capability to track the reversible photocycle of photoactive yellow protein (PYP) following trans-to-cis photoisomerization of its p-coumaric acid (pCA) chromophore over 10 decades of time. The first of four major intermediates characterized in this study is highly contorted, with the pCA carbonyl rotated nearly 90° out of the plane of the phenolate. A hydrogen bond between the pCA carbonyl and the Cys69 backbone constrains the chromophore in this unusual twisted conformation. Density functional theory calculations confirm that this structure is chemically plausible and corresponds to a strained cis intermediate. This unique structure is short-lived (~600 ps), has not been observed in prior cryocrystallography experiments, and is the progenitor of intermediates characterized in previous nanosecond time-resolved Laue crystallography studies. The structural transitions unveiled during the PYP photocycle include trans/cis isomerization, the breaking and making of hydrogen bonds, formation/relaxation of strain, and gated water penetration into the interior of the protein. This mechanistically detailed, near-atomic resolution description of the complete PYP photocycle provides a framework for understanding signal transduction in proteins, and for assessing and validating theoretical/computational approaches in protein biophysics.
de Oliveira, Saulo H P; Law, Eleanor C; Shi, Jiye; Deane, Charlotte M
2018-04-01
Most current de novo structure prediction methods randomly sample protein conformations and thus require large amounts of computational resource. Here, we consider a sequential sampling strategy, building on ideas from recent experimental work which shows that many proteins fold cotranslationally. We have investigated whether a pseudo-greedy search approach, which begins sequentially from one of the termini, can improve the performance and accuracy of de novo protein structure prediction. We observed that our sequential approach converges when fewer than 20 000 decoys have been produced, fewer than commonly expected. Using our software, SAINT2, we also compared the run time and quality of models produced in a sequential fashion against a standard, non-sequential approach. Sequential prediction produces an individual decoy 1.5-2.5 times faster than non-sequential prediction. When considering the quality of the best model, sequential prediction led to a better model being produced for 31 out of 41 soluble protein validation cases and for 18 out of 24 transmembrane protein cases. Correct models (TM-Score > 0.5) were produced for 29 of these cases by the sequential mode and for only 22 by the non-sequential mode. Our comparison reveals that a sequential search strategy can be used to drastically reduce computational time of de novo protein structure prediction and improve accuracy. Data are available for download from: http://opig.stats.ox.ac.uk/resources. SAINT2 is available for download from: https://github.com/sauloho/SAINT2. Contact: saulo.deoliveira@dtc.ox.ac.uk. Supplementary data are available at Bioinformatics online.
Shelar, Ashish; Bansal, Manju
2014-12-01
α-Helices are amongst the most common secondary structural elements seen in membrane proteins and are packed in the form of helix bundles. These α-helices encounter varying external environments (hydrophobic, hydrophilic) that may influence the sequence preferences at their N and C-termini. The role of the external environment in stabilization of the helix termini in membrane proteins is still unknown. Here we analyze α-helices in a high-resolution dataset of integral α-helical membrane proteins and establish that their sequence and conformational preferences differ from those in globular proteins. We specifically examine these preferences at the N and C-termini in helices initiating/terminating inside the membrane core as well as in linkers connecting these transmembrane helices. We find that the sequence preferences and structural motifs at capping (Ncap and Ccap) and near-helical (N' and C') positions are influenced by a combination of features including the membrane environment and the innate helix initiation and termination property of residues forming structural motifs. We also find that a large number of helix termini which do not form any particular capping motif are stabilized by formation of hydrogen bonds and hydrophobic interactions contributed from the neighboring helices in the membrane protein. We further validate the sequence preferences obtained from our analysis with data from an ultradeep sequencing study that identifies evolutionarily conserved amino acids in the rat neurotensin receptor. The results from our analysis provide insights for the secondary structure prediction, modeling and design of membrane proteins. © 2014 Wiley Periodicals, Inc.
Predicting X-ray diffuse scattering from translation–libration–screw structural ensembles
Van Benschoten, Andrew H.; Afonine, Pavel V.; Terwilliger, Thomas C.; Wall, Michael E.; Jackson, Colin J.; Sauter, Nicholas K.; Adams, Paul D.; Urzhumtsev, Alexandre; Fraser, James S.
2015-01-01
Identifying the intramolecular motions of proteins and nucleic acids is a major challenge in macromolecular X-ray crystallography. Because Bragg diffraction describes the average positional distribution of crystalline atoms with imperfect precision, the resulting electron density can be compatible with multiple models of motion. Diffuse X-ray scattering can reduce this degeneracy by reporting on correlated atomic displacements. Although recent technological advances are increasing the potential to accurately measure diffuse scattering, computational modeling and validation tools are still needed to quantify the agreement between experimental data and different parameterizations of crystalline disorder. A new tool, phenix.diffuse, addresses this need by employing Guinier’s equation to calculate diffuse scattering from Protein Data Bank (PDB)-formatted structural ensembles. As an example case, phenix.diffuse is applied to translation–libration–screw (TLS) refinement, which models rigid-body displacement for segments of the macromolecule. To enable the calculation of diffuse scattering from TLS-refined structures, phenix.tls_as_xyz builds multi-model PDB files that sample the underlying T, L and S tensors. In the glycerophosphodiesterase GpdQ, alternative TLS-group partitioning and different motional correlations between groups yield markedly dissimilar diffuse scattering maps with distinct implications for molecular mechanism and allostery. These methods demonstrate how, in principle, X-ray diffuse scattering could extend macromolecular structural refinement, validation and analysis. PMID:26249347
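Guinier's equation, as named in this abstract, expresses the diffuse intensity as the variance of the structure factor across the ensemble: I_D(q) = <|F_n(q)|^2> - |<F_n(q)>|^2. The sketch below illustrates that formula only and is not phenix.diffuse itself: it assumes unit atomic scattering factors, a single scattering vector, and the hypothetical function names `structure_factors` and `diffuse_intensity`.

```python
import numpy as np

def structure_factors(ensemble, q):
    """Structure factor F_n(q) for each model in the ensemble.

    ensemble: array of shape (models, atoms, 3), coordinates in Angstrom.
    q: scattering vector of shape (3,), in inverse Angstrom.
    Assumption: every atom has a unit scattering factor.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    phases = ensemble @ np.asarray(q, dtype=float)   # (models, atoms) array of q.r
    return np.exp(1j * phases).sum(axis=1)           # sum over atoms

def diffuse_intensity(ensemble, q):
    """Guinier's equation: <|F|^2> - |<F>|^2 over the models of an ensemble."""
    f = structure_factors(ensemble, q)
    return float(np.mean(np.abs(f) ** 2) - np.abs(np.mean(f)) ** 2)

# Hypothetical two-atom molecule and a two-model ensemble related by a rigid shift.
base = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
static = np.stack([base, base])                                   # no disorder
shifted = np.stack([base, base + np.array([0.3, 0.0, 0.0])])      # rigid-body displacement
q_vec = np.array([1.0, 0.0, 0.0])
```

With identical models the two terms cancel and no diffuse signal remains; any displacement between models leaves a positive residual, which is why diffuse scattering reports on disorder that Bragg data average away.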
Predicting X-ray diffuse scattering from translation–libration–screw structural ensembles
Van Benschoten, Andrew H.; Afonine, Pavel V.; Terwilliger, Thomas C.; ...
2015-07-28
Identifying the intramolecular motions of proteins and nucleic acids is a major challenge in macromolecular X-ray crystallography. Because Bragg diffraction describes the average positional distribution of crystalline atoms with imperfect precision, the resulting electron density can be compatible with multiple models of motion. Diffuse X-ray scattering can reduce this degeneracy by reporting on correlated atomic displacements. Although recent technological advances are increasing the potential to accurately measure diffuse scattering, computational modeling and validation tools are still needed to quantify the agreement between experimental data and different parameterizations of crystalline disorder. A new tool, phenix.diffuse, addresses this need by employing Guinier's equation to calculate diffuse scattering from Protein Data Bank (PDB)-formatted structural ensembles. As an example case, phenix.diffuse is applied to translation–libration–screw (TLS) refinement, which models rigid-body displacement for segments of the macromolecule. To enable the calculation of diffuse scattering from TLS-refined structures, phenix.tls_as_xyz builds multi-model PDB files that sample the underlying T, L and S tensors. In the glycerophosphodiesterase GpdQ, alternative TLS-group partitioning and different motional correlations between groups yield markedly dissimilar diffuse scattering maps with distinct implications for molecular mechanism and allostery. In addition, these methods demonstrate how, in principle, X-ray diffuse scattering could extend macromolecular structural refinement, validation and analysis.
Borbulevych, Oleg Y; Plumley, Joshua A; Martin, Roger I; Merz, Kenneth M; Westerhoff, Lance M
2014-05-01
Macromolecular crystallographic refinement relies on sometimes dubious stereochemical restraints and rudimentary energy functionals to ensure the correct geometry of the model of the macromolecule and any covalently bound ligand(s). The ligand stereochemical restraint file (CIF) requires a priori understanding of the ligand geometry within the active site, and creation of the CIF is often an error-prone process owing to the great variety of potential ligand chemistry and structure. Stereochemical restraints have been replaced with more robust functionals through the integration of the linear-scaling, semiempirical quantum-mechanics (SE-QM) program DivCon with the PHENIX X-ray refinement engine. The PHENIX/DivCon package has been thoroughly validated on a population of 50 protein-ligand Protein Data Bank (PDB) structures with a range of resolutions and chemistry. The PDB structures used for the validation were originally refined utilizing various refinement packages and were published within the past five years. PHENIX/DivCon does not utilize CIF(s), link restraints and other parameters for refinement and hence it does not make as many a priori assumptions about the model. Across the entire population, the method results in reasonable ligand geometries and low ligand strains, even when the original refinement exhibited difficulties, indicating that PHENIX/DivCon is applicable to both single-structure and high-throughput crystallography.
Benchmarking all-atom simulations using hydrogen exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skinner, John J.; Yu, Wookyung; Gichana, Elizabeth K.
We are now able to fold small proteins reversibly to their native structures [Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) Science 334(6055):517–520] using long-time molecular dynamics (MD) simulations. Our results indicate that modern force fields can reproduce the energy surface near the native structure. In this paper, to test how well the force fields recapitulate the other regions of the energy surface, MD trajectories for a variant of protein G are compared with data from site-resolved hydrogen exchange (HX) and other biophysical measurements. Because HX monitors the breaking of individual H-bonds, this experimental technique identifies the stability and H-bond content of excited states, thus enabling quantitative comparison with the simulations. Contrary to experimental findings of a cooperative, all-or-none unfolding process, the simulated denatured state ensemble, on average, is highly collapsed with some transient or persistent native 2° structure. The MD trajectories of this protein G variant and other small proteins exhibit excessive intramolecular H-bonding even for the most expanded conformations, suggesting that the force fields require improvements in describing H-bonding and backbone hydration. Finally, these comparisons provide a general protocol for validating the ability of simulations to accurately capture rare structural fluctuations.
Bayesian refinement of protein structures and ensembles against SAXS data using molecular dynamics
Shevchuk, Roman; Hub, Jochen S.
2017-01-01
Small-angle X-ray scattering is an increasingly popular technique used to detect protein structures and ensembles in solution. However, the refinement of structures and ensembles against SAXS data is often ambiguous due to the low information content of SAXS data, unknown systematic errors, and unknown scattering contributions from the solvent. We offer a solution to such problems by combining Bayesian inference with all-atom molecular dynamics simulations and explicit-solvent SAXS calculations. The Bayesian formulation correctly weights the SAXS data versus prior physical knowledge, it quantifies the precision or ambiguity of fitted structures and ensembles, and it accounts for unknown systematic errors due to poor buffer matching. The method further provides a probabilistic criterion for identifying the number of states required to explain the SAXS data. The method is validated by refining ensembles of a periplasmic binding protein against calculated SAXS curves. Subsequently, we derive the solution ensembles of the eukaryotic chaperone heat shock protein 90 (Hsp90) against experimental SAXS data. We find that the SAXS data of the apo state of Hsp90 is compatible with a single wide-open conformation, whereas the SAXS data of Hsp90 bound to ATP or to an ATP-analogue strongly suggest heterogeneous ensembles of a closed and a wide-open state. PMID:29045407
Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design.
Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw
2018-07-01
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum, our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences. Copyright © 2018 Elsevier Inc. All rights reserved.
Benchmarking all-atom simulations using hydrogen exchange
Skinner, John J.; Yu, Wookyung; Gichana, Elizabeth K.; ...
2014-10-27
We are now able to fold small proteins reversibly to their native structures [Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) Science 334(6055):517–520] using long-time molecular dynamics (MD) simulations. Our results indicate that modern force fields can reproduce the energy surface near the native structure. In this paper, to test how well the force fields recapitulate the other regions of the energy surface, MD trajectories for a variant of protein G are compared with data from site-resolved hydrogen exchange (HX) and other biophysical measurements. Because HX monitors the breaking of individual H-bonds, this experimental technique identifies the stability and H-bond content of excited states, thus enabling quantitative comparison with the simulations. Contrary to experimental findings of a cooperative, all-or-none unfolding process, the simulated denatured state ensemble, on average, is highly collapsed with some transient or persistent native 2° structure. The MD trajectories of this protein G variant and other small proteins exhibit excessive intramolecular H-bonding even for the most expanded conformations, suggesting that the force fields require improvements in describing H-bonding and backbone hydration. Finally, these comparisons provide a general protocol for validating the ability of simulations to accurately capture rare structural fluctuations.
Naveed, Hammad; Hameed, Umar S.; Harrus, Deborah; Bourguet, William; Arold, Stefan T.; Gao, Xin
2015-01-01
Motivation: The inherent promiscuity of small molecules towards protein targets impedes our understanding of healthy versus diseased metabolism. This promiscuity also poses a challenge for the pharmaceutical industry as identifying all protein targets is important to assess (side) effects and repositioning opportunities for a drug. Results: Here, we present a novel integrated structure- and system-based approach of drug-target prediction (iDTP) to enable the large-scale discovery of new targets for small molecules, such as pharmaceutical drugs, co-factors and metabolites (collectively called ‘drugs’). For a given drug, our method uses sequence order–independent structure alignment, hierarchical clustering and probabilistic sequence similarity to construct a probabilistic pocket ensemble (PPE) that captures promiscuous structural features of different binding sites on known targets. A drug’s PPE is combined with an approximation of its delivery profile to reduce false positives. In our cross-validation study, we use iDTP to predict the known targets of 11 drugs, with 63% sensitivity and 81% specificity. We then predicted novel targets for these drugs—two that are of high pharmacological interest, the peroxisome proliferator-activated receptor gamma and the oncogene B-cell lymphoma 2, were successfully validated through in vitro binding experiments. Our method is broadly applicable for the prediction of protein-small molecule interactions with several novel applications to biological research and drug development. Availability and implementation: The program, datasets and results are freely available to academic users at http://sfb.kaust.edu.sa/Pages/Software.aspx. Contact: xin.gao@kaust.edu.sa and stefan.arold@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26286808
E-novo: an automated workflow for efficient structure-based lead optimization.
Pearce, Bradley C; Langley, David R; Kang, Jia; Huang, Hongwei; Kulkarni, Amit
2009-07-01
An automated E-Novo protocol designed as a structure-based lead optimization tool was prepared through Pipeline Pilot with existing CHARMm components in Discovery Studio. A scaffold core having 3D binding coordinates of interest is generated from a ligand-bound protein structural model. Ligands of interest are generated from the scaffold using an R-group fragmentation/enumeration tool within E-Novo, with their cores aligned. The ligand side chains are conformationally sampled and are subjected to core-constrained protein docking, using a modified CHARMm-based CDOCKER method to generate top poses along with CDOCKER energies. In the final stage of E-Novo, a physics-based binding energy scoring function ranks the top ligand CDOCKER poses using a more accurate Molecular Mechanics-Generalized Born with Surface Area method. Correlation of the calculated ligand binding energies with experimental binding affinities was used to validate protocol performance. Inhibitors of Src tyrosine kinase, CDK2 kinase, beta-secretase, factor Xa, HIV protease, and thrombin were used to test the protocol using published ligand crystal structure data within reasonably defined binding sites. In-house Respiratory Syncytial Virus inhibitor data were used as a more challenging test set using a hand-built binding model. Least squares fits for all data sets suggested reasonable validation of the protocol within the context of observed ligand binding poses. The E-Novo protocol provides a convenient all-in-one structure-based design process for rapid assessment and scoring of lead optimization libraries.
Optimal contact definition for reconstruction of contact maps.
Duarte, Jose M; Sathyapriya, Rajagopal; Stehr, Henning; Filippis, Ioannis; Lappe, Michael
2010-05-27
Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. 
The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
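The contact definition discussed in this abstract (a distance cutoff of 9 to 11 Å between Cβ atoms) can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes the Cβ coordinates are already available as an array, and the function name `contact_map` and the toy coordinates are hypothetical.

```python
import numpy as np

def contact_map(cb_coords, cutoff=9.0):
    """Boolean residue-residue contact map from C-beta coordinates.

    cb_coords: array of shape (residues, 3), in Angstrom.
    cutoff: two residues are in contact if their C-beta distance is <= cutoff.
    """
    xyz = np.asarray(cb_coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]        # all pairwise displacement vectors
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # pairwise distance matrix
    contacts = dist <= cutoff
    np.fill_diagonal(contacts, False)               # a residue is not its own contact
    return contacts

# Toy coordinates along a line (hypothetical, not taken from any structure).
toy = np.array([[0.0, 0.0, 0.0],
                [5.0, 0.0, 0.0],
                [10.0, 0.0, 0.0],
                [20.0, 0.0, 0.0]])
cmap_9 = contact_map(toy, cutoff=9.0)
cmap_11 = contact_map(toy, cutoff=11.0)
```

Raising the cutoff from 9 to 11 Å adds the pairs separated by 10 Å in this toy example, which shows how the choice within the reported range changes the density of the map that a reconstruction method would receive.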
Optimal contact definition for reconstruction of Contact Maps
2010-01-01
Background Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. Results We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. Conclusions Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. 
The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap. PMID:20507547
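The contact definition described in this abstract can be illustrated with a minimal sketch. It assumes one Cβ coordinate per residue and a single distance cutoff; the function name, the `min_seq_sep` parameter and the toy coordinates are hypothetical and are not taken from the paper.

```python
import math

def contact_map(coords, cutoff=10.0, min_seq_sep=1):
    """Return a symmetric boolean contact matrix.

    coords      : list of (x, y, z) C-beta positions in angstroms, one per residue
    cutoff      : distance threshold in angstroms (the abstract reports 9 to 11 as optimal)
    min_seq_sep : hypothetical parameter; pairs closer than this in sequence are skipped
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            # Two residues are in contact when their C-beta atoms lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap

# Toy coordinates (not a real structure): four points on a line, 6 angstroms apart.
toy = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
cmap = contact_map(toy, cutoff=10.0)
```

With this toy input only sequence neighbours fall within the cutoff. A real application would read Cβ coordinates (Cα for glycine) from a structure file first.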
Shen, Yang; Bax, Ad
2013-01-01
A new program, TALOS-N, is introduced for predicting protein backbone torsion angles from NMR chemical shifts. The program relies far more extensively on the use of trained artificial neural networks than its predecessor, TALOS+. Validation on an independent set of proteins indicates that backbone torsion angles can be predicted for a larger, ≥ 90% fraction of the residues, with an error rate smaller than ca 3.5%, using an acceptance criterion that is nearly two-fold tighter than that used previously, and a root mean square difference between predicted and crystallographically observed (φ,ψ) torsion angles of ca 12°. TALOS-N also reports sidechain χ1 rotameric states for about 50% of the residues, and a consistency with reference structures of 89%. The program includes a neural network trained to identify secondary structure from residue sequence and chemical shifts. PMID:23728592
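The root mean square difference between predicted and observed torsion angles quoted above has to account for the periodicity of angles. The following is a minimal sketch of such a calculation, not the TALOS-N implementation; the function names and example values are illustrative assumptions.

```python
import math

def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_rmsd(predicted, observed):
    """Root mean square difference between two equal-length lists of angles (degrees)."""
    diffs = [angular_difference(p, o) for p, o in zip(predicted, observed)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Illustrative values only: 179 and -179 degrees are 2 degrees apart, not 358.
rmsd = torsion_rmsd([179.0, -60.0], [-179.0, -63.0])
```

Wrapping each difference into [-180, 180) before squaring prevents angles near the ±180° boundary from inflating the result.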
Quantitative structure-activity relationship: promising advances in drug discovery platforms.
Wang, Tao; Wu, Mian-Bin; Lin, Jian-Ping; Yang, Li-Rong
2015-12-01
Quantitative structure-activity relationship (QSAR) modeling is one of the most popular computer-aided tools employed in medicinal chemistry for drug discovery and lead optimization. It is especially powerful in the absence of 3D structures of specific drug targets. QSAR methods have drawn public attention since they were first introduced. In this review, the authors provide a brief discussion of the basic principles of QSAR, model development and model validation. They also highlight the current applications of QSAR in different fields, particularly in virtual screening, rational drug design and multi-target QSAR. Finally, in view of recent controversies, the authors detail the challenges faced by QSAR modeling and the relevant solutions. The aim of this review is to show how QSAR modeling can be applied in novel drug discovery, design and lead optimization. QSAR should intentionally be used as a powerful tool for fragment-based drug design platforms in the field of drug discovery and design. Although there have been an increasing number of experimentally determined protein structures in recent years, a great number of protein structures cannot be easily obtained (e.g., membrane transport proteins and G-protein coupled receptors). Fragment-based drug discovery, such as QSAR, could be applied further and have a significant role in dealing with these problems. Moreover, along with the development of computer software and hardware, it is believed that QSAR will be increasingly important.
Crystal Structure of a Ube2S-Ubiquitin Conjugate
Lorenz, Sonja; Bhattacharyya, Moitrayee; Feiler, Christian; Rape, Michael; Kuriyan, John
2016-01-01
Protein ubiquitination occurs through the sequential formation and reorganization of specific protein-protein interfaces. Ubiquitin-conjugating (E2) enzymes, such as Ube2S, catalyze the formation of an isopeptide linkage between the C-terminus of a “donor” ubiquitin and a primary amino group of an “acceptor” ubiquitin molecule. This reaction involves an intermediate, in which the C-terminus of the donor ubiquitin is thioester-bound to the active site cysteine of the E2 and a functionally important interface is formed between the two proteins. A docked model of a Ube2S-donor ubiquitin complex was generated previously, based on chemical shift mapping by NMR, and predicted contacts were validated in functional studies. We now present the crystal structure of a covalent Ube2S-ubiquitin complex. The structure contains an interface between Ube2S and ubiquitin in trans that resembles the earlier model in general terms, but differs in detail. The crystallographic interface is more hydrophobic than the earlier model and is stable in molecular dynamics (MD) simulations. Remarkably, the docked Ube2S-donor complex converges readily to the configuration seen in the crystal structure in 3 out of 8 MD trajectories. Since the crystallographic interface is fully consistent with mutational effects, this indicates that the structure provides an energetically favorable representation of the functionally critical Ube2S-donor interface. PMID:26828794
Iizaka, Shinji; Kaitani, Toshiko; Nakagami, Gojiro; Sugama, Junko; Sanada, Hiromi
2015-11-01
Adequate nutritional intake is essential for pressure ulcer healing. Recently, the estimated energy requirement (30 kcal/kg) and the average protein requirement (0.95 g/kg) necessary to maintain metabolic balance have been reported. The purpose was to evaluate the clinical validity of these requirements in older hospitalized patients with pressure ulcers by assessing nutritional status and wound healing. This multicenter prospective study carried out as a secondary analysis of a clinical trial included 194 patients with pressure ulcers aged ≥65 years from 29 institutions. Nutritional status including anthropometry and biochemical tests, and wound status by a structured severity tool, were evaluated over 3 weeks. Energy and protein intake were determined from medical records on a typical day and dichotomized by meeting the estimated average requirement. Longitudinal data were analyzed with a multivariate mixed-effects model. Meeting the energy requirement was associated with changes in weight (P < 0.001), arm muscle circumference (P = 0.003) and serum albumin level (P = 0.016). Meeting the protein requirement was associated with changes in weight (P < 0.001) and serum albumin level (P = 0.043). These markers decreased in patients who did not meet the requirement, but were stable or increased in those who did. Energy and protein intake were associated with wound healing for deep ulcers (P = 0.013 for both), improving exudates and necrotic tissue, but not for superficial ulcers. Estimated energy requirement and average protein requirement were clinically validated for prevention of nutritional decline and of impaired healing of deep pressure ulcers. © 2014 Japan Geriatrics Society.
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have become available in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. The other SVM model, which uses both DNA and protein sequences, achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
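The three performance measures reported in this abstract (accuracy, F-measure and MCC) all follow from a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up for illustration and are not the study's data.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # MCC ranges from -1 to 1; it is set to 0 here when a marginal total is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc

# Hypothetical counts, chosen only to show the calculation.
acc, f1, mcc = binary_metrics(tp=50, fp=10, tn=30, fn=10)
```

MCC uses all four cells of the confusion matrix, which is why it can stay modest (0.32 to 0.38 in the abstract) even when accuracy is near 67%.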
Classification of Dynamical Diffusion States in Single Molecule Tracking Microscopy
Bosch, Peter J.; Kanger, Johannes S.; Subramaniam, Vinod
2014-01-01
Single molecule tracking of membrane proteins by fluorescence microscopy is a promising method to investigate dynamic processes in live cells. Translating the trajectories of proteins to biological implications, such as protein interactions, requires the classification of protein motion within the trajectories. Spatial information of protein motion may reveal where the protein interacts with cellular structures, because binding of proteins to such structures often alters their diffusion speed. For dynamic diffusion systems, we provide an analytical framework to determine in which diffusion state a molecule is residing during the course of its trajectory. We compare different methods for the quantification of motion to utilize this framework for the classification of two diffusion states (two populations with different diffusion speed). We found that a gyration quantification method and a Bayesian statistics-based method are the most accurate in diffusion-state classification for realistic experimentally obtained datasets, of which the gyration method is much less computationally demanding. After classification of the diffusion, the lifetime of the states can be determined, and images of the diffusion states can be reconstructed at high resolution. Simulations validate these applications. We apply the classification and its applications to experimental data to demonstrate the potential of this approach to obtain further insights into the dynamics of cell membrane proteins. PMID:25099798
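A gyration-based classification of the kind mentioned in this abstract can be sketched as a sliding-window calculation over a 2D trajectory. This is an assumption-laden illustration, not the authors' algorithm: the window length, the threshold and the two state labels are hypothetical choices.

```python
def gyration_radius_sq(points):
    """Mean squared distance of 2D points from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n

def classify_states(track, window=5, threshold=1.0):
    """Label each position 'fast' or 'slow' from the local squared gyration radius.

    window and threshold are hypothetical tuning parameters; in practice they
    would be calibrated on simulated trajectories with known diffusion states.
    """
    half = window // 2
    labels = []
    for i in range(len(track)):
        # The window is truncated at the ends of the trajectory.
        segment = track[max(0, i - half): i + half + 1]
        labels.append("fast" if gyration_radius_sq(segment) > threshold else "slow")
    return labels

# Toy trajectory: six immobile positions followed by six steps of 3 units each.
track = [(0.0, 0.0)] * 6 + [(3.0 * k, 0.0) for k in range(1, 7)]
labels = classify_states(track)
```

Runs of identical labels give state lifetimes, and the labelled positions can be plotted to show where each state occurs.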
A cohort of new adhesive proteins identified from transcriptomic analysis of mussel foot glands.
DeMartini, Daniel G; Errico, John M; Sjoestroem, Sebastian; Fenster, April; Waite, J Herbert
2017-06-01
The adaptive attachment of marine mussels to a wide range of substrates in a high-energy, saline environment has been explored for decades and is a significant driver of bioinspired wet adhesion research. Mussel attachment relies on a fibrous holdfast known as the byssus, which is made by a specialized appendage called the foot. Multiple adhesive and structural proteins are rapidly synthesized, secreted and moulded by the foot into holdfast threads. About 10 well-characterized proteins, namely the mussel foot proteins (Mfps), the preCols and the thread matrix proteins, are reported as representing the bulk of these structures. To explore how robust this proposition is, we sequenced the transcriptome of the glandular tissues that produce and secrete the various holdfast components using next-generation sequencing methods. Surprisingly, we found around 15 highly expressed genes that have not previously been characterized, but bear key similarities to the previously defined mussel foot proteins, suggesting additional contribution to byssal function. We verified the validity of these transcripts by polymerase chain reaction, cloning and Sanger sequencing as well as confirming their presence as proteins in the byssus. These newly identified proteins greatly expand the palette of mussel holdfast biochemistry and provide new targets for investigation into bioinspired wet adhesion. © 2017 The Author(s).
Application of Time-Resolved Tryptophan Phosphorescence Spectroscopy to Protein Folding Studies.
NASA Astrophysics Data System (ADS)
Subramaniam, Vinod
This thesis presents studies of the protein folding problem, one of the most significant questions in contemporary biophysics. Sensitive biophysical techniques, including room temperature tryptophan phosphorescence, which reports on the local environment of the residue, and the lability of proteins to denaturation, a global parameter, were used to assess the validity of the traditional assumption that the biologically active state of a protein is the 'native' state, and to determine whether the pathways of folding in vitro lead to the folded state achieved in vivo. Phosphorescence techniques have also been extended to study, for the first time, emission from tryptophan residues engineered into specific positions as reporters of protein structure. During in vitro refolding of E. coli alkaline phosphatase and bovine β-lactoglobulin, significant differences were found between the refolded proteins and the native conformations, which have no apparent effect on the biological functions. Slow conformational transitions, termed 'annealing,' that occur long after the return of enzyme activity of alkaline phosphatase are manifested in the retarded recovery of phosphorescence intensity, lifetime, and protein lability. While 'annealing' is not observed for β-lactoglobulin, both phosphorescence and lability experiments reveal changes in the structure of the refolded protein, even though its biological activity, retinol binding, is fully recovered. This result suggests that the pathways of folding in vitro need not lead to the structure formed in vivo. We have used phosphorescence techniques to study the refolding of ribonuclease T1, which exhibits slow kinetics characteristic of proline isomerization. Furthermore, the ability to extract structural information from phosphorescent tryptophan probes engineered into selected regions represents an important advance in studying protein structure; we have reported the first such results from a mutant staphylococcal nuclease. 
The refolding data have been interpreted in the context of recent theoretical work on rugged energy landscape models of protein folding. Our results suggest that the barriers to folding can be as large as ~20 kcal mol^{-1}, and imply that the conventional definition of the 'native' state as the biologically active conformation may need revision to acknowledge that the active state may represent a long-lived intermediate on the pathway to the native structure.
Panda, Subhamay; Kumari, Leena
2017-01-01
Serine proteases are a group of enzymes that hydrolyse the peptide bonds in proteins. In mammals, these enzymes help in the regulation of several major physiological functions such as digestion, blood clotting, responses of the immune system, reproductive functions and the complement system. Serine proteases obtained from the venom of the Octopodidae family are a relatively unexplored area of research. In the present work, we applied a comparative composite molecular modeling technique. Our key aim was to propose the first molecular model structure of the unexplored serine protease 5 derived from the big blue octopus. The other objective of this study was to analyze the distribution of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, the hydrophobicity of the molecular surface and the electrostatic potential with the aid of different bioinformatic tools. In the present study, the molecular model was generated with the help of the I-TASSER suite. Afterwards the refined structural model was validated with standard methods. For functional annotation of the protein molecule we used the Protein Information Resource (PIR) database. Serine protease 5 of the big blue octopus was analyzed with different bioinformatic algorithms for the distribution of negatively and positively charged amino acids over the modeled structure, the distribution of secondary structural elements, the hydrophobicity of the molecular surface and the electrostatic potential. The functionally critical amino acids and ligand-binding site (LBS) of the modeled protein were determined using the COACH program. The molecular model data, together with other pertinent post-model analysis data, provide molecular insight into the proteolytic activity of serine protease 5, which helps in the clear understanding of the procoagulant and anticoagulant characteristics of this natural lead molecule. 
Our approach was to investigate the octopus venom protein as a whole or a part of their structure that may result in the development of new lead molecule. Copyright © Bentham Science Publishers.
Donini, Stefano; Garavaglia, Silvia; Ferraris, Davide M.; Miggiano, Riccardo; Mori, Shigetarou; Shibayama, Keigo
2017-01-01
Mycobacterium smegmatis represents one model for studying the biology of its pathogenic relative Mycobacterium tuberculosis. The structural characterization of a M. tuberculosis ortholog protein can serve as a valid tool for the development of molecules active against the M. tuberculosis target. In this context, we report the biochemical and structural characterization of M. smegmatis phosphoribosylpyrophosphate synthetase (PrsA), the ortholog of M. tuberculosis PrsA, the unique enzyme responsible for the synthesis of phosphoribosylpyrophosphate (PRPP). PRPP is a key metabolite involved in several biosynthetic pathways including those for histidine, tryptophan, nucleotides and decaprenylphosphoryl-arabinose, an essential precursor for the mycobacterial cell wall biosynthesis. Since M. tuberculosis PrsA has been validated as a drug target for the development of antitubercular agents, the data presented here will add to the knowledge of the mycobacterial enzyme and could contribute to the development of M. tuberculosis PrsA inhibitors of potential pharmacological interest. PMID:28419153
Donini, Stefano; Garavaglia, Silvia; Ferraris, Davide M; Miggiano, Riccardo; Mori, Shigetarou; Shibayama, Keigo; Rizzi, Menico
2017-01-01
Mycobacterium smegmatis represents one model for studying the biology of its pathogenic relative Mycobacterium tuberculosis. The structural characterization of a M. tuberculosis ortholog protein can serve as a valid tool for the development of molecules active against the M. tuberculosis target. In this context, we report the biochemical and structural characterization of M. smegmatis phosphoribosylpyrophosphate synthetase (PrsA), the ortholog of M. tuberculosis PrsA, the unique enzyme responsible for the synthesis of phosphoribosylpyrophosphate (PRPP). PRPP is a key metabolite involved in several biosynthetic pathways including those for histidine, tryptophan, nucleotides and decaprenylphosphoryl-arabinose, an essential precursor for the mycobacterial cell wall biosynthesis. Since M. tuberculosis PrsA has been validated as a drug target for the development of antitubercular agents, the data presented here will add to the knowledge of the mycobacterial enzyme and could contribute to the development of M. tuberculosis PrsA inhibitors of potential pharmacological interest.
2018-01-01
Plant homeodomain (PHD) zinc fingers are histone reader domains that are often associated with human diseases. Despite this, they constitute a poorly targeted class of readers, suggesting low ligandability. Here, we describe a successful fragment-based campaign targeting PHD fingers from the proteins BAZ2A and BAZ2B as model systems. We validated a pool of in silico fragments both biophysically and structurally and solved the first crystal structures of PHD zinc fingers in complex with fragments bound to an anchoring pocket at the histone binding site. The best-validated hits were found to displace a histone H3 tail peptide in competition assays. This work identifies new chemical scaffolds that provide suitable starting points for future ligand optimization using structure-guided approaches. The demonstrated ligandability of the PHD reader domains could pave the way for the development of chemical probes to drug this family of epigenetic readers. PMID:29529862
Amato, Anastasia; Lucas, Xavier; Bortoluzzi, Alessio; Wright, David; Ciulli, Alessio
2018-04-20
Plant homeodomain (PHD) zinc fingers are histone reader domains that are often associated with human diseases. Despite this, they constitute a poorly targeted class of readers, suggesting low ligandability. Here, we describe a successful fragment-based campaign targeting PHD fingers from the proteins BAZ2A and BAZ2B as model systems. We validated a pool of in silico fragments both biophysically and structurally and solved the first crystal structures of PHD zinc fingers in complex with fragments bound to an anchoring pocket at the histone binding site. The best-validated hits were found to displace a histone H3 tail peptide in competition assays. This work identifies new chemical scaffolds that provide suitable starting points for future ligand optimization using structure-guided approaches. The demonstrated ligandability of the PHD reader domains could pave the way for the development of chemical probes to drug this family of epigenetic readers.
Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wall, Michael E.
Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structure to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.
Internal protein motions in molecular-dynamics simulations of Bragg and diffuse X-ray scattering
Wall, Michael E.
2018-01-25
Molecular-dynamics (MD) simulations of Bragg and diffuse X-ray scattering provide a means of obtaining experimentally validated models of protein conformational ensembles. This paper shows that compared with a single periodic unit-cell model, the accuracy of simulating diffuse scattering is increased when the crystal is modeled as a periodic supercell consisting of a 2 × 2 × 2 layout of eight unit cells. The MD simulations capture the general dependence of correlations on the separation of atoms. There is substantial agreement between the simulated Bragg reflections and the crystal structure; there are local deviations, however, indicating both the limitation of using a single structure to model disordered regions of the protein and local deviations of the average structure away from the crystal structure. Although it was anticipated that a simulation of longer duration might be required to achieve maximal agreement of the diffuse scattering calculation with the data using the supercell model, only a microsecond is required, the same as for the unit cell. Rigid protein motions only account for a minority fraction of the variation in atom positions from the simulation. The results indicate that protein crystal dynamics may be dominated by internal motions rather than packing interactions, and that MD simulations can be combined with Bragg and diffuse X-ray scattering to model the protein conformational ensemble.
Identifying protein β-turns with vibrational Raman optical activity.
Weymuth, Thomas; Jacob, Christoph R; Reiher, Markus
2011-04-18
β-turns belong to the most important secondary structure elements in proteins. On the basis of density functional calculations, vibrational Raman optical activity signatures of different types of β-turns are established and compared as well as related to other signatures proposed in the literature earlier. Our findings indicate that there are much more characteristic ROA signals of β-turns than have been hitherto suggested. These suggested signatures are, however, found to be valid for the most important type of β-turns. Moreover, we compare the influence of different amino acid side chains on these signatures and investigate the discrimination of β-turns from other secondary structure elements, namely α- and 3(10)-helices. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The design strategy of selective PTP1B inhibitors over TCPTP.
Li, XiangQian; Wang, LiJun; Shi, DaYong
2016-08-15
Protein tyrosine phosphatase 1B (PTP1B) has already been well studied as a highly validated therapeutic target for diabetes and obesity. However, the lack of selectivity limited further studies and clinical applications of PTP1B inhibitors, especially over T-cell protein tyrosine phosphatase (TCPTP). In this review, we enumerate the published specific inhibitors of PTP1B, discuss the structure-activity relationships by analysis of their X-ray structures or docking results, and summarize the characteristic of selectivity related residues and groups. Furthermore, the design strategy of selective PTP1B inhibitors over TCPTP is also proposed. We hope our work could provide an effective way to gain specific PTP1B inhibitors. Copyright © 2016 Elsevier Ltd. All rights reserved.
Perspective on computational and structural aspects of kinase discovery from IPK2014.
Martin, Eric; Knapp, Stefan; Engh, Richard A; Moebitz, Henrik; Varin, Thibault; Roux, Benoit; Meiler, Jens; Berdini, Valerio; Baumann, Alexander; Vieth, Michal
2015-10-01
Recent advances in understanding the activity and selectivity of kinase inhibitors and their relationships to protein structure are presented. Conformational selection in kinases is studied from empirical, data-driven and simulation approaches. Ligand binding and its affinity are, in many cases, determined by the predetermined active and inactive conformation of kinases. Binding affinity and selectivity predictions highlight the current state of the art and advances in computational chemistry as it applies to kinase inhibitor discovery. Kinome wide inhibitor profiling and cell panel profiling lead to a better understanding of selectivity and allow for target validation and patient tailoring hypotheses. This article is part of a Special Issue entitled: Inhibitors of Protein Kinases. Copyright © 2015 Elsevier B.V. All rights reserved.
Accounting for epistatic interactions improves the functional analysis of protein structures.
Wilkins, Angela D; Venner, Eric; Marciano, David C; Erdin, Serkan; Atri, Benu; Lua, Rhonald C; Lichtarge, Olivier
2013-11-01
The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Accounting for epistatic interactions improves the functional analysis of protein structures
Wilkins, Angela D.; Venner, Eric; Marciano, David C.; Erdin, Serkan; Atri, Benu; Lua, Rhonald C.; Lichtarge, Olivier
2013-01-01
Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure that we term pair-interaction Evolutionary Trace yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24021383
Expanded explorations into the optimization of an energy function for protein design
Huang, Yao-ming; Bystroff, Christopher
2014-01-01
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here we present new insights towards the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross-terms correct for the observed non-additivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at ACM-BCB, Orlando, FL, October 2012. PMID:24384706
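As an illustration of the Monte Carlo parameter search mentioned in this abstract, the following is a minimal Metropolis-style sketch. The objective function, step size, temperature and target weights are hypothetical placeholders; the abstract does not specify the actual objective functions or move set, so this is not the authors' procedure.

```python
import math
import random


def monte_carlo_optimize(objective, params, steps=5000, step_size=0.1,
                         temperature=0.05, seed=0):
    """Minimize objective(params) with a simple Metropolis search."""
    rng = random.Random(seed)
    current = list(params)
    current_score = objective(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        # Perturb one randomly chosen parameter.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        trial_score = objective(trial)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical objective: squared distance from a made-up target weight vector.
def toy_objective(weights):
    target = [1.0, -0.5, 2.0]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_value = monte_carlo_optimize(toy_objective, [0.0, 0.0, 0.0])
```

In a real setting the objective would score how well a parameterized energy function meets the chosen success criterion on a training set, and the result would then be cross-validated on a held-out test set.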
Thomas, Karluss; Herouet-Guicheney, Corinne; Ladics, Gregory; McClain, Scott; MacIntosh, Susan; Privalle, Laura; Woolhiser, Mike
2008-09-01
The International Life Science Institute's Health and Environmental Sciences Institute's Protein Allergenicity Technical Committee hosted an international workshop October 23-25, 2007, in Nice, France, to review and discuss existing and emerging methods and techniques for improving the current weight-of-evidence approach for evaluating the potential allergenicity of novel proteins. The workshop included over 40 international experts from government, industry, and academia. Their expertise represented a range of disciplines including immunology, chemistry, molecular biology, bioinformatics, and toxicology. Among participants, there was consensus that (1) current bioinformatic approaches are highly conservative; (2) advances in bioinformatics using structural comparisons of proteins may be helpful as the availability of structural data increases; (3) proteomics may prove useful for monitoring the natural variability in a plant's proteome and assessing the impact of biotechnology transformations on endogenous levels of allergens, but only when analytical techniques have been standardized and additional data are available on the natural variation of protein expression in non-transgenic bred plants; (4) basophil response assays are promising techniques, but need additional evaluation around specificity, sensitivity, and reproducibility; (5) additional research is required to develop and validate an animal model for the purpose of predicting protein allergenicity.
Transmembrane protein topology prediction using support vector machines.
Nugent, Timothy; Jones, David T
2009-05-26
Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/. The high accuracy of TM topology prediction which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, make this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins.
NASA Astrophysics Data System (ADS)
Demers, Jean-Philippe; Habenstein, Birgit; Loquet, Antoine; Kumar Vasa, Suresh; Giller, Karin; Becker, Stefan; Baker, David; Lange, Adam; Sgourakis, Nikolaos G.
2014-09-01
We introduce a general hybrid approach for determining the structures of supramolecular assemblies. Cryo-electron microscopy (cryo-EM) data define the overall envelope of the assembly and rigid-body orientation of the subunits while solid-state nuclear magnetic resonance (ssNMR) chemical shifts and distance constraints define the local secondary structure, protein fold and inter-subunit interactions. Finally, Rosetta structure calculations provide a general framework to integrate the different sources of structural information. Combining a 7.7-Å cryo-EM density map and 996 ssNMR distance constraints, the structure of the type-III secretion system needle of Shigella flexneri is determined to a precision of 0.4 Å. The calculated structures are cross-validated using an independent data set of 691 ssNMR constraints and scanning transmission electron microscopy measurements. The hybrid model resolves the conformation of the non-conserved N terminus, which occupies a protrusion in the cryo-EM density, and reveals conserved pore residues forming a continuous pattern of electrostatic interactions, thereby suggesting a mechanism for effector protein translocation.
Structure determination of helical filaments by solid-state NMR spectroscopy
Ahmed, Mumdooh; Spehr, Johannes; König, Renate; Lünsdorf, Heinrich; Rand, Ulfert; Lührs, Thorsten; Ritter, Christiane
2016-01-01
The controlled formation of filamentous protein complexes plays a crucial role in many biological systems and represents an emerging paradigm in signal transduction. The mitochondrial antiviral signaling protein (MAVS) is a central signal transduction hub in innate immunity that is activated by a receptor-induced conversion into helical superstructures (filaments) assembled from its globular caspase activation and recruitment domain. Solid-state NMR (ssNMR) spectroscopy has become one of the most powerful techniques for atomic resolution structures of protein fibrils. However, for helical filaments, the determination of the correct symmetry parameters has remained a significant hurdle for any structural technique and could thus far not be precisely derived from ssNMR data. Here, we solved the atomic resolution structure of helical MAVSCARD filaments exclusively from ssNMR data. We present a generally applicable approach that systematically explores the helical symmetry space by efficient modeling of the helical structure restrained by interprotomer ssNMR distance restraints. Together with classical automated NMR structure calculation, this allowed us to faithfully determine the symmetry that defines the entire assembly. To validate our structure, we probed the protomer arrangement by solvent paramagnetic resonance enhancement, analysis of chemical shift differences relative to the solution NMR structure of the monomer, and mutagenesis. We provide detailed information on the atomic contacts that determine filament stability and describe mechanistic details on the formation of signaling-competent MAVS filaments from inactive monomers. PMID:26733681
Siddiqui, Mohd Faizan; Bano, Bilqees
2018-06-06
Intrinsic and extrinsic factors are responsible for the transition of soluble proteins into an aggregated form. Trifluoroethanol (TFE) is one such potent extrinsic factor that facilitates the formation of aggregated structures. It disrupts the interactive forces and destabilizes the native structure of the protein. The present study investigates the effect of TFE on garlic cystatin (garlic phytocystatin, GPC). Garlic cystatin was incubated with increasing concentrations of TFE (0-90% v/v) for 4 h. Incubation of GPC with TFE induces structural changes, thereby resulting in the formation of aggregates. Inactivation of garlic phytocystatin was confirmed by measuring cysteine proteinase inhibitory activity. Garlic cystatin at 30% TFE exhibits native-like secondary structure and high ANS fluorescence, suggesting the presence of a molten globule state. Circular dichroism and FTIR confirmed the transition of the native alpha-helical structure of garlic cystatin to a beta-sheet structure at 60% TFE. Furthermore, increased ThT fluorescence and a redshift in the Congo red absorbance assay confirmed the presence of aggregates. Rayleigh scattering and turbidity assays were also performed to validate the aggregation results. Scanning electron microscopy was used to analyze the morphological changes and confirmed the presence of a sheath-like structure at 60% TFE. The study sheds light on the conformational behavior of a plant protein under stress conditions induced by an extrinsic factor. Copyright © 2018 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hast, Michael A.; Nichols, Connie B.; Armstrong, Stephanie M.
Cryptococcus neoformans is a fungal pathogen that causes life-threatening infections in immunocompromised individuals, including AIDS patients and transplant recipients. Few antifungals can treat C. neoformans infections, and drug resistance is increasing. Protein farnesyltransferase (FTase) catalyzes post-translational lipidation of key signal transduction proteins and is essential in C. neoformans. We present a multidisciplinary study validating C. neoformans FTase (CnFTase) as a drug target, showing that several anticancer FTase inhibitors with disparate scaffolds can inhibit C. neoformans and suggesting structure-based strategies for further optimization of these leads. Structural studies are an essential element for species-specific inhibitor development strategies by revealing similarities and differences between pathogen and host orthologs that can be exploited. We, therefore, present eight crystal structures of CnFTase that define the enzymatic reaction cycle, basis of ligand selection, and structurally divergent regions of the active site. Crystal structures of clinically important anticancer FTase inhibitors in complex with CnFTase reveal opportunities for optimization of selectivity for the fungal enzyme by modifying functional groups that interact with structurally diverse regions. A substrate-induced conformational change in CnFTase is observed as part of the reaction cycle, a feature that is mechanistically distinct from human FTase. Our combined structural and functional studies provide a framework for developing FTase inhibitors to treat invasive fungal infections.
Watanabe, Hideki; Matsumaru, Hiroyuki; Ooishi, Ayako; Feng, Yanwen; Odahara, Takayuki; Suto, Kyoko; Honda, Shinya
2009-05-01
Protein-protein interaction in response to environmental conditions enables sophisticated biological and biotechnological processes. Aiming toward the rational design of a pH-sensitive protein-protein interaction, we engineered pH-sensitive mutants of streptococcal protein G B1, a binder to the IgG constant region. We systematically introduced histidine residues into the binding interface to cause electrostatic repulsion on the basis of a rigid body model. Exquisite pH sensitivity of this interaction was confirmed by surface plasmon resonance and affinity chromatography employing a clinically used human IgG. The pH-sensitive mechanism of the interaction was analyzed and evaluated from kinetic, thermodynamic, and structural viewpoints. Histidine-mediated electrostatic repulsion resulted in significant loss of exothermic heat of the binding that decreased the affinity only at acidic conditions, thereby improving the pH sensitivity. The reduced binding energy was partly recovered by "enthalpy-entropy compensation." Crystal structures of the designed mutants confirmed the validity of the rigid body model on which the effective electrostatic repulsion was based. Moreover, our data suggested that the entropy gain involved exclusion of water molecules solvated in a space formed by the introduced histidine and adjacent tryptophan residue. Our findings concerning the mechanism of histidine-introduced interactions will provide a guideline for the rational design of pH-sensitive protein-protein recognition.
Wang, Qian; Li, Yanwei; Dong, Hong; Wang, Li; Peng, Jinmei; An, Tongqing; Yang, Xufu; Tian, Zhijun; Cai, Xuehui
2017-02-22
The highly pathogenic porcine reproductive and respiratory syndrome virus (HP-PRRSV) continues to pose one of the greatest threats to the swine industry. M protein is the most conserved and important structural protein of PRRSV. However, information about the host cellular proteins that interact with M protein remains limited. Host cellular proteins that interact with the M protein of HP-PRRSV were immunoprecipitated from MARC-145 cells infected with PRRSV HuN4-F112 using the M monoclonal antibody (mAb). The differentially expressed proteins were identified by LC-MS/MS. The screened proteins were used for bioinformatics analysis including Gene Ontology, the interaction network, and the enriched KEGG pathways. Selected cellular proteins of interest were validated as M protein interactors by Co-IP. The PRRSV HuN4-F112 infection group had 10 bands compared with the control group. The bands included 219 non-redundant cellular proteins that interact with M protein, which were identified by LC-MS/MS with high confidence. The Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway bioinformatic analyses indicated that the identified proteins could be assigned to several different subcellular locations and functional classes. Functional analysis of the interactome profile highlighted cellular pathways associated with protein translation, infectious disease, and signal transduction. Two cellular proteins of interest, nuclear factor of activated T cells 45 kDa (NF45) and proliferating cell nuclear antigen (PCNA), that could interact with M protein were validated by Co-IP and confocal analyses. The interactome data between PRRSV M protein and cellular proteins were identified and contribute to the understanding of the roles of M protein in the replication and pathogenesis of PRRSV. The interactome of M protein will aid studies of virus/host interactions and provide means to decrease the threat of PRRSV to the swine industry in the future.
Lu, Diannan; Liu, Zheng; Wu, Jianzhong
2006-01-01
Proteins fold in a confined space not only in vivo, i.e., folding assisted by molecular chaperones and chaperonins in a crowded cellular medium, but also in vitro as in production of recombinant proteins. Despite extensive work on protein folding in bulk, little is known about how and to what extent the thermodynamics and kinetics of protein folding are altered by confinement. In this work, we use a Gō-like off-lattice model to investigate the folding and stability of an all β-sheet protein in spherical cages of different sizes and surface hydrophobicity. We find that whereas extreme confinement inhibits correct folding, a hydrophilic cage stabilizes the protein due to restriction of the unfolded configurations. In a hydrophobic cage, however, strong attraction from the cage surface destabilizes the confined protein because of competition between self-aggregation and adsorption of hydrophobic residues. We show that the kinetics of protein collapse and folding is strongly correlated with both the cage size and the surface hydrophobicity. It is demonstrated that a cage of moderate size and hydrophobicity optimizes both the folding yield and kinetics of structural transitions. To support the simulation results, we have also investigated the refolding of hen-egg lysozyme in the presence of cetyltrimethylammoniumbromide (CTAB) surfactants that provide an effective confinement of the proteins by micellization. The influence of the surfactant hydrophobicity on the structural and biological activity of the protein is determined with circular dichroism spectrum, fluorescence emission spectrum, and biological activity assay. It is shown that, as predicted by coarse-grained simulations, CTAB micelles facilitate the collapse of denatured lysozyme, whereas the addition of β-cyclodextrin-grafted-PNIPAAm, a weakly hydrophobic stripper, dissociates CTAB micelles and promotes the conformational rearrangement and thereby gives an improved recovery of lysozyme activity. PMID:16461405
[Can the local energy minimization refine the PDB structures of different resolution universally?].
Godzi, M G; Gromova, A P; Oferkin, I V; Mironov, P V
2009-01-01
Local energy minimization was statistically evaluated as a refinement strategy for pairs of PDB structures of different resolution. Thirteen pairs of structures that differ only in resolution were extracted from the PDB, representing the structures of 11 identical proteins obtained by different X-ray diffraction techniques. The distribution of RMSD values was calculated for these pairs before and after local energy minimization of each structure. The MMFF94 force field was used for energy calculations, and a quasi-Newton method was used for local energy minimization. Comparison of these two RMSD distributions showed that local energy minimization statistically increases the structural differences within pairs, so it cannot be used for refinement purposes. To explore the prospects of complex refinement strategies based on energy minimization, randomized structures were obtained by moving the initial PDB structures as far as the minimized structures had been moved in the multidimensional space of atomic coordinates. For these randomized structures, the RMSD distribution was calculated and compared with that for the minimized structures. The significant differences in their mean values showed that the energy surface of the protein has only a few minima near the conformations of different resolution obtained by X-ray diffraction for the PDB. Some other results obtained by exploring the energy surface near these conformations are also presented. These results are expected to be very useful for the development of new protein refinement strategies based on energy minimization.
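The RMSD comparison described in this abstract can be sketched as follows. This is a minimal illustration that assumes the two structures in each pair list the same atoms in the same order and are already superimposed; the abstract does not describe the superposition protocol, and the coordinates below are hypothetical.

```python
import math
from statistics import mean


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered, already superimposed coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_rmsd_change(pairs_before, pairs_after):
    """Mean RMSD over structure pairs after minimization minus the mean before."""
    before = mean(rmsd(a, b) for a, b in pairs_before)
    after = mean(rmsd(a, b) for a, b in pairs_after)
    return after - before


# Hypothetical two-atom "structures" standing in for a low/high resolution pair.
low_res = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
high_res = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
example_rmsd = rmsd(low_res, high_res)
```

A positive value from `mean_rmsd_change` would correspond to the reported outcome, in which minimization pushes paired structures further apart on average.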
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods.
Qu, Kaiyang; Han, Ke; Wu, Song; Wang, Guohua; Wei, Leyi
2017-09-22
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.
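The 10-fold cross-validation protocol named in this abstract can be illustrated with a short, dependency-free sketch. The nearest-centroid classifier below is only a stand-in for the support vector machine used in the paper, and the feature vectors are made-up toy data rather than K-Skip-N-Grams or SSF features.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Return overall accuracy when each fold is held out exactly once."""
    correct = 0
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in held_out)
    return correct / len(features)


# Stand-in classifier (NOT an SVM): assign the label of the nearest class centroid.
def fit_centroids(xs, ys):
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


# Hypothetical, well-separated two-class toy data.
toy_x = [[0.01 * i, 0.0] for i in range(20)] + [[5.0 + 0.01 * i, 1.0] for i in range(20)]
toy_y = [0] * 20 + [1] * 20
accuracy = cross_validate(toy_x, toy_y, fit_centroids, predict_centroid, k=10)
```

Because every sample is predicted exactly once by a model that never saw it during training, the returned accuracy is an estimate of out-of-sample performance.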
Zhou, Hua; Pisitkun, Trairak; Aponte, Angel; Yuen, Peter S.T.; Hoffert, Jason D.; Yasuda, Hideo; Hu, Xuzhen; Chawla, Lakhmir; Shen, Rong-Fong; Knepper, Mark A.; Star, Robert A.
2008-01-01
Urinary exosomes containing apical membrane and intracellular fluid are normally secreted into the urine from all nephron segments, and may carry protein markers of renal dysfunction and structural injury. We aimed to discover biomarkers in urinary exosomes to detect acute kidney injury (AKI), which has a high mortality and morbidity. Animals were injected intravenously with cisplatin. Urinary exosomes were isolated by differential centrifugation. Protein changes were evaluated by two-dimensional difference in gel electrophoresis and changed proteins were identified by MALDI-TOF-TOF or LC-MS/MS. The identified candidate biomarkers were validated by western blotting in individual urine samples from rats subjected to cisplatin injection; bilateral ischemia and reperfusion (I/R); volume depletion (VD); and ICU patients with and without AKI. We identified 18 proteins that were increased and 9 proteins that were decreased 8 hr after cisplatin. Most of the candidates could not be validated by western blotting. However, exosomal Fetuin-A increased 52.5-fold at day 2 (1 day before serum creatinine increase and tubule damage) and remained elevated 51.5-fold at day 5 (peak renal injury) after cisplatin injection. By immuno-electron microscopy and elution studies, Fetuin-A was located inside urinary exosomes. Urinary Fetuin-A was increased 31.6-fold in the early phase (2-8 hr) of ischemia/reperfusion, but not in prerenal azotemia. Urinary exosomal Fetuin-A also increased in three ICU patients with AKI compared to the patients without AKI. We conclude that 1) proteomic analysis of urinary exosomes can provide biomarker candidates for the diagnosis of AKI; 2) urinary Fetuin-A might be a predictive biomarker of structural renal injury. PMID:17021608
Multivariate Analyses of Quality Metrics for Crystal Structures in the PDB Archive.
Shao, Chenghua; Yang, Huanwang; Westbrook, John D; Young, Jasmine Y; Zardecki, Christine; Burley, Stephen K
2017-03-07
Following deployment of an augmented validation system by the Worldwide Protein Data Bank (wwPDB) partnership, the quality of crystal structures entering the PDB has improved. Of significance are improvements in quality measures now prominently displayed in the wwPDB validation report. Comparisons of PDB depositions made before and after introduction of the new reporting system show improvements in quality measures relating to pairwise atom-atom clashes, side-chain torsion angle rotamers, and local agreement between the atomic coordinate structure model and experimental electron density data. These improvements are largely independent of resolution limit and sample molecular weight. No significant improvement in the quality of associated ligands was observed. Principal component analysis revealed that structure quality could be summarized with three measures (Rfree, real-space R factor Z score, and a combined molecular geometry quality metric), which can in turn be reduced to a single overall quality metric readily interpretable by all PDB archive users. Copyright © 2017 Elsevier Ltd. All rights reserved.
Drug search for leishmaniasis: a virtual screening approach by grid computing
NASA Astrophysics Data System (ADS)
Ochoa, Rodrigo; Watowich, Stanley J.; Flórez, Andrés; Mesa, Carol V.; Robledo, Sara M.; Muskus, Carlos
2016-07-01
The trypanosomatid protozoa Leishmania is endemic in 100 countries, with infections causing 2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen 2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.
Protein flexibility: coordinate uncertainties and interpretation of structural differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rashin, Alexander A., E-mail: alexander-rashin@hotmail.com; LH Baker Center for Bioinformatics and Department of Biochemistry, Biophysics and Molecular Biology, 112 Office and Lab Building, Iowa State University, Ames, IA 50011-3020; Rashin, Abraham H. L.
2009-11-01
Criteria for the interpretability of coordinate differences and a new method for identifying rigid-body motions and nonrigid deformations in protein conformational changes are developed and applied to functionally induced and crystallization-induced conformational changes. Valid interpretations of conformational movements in protein structures determined by X-ray crystallography require that the movement magnitudes exceed their uncertainty threshold. Here, it is shown that such thresholds can be obtained from the distance difference matrices (DDMs) of 1014 pairs of independently determined structures of bovine ribonuclease A and sperm whale myoglobin, with no explanations provided for reportedly minor coordinate differences. The smallest magnitudes of reportedly functional motions are just above these thresholds. Uncertainty thresholds can provide objective criteria that distinguish between true conformational changes and apparent ‘noise’, showing that some previous interpretations of protein coordinate changes attributed to external conditions or mutations may be doubtful or erroneous. The use of uncertainty thresholds, DDMs, the newly introduced CDDMs (contact distance difference matrices) and a novel simple rotation algorithm allows a more meaningful classification and description of protein motions, distinguishing between various rigid-fragment motions and nonrigid conformational deformations. It is also shown that half of 75 pairs of identical molecules, each from the same asymmetric crystallographic cell, exhibit coordinate differences that range from just outside the coordinate uncertainty threshold to the full magnitude of large functional movements. Thus, crystallization might often induce protein conformational changes that are comparable to those related to or induced by the protein function.
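To make the idea of a distance difference matrix (DDM) concrete, the sketch below computes one for two conformations and lists the atom pairs whose distance change exceeds a threshold. The coordinates and the threshold value are hypothetical; the abstract does not give the actual uncertainty thresholds, and the CDDM variant and rotation algorithm are not reproduced here.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances within one structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_difference_matrix(coords_1, coords_2):
    """Element-wise difference of intra-structure distances (structure 2 minus 1)."""
    if len(coords_1) != len(coords_2):
        raise ValueError("both structures must contain the same atoms")
    d1, d2 = distance_matrix(coords_1), distance_matrix(coords_2)
    n = len(coords_1)
    return [[d2[i][j] - d1[i][j] for j in range(n)] for i in range(n)]


def changes_above_threshold(ddm, threshold):
    """Atom pairs (i < j) whose distance change exceeds the uncertainty threshold."""
    n = len(ddm)
    return [
        (i, j, ddm[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(ddm[i][j]) > threshold
    ]


# Hypothetical three-atom conformations; only the third atom moves.
state_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
state_2 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
ddm = distance_difference_matrix(state_1, state_2)
significant = changes_above_threshold(ddm, threshold=0.5)  # threshold is an assumed value
```

Because a DDM depends only on distances within each structure, it requires no prior superposition, which is what makes it convenient for comparing independently determined structures.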
Drug search for leishmaniasis: a virtual screening approach by grid computing.
Ochoa, Rodrigo; Watowich, Stanley J; Flórez, Andrés; Mesa, Carol V; Robledo, Sara M; Muskus, Carlos
2016-07-01
The trypanosomatid protozoa Leishmania is endemic in ~100 countries, with infections causing ~2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen ~2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.
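The ranking step described above, in which compounds are ordered by their average predicted affinity across all MD conformations, might look roughly like the sketch below. It assumes the "binding spectrum" of a compound is simply its list of docking scores (kcal/mol, more negative meaning tighter predicted binding), one per conformation; the compound names and scores are invented for illustration and the abstract does not specify the exact averaging used.

```python
from statistics import mean


def rank_by_mean_affinity(scores):
    """Sort compounds by mean docking score across conformations (lowest first)."""
    averaged = {name: mean(values) for name, values in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1])


# Hypothetical docking scores against three MD snapshots of one target.
docking_scores = {
    "cmpd_A": [-9.0, -8.0, -10.0],
    "cmpd_B": [-11.0, -5.0, -5.0],
    "cmpd_C": [-7.0, -7.0, -7.0],
}
ranking = rank_by_mean_affinity(docking_scores)
```

Averaging over conformations favours compounds such as `cmpd_A` that bind consistently, over compounds such as `cmpd_B` that score well against a single snapshot only.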
Carbohydrate-protein interactions: molecular modeling insights.
Pérez, Serge; Tvaroška, Igor
2014-01-01
The article reviews the significant contributions to, and the present status of, applications of computational methods for the characterization and prediction of protein-carbohydrate interactions. After a presentation of the specific features of carbohydrate modeling, along with a brief description of the experimental data and general features of carbohydrate-protein interactions, the survey provides a thorough coverage of the available computational methods and tools. At the quantum-mechanical level, the use of both molecular orbitals and density-functional theory is critically assessed. These are followed by a presentation and critical evaluation of the applications of semiempirical and empirical methods: QM/MM, molecular dynamics, free-energy calculations, metadynamics, molecular robotics, and others. The usefulness of molecular docking in structural glycobiology is evaluated by considering recent docking-validation studies on a range of protein targets. The range of applications of these theoretical methods provides insights into the structural, energetic, and mechanistic facets that occur in the course of the recognition processes. Selected examples are provided to exemplify the usefulness and the present limitations of these computational methods in their ability to assist in elucidation of the structural basis underlying the diverse function and biological roles of carbohydrates in their dialogue with proteins. These test cases cover the field of both carbohydrate biosynthesis and glycosyltransferases, as well as glycoside hydrolases. The phenomenon of (macro)molecular recognition is illustrated for the interactions of carbohydrates with such proteins as lectins, monoclonal antibodies, GAG-binding proteins, porins, and viruses. © 2014 Elsevier Inc. All rights reserved.
Moustafa, Ibrahim M; Gohara, David W; Uchida, Akira; Yennawar, Neela; Cameron, Craig E
2015-11-23
The genomes of RNA viruses are relatively small. To overcome the small-size limitation, RNA viruses assign distinct functions to the processed viral proteins and their precursors. This is exemplified by poliovirus 3CD protein. 3C protein is a protease and RNA-binding protein. 3D protein is an RNA-dependent RNA polymerase (RdRp). 3CD exhibits unique protease and RNA-binding activities relative to 3C and is devoid of RdRp activity. The origin of these differences is unclear, since the crystal structure of 3CD revealed a "beads-on-a-string" structure with no significant structural differences compared to the fully processed proteins. We performed molecular dynamics (MD) simulations on 3CD to investigate its conformational dynamics. A compact conformation of 3CD was observed that was substantially different from that shown crystallographically. This new conformation explained the unique properties of 3CD relative to the individual proteins. Interestingly, simulations of mutant 3CD showed an altered interface. Additionally, accelerated MD simulations uncovered a conformational ensemble of 3CD. When we elucidated the 3CD conformations in solution using small-angle X-ray scattering (SAXS) experiments, a range of conformations from extended to compact was revealed, validating the MD simulations. The existence of a conformational ensemble of 3CD could be viewed as a way to expand the poliovirus proteome, an observation that may extend to other viruses.
MOLECULAR THEORY OF HYDROPHOBIC EFFECTS: "She is too mean to have her name repeated."*
NASA Astrophysics Data System (ADS)
Pratt, Lawrence R.
2002-10-01
This paper reviews the molecular theory of hydrophobic effects relevant to biomolecular structure and assembly in aqueous solution. Recent progress has resulted in simple, validated molecular statistical thermodynamic theories and clarification of confusing theories of decades ago. Current work is resolving effects of wider variations of thermodynamic state, e.g., pressure denaturation of soluble proteins, and more exotic questions such as effects of surface chemistry in treating stability of macromolecular structures in aqueous solution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shi, Jade; Nobrega, R. Paul; Schwantes, Christian
The dynamics of globular proteins can be described in terms of transitions between a folded native state and less-populated intermediates, or excited states, which can play critical roles in both protein folding and function. Excited states are by definition transient species, and therefore are difficult to characterize using current experimental techniques. We report an atomistic model of the excited state ensemble of a stabilized mutant of an extensively studied flavodoxin fold protein CheY. We employed a hybrid simulation and experimental approach in which an aggregate 42 milliseconds of all-atom molecular dynamics were used as an informative prior for the structure of the excited state ensemble. The resulting prior was then refined against small-angle X-ray scattering (SAXS) data employing an established method (EROS). The most striking feature of the resulting excited state ensemble was an unstructured N-terminus stabilized by non-native contacts in a conformation that is topologically simpler than the native state. We then predict incisive single molecule FRET experiments, using these results, as a means of model validation. Our study demonstrates the paradigm of uniting simulation and experiment in a statistical model to study the structure of protein excited states and rationally design validating experiments.
Structural plasticity of the TDRD3 Tudor domain probed by a fragment screening hit.
Liu, Jiuyang; Zhang, Shuya; Liu, Mingqing; Liu, Yaqian; Nshogoza, Gilbert; Gao, Jia; Ma, Rongsheng; Yang, Yang; Wu, Jihui; Zhang, Jiahai; Li, Fudong; Ruan, Ke
2018-04-12
As a reader of di-methylated arginine on various proteins, such as histone, RNA polymerase II, PIWI and Fragile X mental retardation protein, the Tudor domain of Tudor domain-containing protein 3 (TDRD3) mediates transcriptional activation in the nucleus and formation of stress granules in the cytoplasm. Despite the TDRD3 implication in cancer cell proliferation and invasion, warheads to block the di-methylated arginine recognition pocket of the TDRD3 Tudor domain have not yet been uncovered. Here we identified 14 small molecule hits against the TDRD3 Tudor domain through NMR fragment-based screening. These hits were further cross-validated by using competitive fluorescence polarization and isothermal titration calorimetry experiments. The crystal structure of the TDRD3 Tudor domain in complex with hit 1 reveals a binding mode distinct from that of the natural substrate. Hit 1 protrudes into the aromatic cage of the TDRD3 Tudor domain, where the aromatic residues are tilted to accommodate a sandwich-like π-π interaction. The side chain of the conserved residue N596 swings away 3.1 Å to form a direct hydrogen bond with hit 1. Moreover, this compound shows a decreased affinity against the single Tudor domain of survival motor neuron protein, but no detectable binding to either the tandem Tudor domain of TP53-binding protein 1 or the extended Tudor domain of staphylococcal nuclease domain-containing protein 1. Our work depicts the structural plasticity of the TDRD3 Tudor domain and paves the way for the subsequent structure-guided discovery of selective inhibitors targeting Tudor domains. Structural data are available in the PDB under the accession number 5YJ8. © 2018 Federation of European Biochemical Societies.
Parameter optimization on the convergence surface of path simulations
NASA Astrophysics Data System (ADS)
Chandrasekaran, Srinivas Niranj
Computational treatments of protein conformational changes tend to focus on the trajectories themselves, despite the fact that it is the transition state structures that contain information about the barriers that impose multi-state behavior. PATH is an algorithm that computes a transition pathway between two protein crystal structures, along with the transition state structure, by minimizing the Onsager-Machlup action functional. It is rapid but depends on several unknown input parameters whose range of different values can potentially generate different transition-state structures. Transition-state structures arising from different input parameters cannot be uniquely compared with those generated by other methods. I outline modifications that I have made to the PATH algorithm that estimate these input parameters in a manner that circumvents these difficulties, and describe two complementary tests that validate the transition-state structures found by the PATH algorithm. First, I show that although the PATH algorithm and two other approaches to computing transition pathways produce different low-energy structures connecting the initial and final ground-states with the transition state, all three methods agree closely on the configurations of their transition states. Second, I show that the PATH transition states are close to the saddle points of free-energy surfaces connecting initial and final states generated by replica-exchange Discrete Molecular Dynamics simulations. I show that aromatic side-chain rearrangements create similar potential energy barriers in the transition-state structures identified by PATH for a signaling protein, a contractile protein, and an enzyme. Finally, I observed, but cannot account for, the fact that trajectories obtained for all-atom and Calpha-only simulations identify transition state structures in which the Calpha atoms are in essentially the same positions.
The consistency between transition-state structures derived by different algorithms for unrelated protein systems argues that although functionally important protein conformational change trajectories are to a degree stochastic, they nonetheless pass through a well-defined transition state whose detailed structural properties can rapidly be identified using PATH. In the end, I outline the strategies that could enhance the efficiency and applicability of PATH.
Bergsdorf, Christian; Fiez-Vandal, Cédric; Sykes, David A; Bernet, Pascal; Aussenac, Sonia; Charlton, Steven J; Schopfer, Ulrich; Ottl, Johannes; Duckely, Myriam
2016-03-01
Integral membrane proteins (IMPs) play an important role in many cellular events and are involved in numerous pathological processes. Therefore, understanding the structure and function of IMPs is a crucial prerequisite to enable successful targeting of these proteins with low molecular weight (LMW) ligands early on in the discovery process. To optimize IMP purification/crystallization and to identify/characterize LMW ligand-target interactions, robust, reliable, high-throughput, and sensitive biophysical methods are needed. Here, we describe a differential scanning fluorimetry (DSF) screening method using the thiol-reactive BODIPY FL-cystine dye to monitor thermal unfolding of the G-protein-coupled receptor (GPCR), CXCR2. To validate this method, the seven-transmembrane protein CXCR2 was analyzed with a set of well-characterized antagonists. This study showed that the new DSF assay assessed reliably the stability of CXCR2 in a 384-well format. The analysis of 14 ligands with a potency range over 4 log units demonstrated the detection/characterization of LMW ligands binding to the membrane protein target. Furthermore, DSF results cross-validated with the label-free differential static light scattering (DSLS) thermal denaturation method. These results underline the potential of the BODIPY assay format as a general tool to investigate membrane proteins and their interaction partners. © 2015 Society for Laboratory Automation and Screening.
Using support vector machine to predict beta- and gamma-turns in proteins.
Hu, Xiuzhen; Li, Qianzhong
2008-09-01
By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.
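The two figures of merit quoted above, overall accuracy and the Matthews correlation coefficient (MCC), follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the counts in the example are invented and are not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


# Hypothetical counts for a turn / non-turn classifier.
acc = accuracy(tp=40, tn=40, fp=10, fn=10)
mcc = matthews_corrcoef(tp=40, tn=40, fp=10, fn=10)
```

MCC is commonly reported alongside accuracy for turn prediction because turn and non-turn residues are unevenly represented, and accuracy alone can look high for a classifier that mostly predicts the majority class.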
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Sukhwal, Anshul; Sowdhamini, Ramanathan
2013-07-01
Protein-protein interactions are important in carrying out many biological processes and functions. These interactions may be either permanent or of temporary nature. Several studies have employed tools like solvent accessibility and graph theory to identify these interactions, but still more studies need to be performed to quantify and validate them. Although we now have many databases available with predicted and experimental results on protein-protein interactions, we still do not have many databases which focus on providing structural details of the interacting complexes, their oligomerisation state and homologues. In this work, protein-protein interactions have been thoroughly investigated within the structural regime and quantified for their strength using calculated pseudoenergies. The PPCheck server, an in-house webserver, has been used for calculating the pseudoenergies like van der Waals, hydrogen bonds and electrostatic energy based on distances between atoms of amino acids from two interacting proteins. PPCheck can be visited at . Based on statistical data, as obtained by studying established protein-protein interacting complexes from earlier studies, we came to a conclusion that an average protein-protein interface consisted of about 51 to 150 amino acid residues and the generalized energy per residue ranged from -2 kJ mol(-1) to -6 kJ mol(-1). We found that some of the proteins have an exceptionally higher number of amino acids at the interface and it was purely because of their elaborate interface or extended topology i.e. some of their secondary structure regions or loops were either inter-mixing or running parallel to one another or they were taking part in domain swapping. 
Residue networks were prepared for all the amino acids of the interacting proteins involved in different types of interactions (like van der Waals, hydrogen-bonding, electrostatic or intramolecular interactions) and were analysed between the query domain-interacting partner pair and its remote homologue-interacting partner pair. We found that, in exceptional cases, homologous proteins belonging to the same superfamily, but with remote sequence similarity, can share similar interfaces.
Computational prediction of atomic structures of helical membrane proteins aided by EM maps.
Kovacs, Julio A; Yeager, Mark; Abagyan, Ruben
2007-09-15
Integral membrane proteins pose a major challenge for protein-structure prediction because only approximately 100 high-resolution structures are available currently, thereby impeding the development of rules or empirical potentials to predict the packing of transmembrane alpha-helices. However, when an intermediate-resolution electron microscopy (EM) map is available, it can be used to provide restraints which, in combination with a suitable computational protocol, make structure prediction feasible. In this work we present such a protocol, which proceeds in three stages: 1), generation of an ensemble of alpha-helices by flexible fitting into each of the density rods in the low-resolution EM map, spanning a range of rotational angles around the main helical axes and translational shifts along the density rods; 2), fast optimization of side chains and scoring of the resulting conformations; and 3), refinement of the lowest-scoring conformations with internal coordinate mechanics, by optimizing the van der Waals, electrostatics, hydrogen bonding, torsional, and solvation energy contributions. In addition, our method implements a penalty term through a so-called tethering map, derived from the EM map, which restrains the positions of the alpha-helices. The protocol was validated on three test cases: GpA, KcsA, and MscL.
Batyuk, Alexander; Wu, Yufan; Honegger, Annemarie; Heberling, Matthew M; Plückthun, Andreas
2016-04-24
DARPin libraries, based on a Designed Ankyrin Repeat Protein consensus framework, are a rich source of binding partners for a wide variety of proteins. Their modular structure, stability, ease of in vitro selection and high production yields make DARPins an ideal starting point for further engineering. The X-ray structures of around 30 different DARPin complexes demonstrate their ability to facilitate crystallization of their target proteins by restricting flexibility and preventing undesired interactions of the target molecule. However, their small size (18 kDa), very hydrophilic surface and repetitive structure can limit the DARPins' ability to provide essential crystal contacts and their usefulness as a search model for addressing the crystallographic phase problem in molecular replacement. To optimize DARPins for their application as crystallization chaperones, rigid domain-domain fusions of the DARPins to larger proteins, proven to yield high-resolution crystal structures, were generated. These fusions were designed in such a way that they affect only one of the terminal capping repeats of the DARPin and do not interfere with residues involved in target binding, allowing the binding specificities of the DARPin in the fusion construct to be exchanged at will. As a proof of principle, we designed rigid fusions of a stabilized version of Escherichia coli TEM-1 β-lactamase to the C-terminal capping repeat of various DARPins in six different relative domain orientations. Five crystal structures representing four different fusion constructs, alone or in complex with the cognate target, show the predicted relative domain orientations and prove the validity of the concept. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Fayaz, S. M.; Rajanikant, G. K.
2014-07-01
Programmed cell death has been a fascinating area of research since it continues to raise new challenges and questions in spite of the tremendous ongoing research in this field. Recently, necroptosis, a programmed form of necrotic cell death, has been implicated in many diseases including neurological disorders. Receptor interacting serine/threonine protein kinase 1 (RIPK1) is an important regulatory protein involved in necroptosis, and inhibition of this protein is essential to stop the necroptotic process and eventually cell death. Current structure-based virtual screening methods involve a wide range of strategies and recently, considering multiple protein structures for pharmacophore extraction has been emphasized as a way to improve the outcome. However, using the pharmacophoric information completely during docking is very important. Further, in such methods, using the appropriate protein structures for docking is desirable. If not, potential compound hits, obtained through pharmacophore-based screening, may not have correct ranks and scores after docking. Therefore, a comprehensive integration of different ensemble methods is essential, which may provide better virtual screening results. In this study, dual ensemble screening, a novel computational strategy, was used to identify diverse and potent inhibitors against RIPK1. All the pharmacophore features present in the binding site were captured using both the apo and holo protein structures and an ensemble pharmacophore was built by combining these features. This ensemble pharmacophore was employed in pharmacophore-based screening of the ZINC database. The compound hits, thus obtained, were subjected to ensemble docking. The leads acquired through docking were further validated through feature evaluation and molecular dynamics simulation.
PON-Sol: prediction of effects of amino acid substitutions on protein solubility.
Yang, Yang; Niroula, Abhishek; Shen, Bairong; Vihinen, Mauno
2016-07-01
Solubility is one of the fundamental protein properties. It is of great interest because of its relevance to protein expression. Reduced solubility and protein aggregation are also associated with many diseases. We collected from literature the largest experimentally verified solubility affecting amino acid substitution (AAS) dataset and used it to train a predictor called PON-Sol. The predictor can distinguish both solubility decreasing and increasing variants from those not affecting solubility. PON-Sol has a normalized correct prediction ratio of 0.491 on cross-validation and 0.432 for an independent test set. The performance of the method was compared both to solubility and aggregation predictors and found to be superior. PON-Sol can be used for the prediction of effects of disease-related substitutions, effects on heterologous recombinant protein expression and enhanced crystallizability. One application is to investigate effects of all possible AASs in a protein to aid protein engineering. PON-Sol is freely available at http://structure.bmc.lu.se/PON-Sol and the training and test data are available at http://structure.bmc.lu.se/VariBench/ponsol.php (contact: mauno.vihinen@med.lu.se). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Babina, Arianne M; Parker, Darren J; Li, Gene-Wei; Meyer, Michelle M
2018-06-20
In many bacteria, ribosomal proteins autogenously repress their own expression by interacting with RNA structures typically located in the 5'-UTRs of their mRNA transcripts. This regulation is necessary to maintain a balance between ribosomal proteins and rRNA to ensure proper ribosome production. Despite advances in non-coding RNA discovery and validation of RNA-protein regulatory interactions, the selective pressures that govern the formation and maintenance of such RNA cis-regulators in the context of an organism remain largely undetermined. To examine the impact disruptions to this regulation have on bacterial fitness, we introduced point mutations that abolish ribosomal protein binding and regulation into the RNA structure that controls expression of ribosomal proteins L20 and L35 within the Bacillus subtilis genome. Our studies indicate that removing this regulation results in reduced log phase growth, improper rRNA maturation, and the accumulation of a kinetically trapped or mis-assembled ribosomal particle at low temperatures, suggesting defects in ribosome synthesis. Such work emphasizes the important role regulatory RNAs play in the stoichiometric production of ribosomal components for proper ribosome composition and overall organism viability and reinforces the potential of targeting ribosomal protein production and ribosome assembly with novel antimicrobials. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
How PEGylation enhances the stability and potency of insulin: a molecular dynamics simulation.
Yang, Cheng; Lu, Diannan; Liu, Zheng
2011-04-05
While the effectiveness of PEGylation in enhancing the stability and potency of protein pharmaceuticals has been validated for years, the underlying mechanism remains poorly understood, particularly at the molecular level. A molecular dynamics simulation was developed using an annealing procedure that allowed an all-atom level examination of the interaction between PEG polymers of different chain lengths and a conjugated protein represented by insulin. It was shown that PEG became entangled around the protein surface through hydrophobic interaction and concurrently formed hydrogen bonds with the surrounding water molecules. In addition to enhancing its structural stability, as indicated by the root-mean-square difference (rmsd) and secondary structure analyses, conjugation increased the size of the protein drug while decreasing the solvent accessible surface area of the protein. All these thus led to prolonged circulation life despite kidney filtration, proteolysis, and immunogenic side effects, as experimentally demonstrated elsewhere. Moreover, the simulation results indicated that an optimal chain length exists that would maximize drug potency underpinned by the parameters mentioned above. The simulation provided molecular insight into the interaction between PEG and the conjugated protein at the all-atom level and offered a tool that would allow for the design of PEGylated protein pharmaceuticals for given applications.
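Structural stability in simulations such as the one above is commonly tracked through the root-mean-square difference between a trajectory frame and a reference structure. The sketch below computes a plain coordinate rmsd without any prior superposition, which is an assumption made for brevity; the abstract does not say how the authors aligned their frames, and the coordinates shown are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square difference between two equal-length coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reference and frame: every atom is shifted by 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
drift = rmsd(reference, frame)
```

In practice each frame would first be superimposed on the reference (for example by a least-squares fit) so that rigid-body translation and rotation do not inflate the value.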
A Bayesian Active Learning Experimental Design for Inferring Signaling Networks.
Ness, Robert O; Sachs, Karen; Mallick, Parag; Vitek, Olga
2018-06-21
Machine learning methods for learning network structure are applied to quantitative proteomics experiments and reverse-engineer intracellular signal transduction networks. They provide insight into the rewiring of signaling within the context of a disease or a phenotype. To learn the causal patterns of influence between proteins in the network, the methods require experiments that include targeted interventions that fix the activity of specific proteins. However, the interventions are costly and add experimental complexity. We describe an active learning strategy for selecting optimal interventions. Our approach takes as inputs pathway databases and historic data sets, expresses them in the form of prior probability distributions on network structures, and selects interventions that maximize their expected contribution to structure learning. Evaluations on simulated and real data show that the strategy reduces the detection error of validated edges as compared with an unguided choice of interventions and avoids redundant interventions, thereby increasing the effectiveness of the experiment.
Macroscopic modeling and simulations of supercoiled DNA with bound proteins
NASA Astrophysics Data System (ADS)
Huang, Jing; Schlick, Tamar
2002-11-01
General methods are presented for modeling and simulating DNA molecules with bound proteins on the macromolecular level. These new approaches are motivated by the need for accurate and affordable methods to simulate slow processes (on the millisecond time scale) in DNA/protein systems, such as the large-scale motions involved in the Hin-mediated inversion process. Our approaches, based on the wormlike chain model of long DNA molecules, introduce inhomogeneous potentials for DNA/protein complexes based on available atomic-level structures. Electrostatically, we treat those DNA/protein complexes as sets of effective charges, optimized by our discrete surface charge optimization package, in which the charges are distributed on an excluded-volume surface that represents the macromolecular complex. We also introduce directional bending potentials as well as a non-identical bead hydrodynamics algorithm to further mimic the inhomogeneous effects caused by protein binding. These models thus account for basic elements of protein binding effects on DNA local structure but remain computationally tractable. To validate these models and methods, we reproduce various properties measured by both Monte Carlo methods and experiments. We then apply the developed models to study the Hin-mediated inversion system in long DNA. By simulating supercoiled, circular DNA with or without bound proteins, we observe significant effects of protein binding on global conformations and long-time dynamics of the DNA on the kilobase-pair length scale.
Identification of the HIV-1 Vif and Human APOBEC3G Protein Interface.
Letko, Michael; Booiman, Thijs; Kootstra, Neeltje; Simon, Viviana; Ooms, Marcel
2015-12-01
Human cells express natural antiviral proteins, such as APOBEC3G (A3G), that potently restrict HIV replication. As a counter-defense, HIV encodes the accessory protein Vif, which binds A3G and mediates its proteasomal degradation. Our structural knowledge on how Vif and A3G interact is limited, because a co-structure is not available. We identified specific points of contact between Vif and A3G by using functional assays with full-length A3G, patient-derived Vif variants, and HIV forced evolution. These anchor points were used to model and validate the Vif-A3G interface. The resultant co-structure model shows that the negatively charged β4-α4 A3G loop, which contains primate-specific variation, is the core Vif binding site and forms extensive interactions with a positively charged pocket in HIV Vif. Our data present a functional map of this viral-host interface and open avenues for targeted approaches to block HIV replication by obstructing the Vif-A3G interaction. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Raster-scanning serial protein crystallography using micro- and nano-focused synchrotron beams
Coquelle, Nicolas; Brewster, Aaron S.; Kapp, Ulrike; Shilova, Anastasya; Weinhausen, Britta; Burghammer, Manfred; Colletier, Jacques-Philippe
2015-01-01
High-resolution structural information was obtained from lysozyme microcrystals (20 µm in the largest dimension) using raster-scanning serial protein crystallography on micro- and nano-focused beamlines at the ESRF. Data were collected at room temperature (RT) from crystals sandwiched between two silicon nitride wafers, thereby preventing their drying, while limiting background scattering and sample consumption. In order to identify crystal hits, new multi-processing and GUI-driven Python-based pre-analysis software was developed, named NanoPeakCell, that was able to read data from a variety of crystallographic image formats. Further data processing was carried out using CrystFEL, and the resultant structures were refined to 1.7 Å resolution. The data demonstrate the feasibility of RT raster-scanning serial micro- and nano-protein crystallography at synchrotrons and validate it as an alternative approach for the collection of high-resolution structural data from micro-sized crystals. Advantages of the proposed approach are its thriftiness, its handling-free nature, the reduced amount of sample required, the adjustable hit rate, the high indexing rate and the minimization of background scattering. PMID:25945583
Raster-scanning serial protein crystallography using micro- and nano-focused synchrotron beams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coquelle, Nicolas; Brewster, Aaron S.; Kapp, Ulrike
High-resolution structural information was obtained from lysozyme microcrystals (20 µm in the largest dimension) using raster-scanning serial protein crystallography on micro- and nano-focused beamlines at the ESRF. Data were collected at room temperature (RT) from crystals sandwiched between two silicon nitride wafers, thereby preventing their drying, while limiting background scattering and sample consumption. In order to identify crystal hits, new multi-processing and GUI-driven Python-based pre-analysis software was developed, named NanoPeakCell, that was able to read data from a variety of crystallographic image formats. Further data processing was carried out using CrystFEL, and the resultant structures were refined to 1.7 Å resolution. The data demonstrate the feasibility of RT raster-scanning serial micro- and nano-protein crystallography at synchrotrons and validate it as an alternative approach for the collection of high-resolution structural data from micro-sized crystals. Advantages of the proposed approach are its thriftiness, its handling-free nature, the reduced amount of sample required, the adjustable hit rate, the high indexing rate and the minimization of background scattering.
Raster-scanning serial protein crystallography using micro- and nano-focused synchrotron beams.
Coquelle, Nicolas; Brewster, Aaron S; Kapp, Ulrike; Shilova, Anastasya; Weinhausen, Britta; Burghammer, Manfred; Colletier, Jacques Philippe
2015-05-01
High-resolution structural information was obtained from lysozyme microcrystals (20 µm in the largest dimension) using raster-scanning serial protein crystallography on micro- and nano-focused beamlines at the ESRF. Data were collected at room temperature (RT) from crystals sandwiched between two silicon nitride wafers, thereby preventing their drying, while limiting background scattering and sample consumption. In order to identify crystal hits, new multi-processing and GUI-driven Python-based pre-analysis software was developed, named NanoPeakCell, that was able to read data from a variety of crystallographic image formats. Further data processing was carried out using CrystFEL, and the resultant structures were refined to 1.7 Å resolution. The data demonstrate the feasibility of RT raster-scanning serial micro- and nano-protein crystallography at synchrotrons and validate it as an alternative approach for the collection of high-resolution structural data from micro-sized crystals. Advantages of the proposed approach are its thriftiness, its handling-free nature, the reduced amount of sample required, the adjustable hit rate, the high indexing rate and the minimization of background scattering.
Raster-scanning serial protein crystallography using micro- and nano-focused synchrotron beams
Coquelle, Nicolas; Brewster, Aaron S.; Kapp, Ulrike; ...
2015-04-25
High-resolution structural information was obtained from lysozyme microcrystals (20 µm in the largest dimension) using raster-scanning serial protein crystallography on micro- and nano-focused beamlines at the ESRF. Data were collected at room temperature (RT) from crystals sandwiched between two silicon nitride wafers, thereby preventing their drying, while limiting background scattering and sample consumption. In order to identify crystal hits, new multi-processing and GUI-driven Python-based pre-analysis software was developed, named NanoPeakCell, that was able to read data from a variety of crystallographic image formats. Further data processing was carried out using CrystFEL, and the resultant structures were refined to 1.7 Å resolution. The data demonstrate the feasibility of RT raster-scanning serial micro- and nano-protein crystallography at synchrotrons and validate it as an alternative approach for the collection of high-resolution structural data from micro-sized crystals. Advantages of the proposed approach are its thriftiness, its handling-free nature, the reduced amount of sample required, the adjustable hit rate, the high indexing rate and the minimization of background scattering.
Vila, Jorge A.; Scheraga, Harold A.
2008-01-01
Interest centers here on the analysis of two different, but related, phenomena that affect side-chain conformations and consequently 13Cα chemical shifts and their applications to determine, refine, and validate protein structures. The first is whether 13Cα chemical shifts computed at the DFT level of approximation with charged residues are a better approximation of observed 13Cα chemical shifts than those computed with neutral residues for proteins in solution. Accurate computation of 13Cα chemical shifts requires a proper representation of the charges, which might not take on integral values. For this analysis, the charges for 139 conformations of the protein ubiquitin were determined by explicit consideration of protein binding equilibria, at a given pH, that is, by exploring the 2^ξ possible ionization states of the whole molecule, with ξ being the number of ionizable groups. The results of this analysis, as revealed by the shielding/deshielding of the 13Cα nucleus, indicated that: (i) there is a significant difference in the computed 13Cα chemical shifts, between basic and acidic groups, as a function of the degree of charge of the side chain; (ii) this difference is attributed to the distance between the ionizable groups and the 13Cα nucleus, which is shorter for the acidic Asp and Glu groups as compared with that for the basic Lys and Arg groups; and (iii) the use of neutral, rather than charged, basic and acidic groups is a better approximation of the observed 13Cα chemical shifts of a protein in solution. The second is how side-chain flexibility influences computed 13Cα chemical shifts in an additional set of ubiquitin conformations, in which the side chains are generated from an NMR-derived structure with the backbone conformation assumed to be fixed.
The 13Cα chemical shift of a given amino acid residue in a protein is determined, mainly, by its own backbone and side-chain torsional angles, independent of the neighboring residues; the conformation of a given residue itself, however, depends on the environment of this residue and, hence, on the whole protein structure. As a consequence, this analysis reveals the role and impact of an accurate side-chain computation in the determination and refinement of protein conformation. The results of this analysis are: (i) a lower error between computed and observed 13Cα chemical shifts (by up to 3.7 ppm) was found for ~68% and ~63% of all ionizable residues and all non-Ala/Pro/Gly residues, respectively, in the additional set of conformations, compared with results for the model from which the set was derived; and (ii) all the additional conformations exhibit a lower root-mean-square-deviation (1.97 ppm ≤ rmsd ≤ 2.13 ppm), between computed and observed 13Cα chemical shifts, than the rmsd (2.32 ppm) computed for the starting conformation from which this additional set was derived. As a validation test, an analysis of the additional set of ubiquitin conformations, comparing computed and observed values of both 13Cα chemical shifts and χ1 torsional angles (given by the vicinal coupling constants, 3JN–Cγ and 3JC′–Cγ), is discussed. PMID:17975838
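Two quantities in the abstract above lend themselves to a small illustration: the 2^ξ ionization states of a molecule with ξ ionizable groups, and the rmsd between computed and observed chemical shifts. The following is a minimal sketch only, not the authors' code; the function names and the toy shift values are hypothetical and were chosen purely for illustration.

```python
from itertools import product
from math import sqrt


def ionization_states(n_groups):
    """Return an iterator over every charge pattern of n_groups ionizable groups.

    Each state is a tuple of 0/1 flags (0 = neutral, 1 = charged),
    so the iterator yields 2**n_groups states in total.
    """
    return product((0, 1), repeat=n_groups)


def rmsd(computed, observed):
    """Root-mean-square deviation between two equal-length lists of shifts (ppm)."""
    if len(computed) != len(observed) or not computed:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((c - o) ** 2 for c, o in zip(computed, observed))
    return sqrt(squared / len(computed))


# Toy case: 3 ionizable groups give 2**3 = 8 ionization states.
states = list(ionization_states(3))
print(len(states))

# Hypothetical shifts in ppm (assumed values, not data from the study).
computed_shifts = [56.0, 60.0, 53.0, 58.0]
observed_shifts = [55.0, 61.0, 55.0, 58.0]
print(round(rmsd(computed_shifts, observed_shifts), 3))
```

The exhaustive enumeration grows exponentially with the number of groups, so it is only practical as written for small ξ.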
Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank
Joosten, Robbie P.; Joosten, Krista; Cohen, Serge X.; Vriend, Gert; Perrakis, Anastassis
2011-01-01
Motivation: Macromolecular crystal structures in the Protein Data Bank (PDB) are a key source of structural insight into biological processes. These structures, some >30 years old, were constructed with methods of their era. With PDB_REDO, we aim to automatically optimize these structures to better fit their corresponding experimental data, passing the benefits of new methods in crystallography on to a wide base of non-crystallographer structure users. Results: We developed new algorithms to allow automatic rebuilding and remodeling of main chain peptide bonds and side chains in crystallographic electron density maps, and incorporated these and further enhancements in the PDB_REDO procedure. Applying the updated PDB_REDO to the oldest, but also to some of the newest models in the PDB, corrects existing modeling errors and brings these models to a higher quality, as judged by standard validation methods. Availability and Implementation: The PDB_REDO database and links to all software are available at http://www.cmbi.ru.nl/pdb_redo. Contact: r.joosten@nki.nl; a.perrakis@nki.nl Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22034521
Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S
2017-04-01
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
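The abstract above reports rates and an F-measure without giving the formula. As a hedged illustration, the standard F-measure can be computed from true-positive, false-positive and false-negative counts as sketched below. The counts are hypothetical, and the abstract does not state how its rates were normalized, so this does not reproduce the paper's analysis.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-beta score from raw counts.

    precision = tp / (tp + fp), recall = tp / (tp + fn);
    beta = 1 gives the harmonic mean of precision and recall.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts (assumption): 96 hits found, 4 missed, 4 spurious.
print(f_measure(tp=96, fp=4, fn=4))
```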
Tsuchiya, Megumi; Karim, M Rezaul; Matsumoto, Taro; Ogawa, Hidesato; Taniguchi, Hiroaki
2017-01-24
Transcriptional coregulators are vital to the efficient transcriptional regulation of nuclear chromatin structure. Coregulators play a variety of roles in regulating transcription. These include the direct interaction with transcription factors, the covalent modification of histones and other proteins, and the occasional chromatin conformation alteration. Accordingly, establishing relatively quick methods for identifying proteins that interact within this network is crucial to enhancing our understanding of the underlying regulatory mechanisms. LC-MS/MS-mediated protein binding partner identification is a validated technique used to analyze protein-protein interactions. By immunoprecipitating a previously-identified member of a protein complex with an antibody (occasionally with an antibody for a tagged protein), it is possible to identify its unknown protein interactions via mass spectrometry analysis. Here, we present a method of protein preparation for the LC-MS/MS-mediated high-throughput identification of protein interactions involving nuclear cofactors and their binding partners. This method allows for a better understanding of the transcriptional regulatory mechanisms of the targeted nuclear factors.
Li, Liang; Mustafi, Debarshi; Fu, Qiang; Tereshko, Valentina; Chen, Delai L.; Tice, Joshua D.; Ismagilov, Rustem F.
2006-01-01
High-throughput screening and optimization experiments are critical to a number of fields, including chemistry and structural and molecular biology. The separation of these two steps may introduce false negatives and a time delay between initial screening and subsequent optimization. Although a hybrid method combining both steps may address these problems, miniaturization is required to minimize sample consumption. This article reports a “hybrid” droplet-based microfluidic approach that combines the steps of screening and optimization into one simple experiment and uses nanoliter-sized plugs to minimize sample consumption. Many distinct reagents were sequentially introduced as ≈140-nl plugs into a microfluidic device and combined with a substrate and a diluting buffer. Tests were conducted in ≈10-nl plugs containing different concentrations of a reagent. Methods were developed to form plugs of controlled concentrations, index concentrations, and incubate thousands of plugs inexpensively and without evaporation. To validate the hybrid method and demonstrate its applicability to challenging problems, crystallization of model membrane proteins and handling of solutions of detergents and viscous precipitants were demonstrated. By using 10 μl of protein solution, ≈1,300 crystallization trials were set up within 20 min by one researcher. This method was compatible with growth, manipulation, and extraction of high-quality crystals of membrane proteins, demonstrated by obtaining high-resolution diffraction images and solving a crystal structure. This robust method requires inexpensive equipment and supplies, should be especially suitable for use in individual laboratories, and could find applications in a number of areas that require chemical, biochemical, and biological screening and optimization. PMID:17159147
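The throughput figures quoted in the abstract above (10 μl of protein solution, about 1,300 trials, 20 min) imply a per-trial protein volume and a setup rate. The short calculation below is only a back-of-the-envelope check using those three numbers; the variable names are illustrative.

```python
# Figures taken from the abstract; everything derived from them is approximate.
protein_volume_ul = 10.0   # total protein solution used, in microliters
n_trials = 1300            # approximate number of crystallization trials
setup_minutes = 20.0       # time to set up all trials

# 1 microliter = 1000 nanoliters
protein_per_trial_nl = protein_volume_ul * 1000.0 / n_trials
trials_per_minute = n_trials / setup_minutes

print(round(protein_per_trial_nl, 1), "nl of protein solution per trial")
print(trials_per_minute, "trials per minute")
```

The result, under 10 nl of protein solution per trial, is consistent with the roughly 10-nl plugs described, since each plug also contains reagent and buffer.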
Pullara, Filippo; Guerrero-Santoro, Jennifer; Calero, Monica; Zhang, Qiangmin; Peng, Ye; Spåhr, Henrik; Kornberg, Guy L.; Cusimano, Antonella; Stevenson, Hilary P.; Santamaria-Suarez, Hugo; Reynolds, Shelley L.; Brown, Ian S.; Monga, Satdarshan P.S.; Van Houten, Bennett; Rapić-Otrin, Vesna; Calero, Guillermo; Levine, Arthur S.
2014-01-01
Expression of recombinant proteins in bacterial or eukaryotic systems often results in aggregation rendering them unavailable for biochemical or structural studies. Protein aggregation is a costly problem for biomedical research. It forces research laboratories and the biomedical industry to search for alternative, more soluble, non-human proteins and limits the number of potential “druggable” targets. In this study we present a highly reproducible protocol that introduces the systematic use of an extensive number of detergents to solubilize aggregated proteins expressed in bacterial and eukaryotic systems. We validate the usefulness of this protocol by solubilizing traditionally difficult human protein targets to milligram quantities and confirm their biological activity. We use this method to solubilize monomeric or multimeric components of multi-protein complexes and demonstrate its efficacy to reconstitute large cellular machines. This protocol works equally well on cytosolic, nuclear and membrane proteins and can be easily adapted to a high throughput format. PMID:23137940
Computational smart polymer design based on elastin protein mutability.
Tarakanova, Anna; Huang, Wenwen; Weiss, Anthony S; Kaplan, David L; Buehler, Markus J
2017-05-01
Soluble elastin-like peptides (ELPs) can be engineered into a range of physical forms, from hydrogels and scaffolds to fibers and artificial tissues, finding numerous applications in medicine and engineering as "smart polymers". Elastin-like peptides are attractive candidates as a platform for novel biomaterial design because they exhibit a highly tunable response spectrum, with reversible phase transition capabilities. Here, we report the design of the first virtual library of elastin-like protein models using methods for enhanced sampling to study the effect of peptide chemistry, chain length, and salt concentration on the structural transitions of ELPs, exposing associated molecular mechanisms. We describe the behavior of the local molecular structure under increasing temperatures and the effect of peptide interactions with nearest hydration shell water molecules on peptide mobility and propensity to exhibit structural transitions. Shifts in the magnitude of structural transitions at the single-molecule scale are explained from the perspective of peptide-ion-water interactions in a library of four unique elastin-like peptide systems. Predictions of structural transitions are subsequently validated in experiment. This library is a valuable resource for recombinant protein design and synthesis as it elucidates mechanisms at the single-molecule level, paving a feedback path between simulation and experiment for smart material designs, with applications in biomedicine and diagnostic devices. Copyright © 2017. Published by Elsevier Ltd.
Bacterial actin MreB assembles in complex with cell shape protein RodZ.
van den Ent, Fusinita; Johnson, Christopher M; Persons, Logan; de Boer, Piet; Löwe, Jan
2010-03-17
Bacterial actin homologue MreB is required for cell shape maintenance in most non-spherical bacteria, where it assembles into helical structures just underneath the cytoplasmic membrane. Proper assembly of the actin cytoskeleton requires RodZ, a conserved, bitopic membrane protein that colocalises to MreB and is essential for cell shape determination. Here, we present the first crystal structure of bacterial actin engaged with a natural partner and provide a clear functional significance of the interaction. We show that the cytoplasmic helix-turn-helix motif of Thermotoga maritima RodZ directly interacts with monomeric as well as filamentous MreB and present the crystal structure of the complex. In vitro and in vivo analyses of mutant T. maritima and Escherichia coli RodZ validate the structure and reveal the importance of the MreB-RodZ interaction in the ability of cells to propagate as rods. Furthermore, the results elucidate how the bacterial actin cytoskeleton might be anchored to the membrane to help constrain peptidoglycan synthesis in the periplasm.
Bhadra, Pratiti; Pal, Debnath
2017-04-01
Dynamics is integral to the function of proteins, yet the use of molecular dynamics (MD) simulation as a technique remains under-explored for molecular function inference. This is more important in the context of genomics projects where novel proteins are determined with limited evolutionary information. Recently we developed a method to match the query protein's flexible segments to infer function using a novel approach combining analysis of residue fluctuation-graphs and auto-correlation vectors derived from coarse-grained (CG) MD trajectory. The method was validated on a diverse dataset with sequence identity between proteins as low as 3%, with high function-recall rates. Here we share its implementation as a publicly accessible web service, named DynFunc (Dynamics Match for Function) to query protein function from ≥1 µs long CG dynamics trajectory information of protein subunits. Users are provided with the custom-developed coarse-grained molecular mechanics (CGMM) forcefield to generate the MD trajectories for their protein of interest. On upload of trajectory information, the DynFunc web server identifies specific flexible regions of the protein linked to putative molecular function. Our unique application does not use evolutionary information to infer molecular function from MD information and can, therefore, work for all proteins, including moonlighting and the novel ones, whenever structural information is available. Our pipeline is expected to be of utility to all structural biologists working with novel proteins and interested in moonlighting functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
L, Sunil; Vasu, Prasanna
2017-09-01
Leucine, isoleucine, and valine are the three essential branched-chain amino acids (BCAAs), which account for 40-45% of total essential amino acids. BCAAs stimulate protein synthesis primarily in skeletal muscles and can be transported directly into the circulating bloodstream, bypassing the liver. Hence, a protein enriched with BCAAs is an important therapeutic target for the dietary treatment of chronic liver disease. The present study aims to design a synthetic protein enriched with BCAAs; the challenge is to maximize the BCAA content while keeping the balanced ratio of leucine, isoleucine, and valine (2: 1: 1.2) specified by WHO/UNU/FAO. Here, we turned the general concept of homology modeling and tried to find a suitable scaffold (α-helix) to host an excess amount of BCAAs for increased stability and digestibility. A total of 50 protein models were constructed by using SWISS-MODEL, Modeller 9.17, ProtParam tool, and allergen online tools. Out of the 50 different protein models, protein model-50 was found to be the best; it had a well-defined 3D structure, good in silico digestibility, and a balanced ratio of BCAAs, and showed 65.57% structure identity to the template apo-bovine α-lactalbumin (1F6R). Template searches were performed against the PDB using PSI-BLAST, SWISS-MODEL, PROFUNC, I-TASSER, and ConSurf. The secondary structure was predicted by PSSPred, PSIPRED, I-TASSER, PORTER, and SPIDER2. The modeled structure of protein model-50 was validated by PROCHECK, ERRAT, ProSA, and QMEAN. The COACH and ProFUNC tools were used to determine the functional effects of protein model-50. Overall, the BCAA content was enriched from 22 to 56.4% with the balanced ratio of Leu: Ile: Val (2: 1: 1.2). The Ramachandran plot showed 97.7% of the amino acid residues in allowed regions, with an ERRAT score of 86.05. We have successfully modeled the complete three-dimensional structure of the target protein model-50 using highly reputed computational tools. Copyright © 2017 Elsevier Inc. All rights reserved.
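The BCAA content and the Leu: Ile: Val ratio discussed above can be checked for any candidate sequence with a few lines of code. The sketch below counts residues in a one-letter sequence. It is an illustration under stated assumptions, not the authors' workflow: the function name and the toy sequence are hypothetical, and the ratio is computed by residue count, whereas the abstract does not say whether the reference ratio is by count or by mass.

```python
from collections import Counter


def bcaa_profile(sequence):
    """Return (percent BCAA by residue count, (Leu/Ile, 1.0, Val/Ile)).

    The ratio is normalized to isoleucine and is None if the
    sequence contains no isoleucine.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    leu, ile, val = counts["L"], counts["I"], counts["V"]
    percent = 100.0 * (leu + ile + val) / len(seq)
    ratio = None if ile == 0 else (leu / ile, 1.0, val / ile)
    return percent, ratio


# Toy sequence (assumption): 10 Leu, 5 Ile, 6 Val, 9 Ala = 30 residues.
toy = "L" * 10 + "I" * 5 + "V" * 6 + "A" * 9
percent, ratio = bcaa_profile(toy)
print(percent, ratio)
```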
Natarajaseenivasan, Kalimuthusamy; Shanmughapriya, Santhanam; Velineni, Sridhar; Artiushin, Sergey C; Timoney, John F
2011-10-01
Leptospirosis is an infectious bacterial disease caused by Leptospira species. In this study, we cloned and sequenced the gene encoding the immunodominant protein GroEL from L. interrogans serovar Autumnalis strain N2, which was isolated from the urine of a patient during an outbreak of leptospirosis in Chennai, India. This groEL gene encodes a protein of 60 kDa with a high degree of homology (99% similarity) to those of other leptospiral serovars. Recombinant GroEL was overexpressed in Escherichia coli. Immunoblot analysis indicated that the sera from confirmed leptospirosis patients showed strong reactivity with the recombinant GroEL while no reactivity was observed with the sera from seronegative control patients. In addition, the 3D structure of GroEL was constructed using chaperonin complex cpn60 from Thermus thermophilus as template and validated. The results indicated a Z-score of -8.35, which is in good agreement with the expected value for a protein. The superposition of the Cα traces of the cpn60 structure and the predicted structure of leptospiral GroEL indicates good agreement of secondary structure elements with an RMSD value of 1.5 Å. Further study is necessary to evaluate GroEL for serological diagnosis of leptospirosis and for its potential as a vaccine component. Copyright © 2011 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
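The RMSD value quoted above measures the average distance between equivalent Cα atoms of two superposed structures. The sketch below shows the calculation for two coordinate lists that are assumed to be already superposed and matched atom by atom. It does not perform the superposition itself (a full comparison would first find the optimal rotation and translation), and the coordinates are toy values, not data from the study.

```python
from math import sqrt


def coordinate_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstroms) between two lists of
    (x, y, z) tuples that are already superposed and paired one to one."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


# Toy Calpha traces (assumption): the second is the first shifted by 1.5 along z.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 1.5), (3.8, 0.0, 1.5), (7.6, 0.0, 1.5)]
print(coordinate_rmsd(trace_a, trace_b))
```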
Wan, Minghui; Liao, Dongjiang; Peng, Guilin; Xu, Xin; Yin, Weiqiang; Guo, Guixin; Jiang, Funeng; Zhong, Weide
2017-01-01
Chloride intracellular channel 1 (CLIC1) is involved in the development of most aggressive human tumors, including gastric, colon, lung, liver, and glioblastoma cancers. It has become an attractive new therapeutic target for several types of cancer. In this work, we aim to identify natural products as potent CLIC1 inhibitors from the Traditional Chinese Medicine (TCM) database using structure-based virtual screening and molecular dynamics (MD) simulation. First, structure-based docking was employed to screen the refined TCM database, and the top 500 TCM compounds were obtained and reranked by X-Score. Then, 30 potent hits were selected from the top 500 TCM compounds using cluster and ligand-protein interaction analysis. Finally, MD simulation was employed to validate the stability of the interactions between each hit and the CLIC1 protein predicted by docking, and Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) analysis was used to refine the virtual hits. Six TCM compounds with top MM-GBSA scores and ideal binding models were confirmed as the final hits. Our study provides information about the interaction between TCM compounds and CLIC1 protein, which may be helpful for further experimental investigations. In addition, the structural scaffolds of the top 6 natural products could serve as building blocks in designing drug-like molecules for CLIC1 inhibition. PMID:29147652
Multiple solvent crystal structures of ribonuclease A: An assessment of the method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dechene, Michelle; Wink, Glenna; Smith, Mychal
2010-11-12
The multiple solvent crystal structures (MSCS) method uses organic solvents to map the surfaces of proteins. It identifies binding sites and allows for a more thorough examination of protein plasticity and hydration than could be achieved by a single structure. The crystal structures of bovine pancreatic ribonuclease A (RNAse A) soaked in the following organic solvents are presented: 50% dioxane, 50% dimethylformamide, 70% dimethylsulfoxide, 70% 1,6-hexanediol, 70% isopropanol, 50% R,S,R-bisfuran alcohol, 70% t-butanol, 50% trifluoroethanol, or 1.0 M trimethylamine-N-oxide. This set of structures is compared with four sets of crystal structures of RNAse A from the protein data bank (PDB) and with the solution NMR structure to assess the validity of previously untested assumptions associated with MSCS analysis. Plasticity from MSCS is the same as from PDB structures obtained in the same crystal form and deviates only at crystal contacts when compared to structures from a diverse set of crystal environments. Furthermore, there is a good correlation between plasticity as observed by MSCS and the dynamic regions seen by NMR. Conserved water binding sites are identified by MSCS to be those that are conserved in the sets of structures taken from the PDB. Comparison of the MSCS structures with inhibitor-bound crystal structures of RNAse A reveals that the organic solvent molecules identify key interactions made by inhibitor molecules, highlighting ligand binding hot-spots in the active site. The present work firmly establishes the relevance of information obtained by MSCS.
Urea-mediated protein denaturation: a consensus view.
Das, Atanu; Mukhopadhyay, Chaitali
2009-09-24
We have performed all-atom molecular dynamics simulations of three structurally similar small globular proteins in 8 M urea and compared the results with pure aqueous simulations. Protein denaturation is preceded by an initial loss of water from the first solvation shell and consequent in-flow of urea toward the protein. Urea reaches the first solvation shell of the protein mainly due to electrostatic interaction with a considerable contribution coming from the dispersion interaction. Urea shifts the equilibrium from the native to denatured ensemble by making the protein-protein contact less stable than protein-urea contact, which is just the reverse of the condition in pure water, where protein-protein contact is more stable than protein-water contact. We have also seen that water follows urea and reaches the protein interior at later stages of denaturation, while urea preferentially and efficiently solvates different parts of the protein. Solvation of the protein backbone via hydrogen bonding, favorable electrostatic interaction with hydrophilic residues, and dispersion interaction with hydrophobic residues are the key steps through which urea intrudes the core of the protein and denatures it. Why urea is preferred over water for binding to the protein backbone and how urea orients itself toward the protein backbone have been identified comprehensively. All the key components of intermolecular forces are found to play a significant part in urea-induced protein denaturation and also toward the stability of the denatured state ensemble. Changes in water network/structure and dynamical properties and higher degree of solvation of the hydrophobic residues validate the presence of "indirect mechanism" along with the "direct mechanism" and reinforce the effect of urea on protein.
Sweetening the pot: adding glycosylation to the biomarker discovery equation.
Drake, Penelope M; Cho, Wonryeon; Li, Bensheng; Prakobphol, Akraporn; Johansen, Eric; Anderson, N Leigh; Regnier, Fred E; Gibson, Bradford W; Fisher, Susan J
2010-02-01
Cancer has profound effects on gene expression, including a cell's glycosylation machinery. Thus, tumors produce glycoproteins that carry oligosaccharides with structures that are markedly different from the same protein produced by a normal cell. A single protein can have many glycosylation sites that greatly amplify the signals they generate compared with their protein backbones. In this article, we survey clinical tests that target carbohydrate modifications for diagnosing and treating cancer. We present the biological relevance of glycosylation to disease progression by highlighting the role these structures play in adhesion, signaling, and metastasis and then address current methodological approaches to biomarker discovery that capitalize on selectively capturing tumor-associated glycoforms to enrich and identify disease-related candidate analytes. Finally, we discuss emerging technologies--multiple reaction monitoring and lectin-antibody arrays--as potential tools for biomarker validation studies in pursuit of clinically useful tests. The future of carbohydrate-based biomarker studies has arrived. At all stages, from discovery through verification and deployment into clinics, glycosylation should be considered a primary readout or a way of increasing the sensitivity and specificity of protein-based analyses.
Sweetening the pot: adding glycosylation to the biomarker discovery equation
Drake, Penelope M.; Cho, Wonryeon; Li, Bensheng; Prakobphol, Akraporn; Johansen, Eric; Anderson, N. Leigh; Regnier, Fred E.; Gibson, Bradford W.; Fisher, Susan J.
2010-01-01
Background: Cancer has profound effects on gene expression, including a cell’s glycosylation machinery. Thus, tumors produce glycoproteins that carry oligosaccharides with structures that are markedly different from the same protein produced by a normal cell. A single protein can have many glycosylation sites that greatly amplify the signals they generate as compared to their protein backbones. Content: We survey clinical tests that target carbohydrate modifications for diagnosing and treating cancer. Next, we present the biological relevance of glycosylation to disease progression by highlighting the role these structures play in adhesion, signaling and metastasis, and then address current methodological approaches to biomarker discovery that capitalize on selectively capturing tumor-associated glycoforms to enrich and identify disease-related candidate analytes. Finally, we discuss emerging technologies (multiple reaction monitoring and lectin-antibody arrays) as potential tools for biomarker validation studies in pursuit of clinically useful tests. Summary: The future of carbohydrate-based biomarker studies has arrived. At all stages, from discovery through verification and deployment into clinics, glycosylation should be considered a primary readout or a way of increasing the sensitivity and specificity of protein-based analyses. PMID:19959616
Membrane raft association is a determinant of plasma membrane localization.
Diaz-Rohrer, Blanca B; Levental, Kandice R; Simons, Kai; Levental, Ilya
2014-06-10
The lipid raft hypothesis proposes lateral domains driven by preferential interactions between sterols, sphingolipids, and specific proteins as a central mechanism for the regulation of membrane structure and function; however, experimental limitations in defining raft composition and properties have prevented unequivocal demonstration of their functional relevance. Here, we establish a quantitative, functional relationship between raft association and subcellular protein sorting. By systematic mutation of the transmembrane and juxtamembrane domains of a model transmembrane protein, linker for activation of T-cells (LAT), we generated a panel of variants possessing a range of raft affinities. These mutations revealed palmitoylation, transmembrane domain length, and transmembrane sequence to be critical determinants of membrane raft association. Moreover, plasma membrane (PM) localization was strictly dependent on raft partitioning across the entire panel of unrelated mutants, suggesting that raft association is necessary and sufficient for PM sorting of LAT. Abrogation of raft partitioning led to mistargeting to late endosomes/lysosomes because of a failure to recycle from early endosomes. These findings identify structural determinants of raft association and validate lipid-driven domain formation as a mechanism for endosomal protein sorting.
Membrane raft association is a determinant of plasma membrane localization
Diaz-Rohrer, Blanca B.; Levental, Kandice R.; Simons, Kai; Levental, Ilya
2014-01-01
The lipid raft hypothesis proposes lateral domains driven by preferential interactions between sterols, sphingolipids, and specific proteins as a central mechanism for the regulation of membrane structure and function; however, experimental limitations in defining raft composition and properties have prevented unequivocal demonstration of their functional relevance. Here, we establish a quantitative, functional relationship between raft association and subcellular protein sorting. By systematic mutation of the transmembrane and juxtamembrane domains of a model transmembrane protein, linker for activation of T-cells (LAT), we generated a panel of variants possessing a range of raft affinities. These mutations revealed palmitoylation, transmembrane domain length, and transmembrane sequence to be critical determinants of membrane raft association. Moreover, plasma membrane (PM) localization was strictly dependent on raft partitioning across the entire panel of unrelated mutants, suggesting that raft association is necessary and sufficient for PM sorting of LAT. Abrogation of raft partitioning led to mistargeting to late endosomes/lysosomes because of a failure to recycle from early endosomes. These findings identify structural determinants of raft association and validate lipid-driven domain formation as a mechanism for endosomal protein sorting. PMID:24912166
Interleukin-11 binds specific EF-hand proteins via their conserved structural motifs.
Kazakov, Alexei S; Sokolov, Andrei S; Vologzhannikova, Alisa A; Permyakova, Maria E; Khorn, Polina A; Ismailov, Ramis G; Denessiouk, Konstantin A; Denesyuk, Alexander I; Rastrygina, Victoria A; Baksheeva, Viktoriia E; Zernii, Evgeni Yu; Zinchenko, Dmitry V; Glazatov, Vladimir V; Uversky, Vladimir N; Mirzabekov, Tajib A; Permyakov, Eugene A; Permyakov, Sergei E
2017-01-01
Interleukin-11 (IL-11) is a hematopoietic cytokine engaged in numerous biological processes and validated as a target for treatment of various cancers. IL-11 contains intrinsically disordered regions that might recognize multiple targets. Recently we found that aside from IL-11RA and gp130 receptors, IL-11 interacts with calcium sensor protein S100P. Strict calcium dependence of this interaction suggests a possibility of IL-11 interaction with other calcium sensor proteins. Here we probed specificity of IL-11 to calcium-binding proteins of various types: calcium sensors of the EF-hand family (calmodulin, S100B and neuronal calcium sensors: recoverin, NCS-1, GCAP-1, GCAP-2), calcium buffers of the EF-hand family (S100G, oncomodulin), and a non-EF-hand calcium buffer (α-lactalbumin). A specific subset of the calcium sensor proteins (calmodulin, S100B, NCS-1, GCAP-1/2) exhibits metal-dependent binding of IL-11 with dissociation constants of 1-19 μM. These proteins share several amino acid residues belonging to conservative structural motifs of the EF-hand proteins, 'black' and 'gray' clusters. Replacements of the respective S100P residues by alanine drastically decrease its affinity to IL-11, suggesting their involvement into the association process. Secondary structure and accessibility of the hinge region of the EF-hand proteins studied are predicted to control specificity and selectivity of their binding to IL-11. The IL-11 interaction with the EF-hand proteins is expected to occur under numerous pathological conditions, accompanied by disintegration of plasma membrane and efflux of cellular components into the extracellular milieu.
NASA Astrophysics Data System (ADS)
Fleishman, Sarel
2012-02-01
Molecular recognition underlies all life processes. Design of interactions not seen in nature is a test of our understanding of molecular recognition and could unlock the vast potential of subtle control over molecular interaction networks, allowing the design of novel diagnostics and therapeutics for basic and applied research. We developed the first general method for designing protein interactions. The method starts by computing a region of high affinity interactions between dismembered amino acid residues and the target surface and then identifying proteins that can harbor these residues. Designs are tested experimentally for binding the target surface and successful ones are affinity matured using yeast cell surface display. Applied to the conserved stem region of influenza hemagglutinin we designed two unrelated proteins that, following affinity maturation, bound hemagglutinin at subnanomolar dissociation constants. Co-crystal structures of hemagglutinin bound to the two designed binders were within 1 Å RMSD of their models, validating the accuracy of the design strategy. One of the designed proteins inhibits the conformational changes that underlie hemagglutinin's cell-invasion functions and blocks virus infectivity in cell culture, suggesting that such proteins may in future serve as diagnostics and antivirals against a wide range of pathogenic influenza strains. We have used this method to obtain experimentally validated binders of several other target proteins, demonstrating the generality of the approach. We discuss the combination of modeling and high-throughput characterization of design variants which has been key to the success of this approach, as well as how we have used the data obtained in this project to enhance our understanding of molecular recognition. References: Science 332:816; JMB, in press; Protein Sci 20:753
Structural analysis of linear and conformational epitopes of allergens
Ivanciuc, Ovidiu; Schein, Catherine H.; Garcia, Tzintzuni; Oezguen, Numan; Negi, Surendra S.; Braun, Werner
2009-01-01
In many countries regulatory agencies have adopted safety guidelines, based on bioinformatics rules from the WHO/FAO and EFSA recommendations, to prevent potentially allergenic novel foods or agricultural products from reaching consumers. We created the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to combine data that had previously been available only as flat files on Web pages or in the literature. SDAP was designed to be user friendly and to be of maximum use to regulatory agencies and clinicians, as well as to scientists interested in assessing the potential allergenic risk of a protein. We developed methods, unique to SDAP, to compare the physicochemical properties of discrete areas of allergenic proteins to known IgE epitopes. We developed a new similarity measure, the property distance (PD) value, that can be used to detect related segments in allergens with clinically observed cross-reactivity. We have now expanded this work to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to known IgE epitopes. In complementary work we show how sequence motifs characteristic of allergenic proteins in protein families can be used as fingerprints for allergenicity. PMID:19121639
Pucci, Fabrizio; Bourgeas, Raphaël; Rooman, Marianne
2016-03-18
The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temperature ΔTm upon point mutations, given the protein structure and, when available, the melting temperature Tm of the wild-type protein. The key ingredients of our model structure are standard and temperature-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodynamic analysis of the system. The parameters of the model were identified on a set of more than 1,600 mutations with experimentally measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and experimental ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-commercial use at soft.dezyme.com.
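The two error figures quoted above (an RMSD over all mutations and a lower RMSD after removing ten percent outliers) can be illustrated with a minimal sketch. The abstract does not say how the outliers were selected, so the code below assumes they are the predictions with the largest absolute errors; the function names and the sample values are hypothetical and are not taken from the tool itself.

```python
import math


def rmsd(predicted, experimental):
    """Root mean square deviation between paired predicted/measured values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def trimmed_rmsd(predicted, experimental, outlier_fraction=0.10):
    """RMSD after dropping the given fraction of largest absolute errors.

    Assumption: 'outliers' means the worst-predicted cases; the source
    abstract does not define the selection rule.
    """
    errors = sorted(abs(p - e) for p, e in zip(predicted, experimental))
    n_keep = len(errors) - int(len(errors) * outlier_fraction)
    kept = errors[:n_keep]
    if not kept:
        raise ValueError("no values left after trimming")
    return math.sqrt(sum(err ** 2 for err in kept) / len(kept))


# Hypothetical ΔTm values in °C, for illustration only.
predicted_dtm = [1.0, -2.5, 0.4, -6.0, 3.1, -1.2, 0.0, -4.4, 2.2, 9.0]
measured_dtm = [0.2, -3.0, 1.5, -5.1, 2.0, -0.3, -1.1, -3.6, 1.0, -2.0]
print(rmsd(predicted_dtm, measured_dtm))
print(trimmed_rmsd(predicted_dtm, measured_dtm))
```

Trimming by absolute error always lowers (or leaves unchanged) the reported RMSD, which is why both figures are usually quoted together.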
Ortega-Roldan, Jose Luis; Jensen, Malene Ringkjøbing; Brutscher, Bernhard; Azuaga, Ana I; Blackledge, Martin; van Nuland, Nico A J
2009-05-01
The description of the interactome represents one of the key challenges remaining for structural biology. Physiologically important weak interactions, with dissociation constants above 100 μM, are remarkably common, but remain beyond the reach of most of structural biology. NMR spectroscopy, and in particular, residual dipolar couplings (RDCs) provide crucial conformational constraints on intermolecular orientation in molecular complexes, but the combination of free and bound contributions to the measured RDC seriously complicates their exploitation for weakly interacting partners. We develop a robust approach for the determination of weak complexes based on: (i) differential isotopic labeling of the partner proteins facilitating RDC measurement in both partners; (ii) measurement of RDC changes upon titration into different equilibrium mixtures of partially aligned free and complex forms of the proteins; (iii) novel analytical approaches to determine the effective alignment in all equilibrium mixtures; and (iv) extraction of precise RDCs for bound forms of both partner proteins. The approach is demonstrated for the determination of the three-dimensional structure of the weakly interacting CD2AP SH3-C:Ubiquitin complex (Kd = 132 ± 13 μM) and is shown, using cross-validation, to be highly precise. We expect this methodology to extend the remarkable and unique ability of NMR to study weak protein-protein complexes.
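Why free and bound contributions mix in a weak complex can be sketched with standard 1:1 binding algebra. The code below is a simplified illustration and not the authors' analysis: it assumes fast exchange, so that the measured RDC is a population-weighted average of free and bound values, and it ignores the changes in alignment between equilibrium mixtures that the abstract says the method explicitly treats. All function names and the example concentrations and couplings are hypothetical.

```python
import math


def complex_concentration(total_a, total_b, kd):
    """Concentration of the 1:1 complex AB, from Kd = [A][B]/[AB].

    All three arguments must share the same units (e.g. micromolar).
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


def fraction_bound(total_a, total_b, kd):
    """Fraction of protein A that is in the complex."""
    return complex_concentration(total_a, total_b, kd) / total_a


def bound_rdc(observed, free, p_bound):
    """Solve observed = p_bound * bound + (1 - p_bound) * free for bound.

    Assumes fast exchange and an unchanged alignment tensor, which is a
    simplification of the real experiment.
    """
    if p_bound <= 0.0:
        raise ValueError("bound population must be positive")
    return (observed - (1.0 - p_bound) * free) / p_bound


# Kd from the abstract (132 micromolar); concentrations are hypothetical.
p = fraction_bound(total_a=200.0, total_b=600.0, kd=132.0)
print(p)
# Hypothetical couplings in Hz, for illustration only.
print(bound_rdc(observed=6.0, free=2.0, p_bound=p))
```

With a Kd in the 100 μM range, the bound population stays well below one at typical NMR concentrations, so the bound-state coupling has to be extracted from the mixture rather than read off directly.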
Subota, Ines; Julkowska, Daria; Vincensini, Laetitia; Reeg, Nele; Buisson, Johanna; Blisnick, Thierry; Huet, Diego; Perrot, Sylvie; Santi-Rocca, Julien; Duchateau, Magalie; Hourdel, Véronique; Rousselle, Jean-Claude; Cayet, Nadège; Namane, Abdelkader; Chamot-Rooke, Julia; Bastin, Philippe
2014-01-01
Cilia and flagella are complex organelles made of hundreds of proteins of highly variable structures and functions. Here we report the purification of intact flagella from the procyclic stage of Trypanosoma brucei using mechanical shearing. Structural preservation was confirmed by transmission electron microscopy, which showed that flagella still contained typical elements such as the membrane, the axoneme, the paraflagellar rod, and the intraflagellar transport particles. It also revealed that flagella were severed below the basal body and were not contaminated by other cytoskeletal structures such as the flagellar pocket collar or the adhesion zone filament. Mass spectrometry analysis identified a total of 751 proteins with high confidence, including 88% of known flagellar components. Comparison with the cell debris fraction revealed that more than half of the flagellum markers were enriched in flagella, and this enrichment criterion was taken into account to identify 212 proteins not previously reported to be associated with flagella. Nine of these were experimentally validated, including a 14-3-3 protein not yet reported to be associated with flagella and eight novel proteins termed FLAM (FLAgellar Member). Remarkably, they localized to five different subdomains of the flagellum. For example, FLAM6 is restricted to the proximal half of the axoneme, regardless of its length. In contrast, FLAM8 progressively accumulates at the distal tip of growing flagella, and half of it still needs to be added after cell division. A combination of RNA interference and Fluorescence Recovery After Photobleaching approaches demonstrated very different dynamics from one protein to the other, but also according to the stage of construction and the age of the flagellum. Structural proteins are added to the distal tip of the elongating flagellum and exhibit slow turnover, whereas membrane proteins such as the arginine kinase show rapid turnover without a detectable polarity. PMID:24741115
Tabassum, Asra; Rajeshwari, Tadigadapa; Soni, Nidhi; Raju, D S B; Yadav, Mukesh; Nayarisseri, Anuraj; Jahan, Parveen
2014-03-01
Non-synonymous single nucleotide changes (nSNC) are coding variants that introduce amino acid changes in their corresponding proteins. They can affect protein function and are believed to have the largest impact on human health compared with SNCs in other regions of the genome. Such a sequence alteration directly affects the structural stability of the protein through conformational changes. The presence of these conformational changes near the catalytic or active site may alter protein function and, as a consequence, receptor-ligand complex interactions. The present investigation assesses the effect of human podocin mutations (G92C, P118L, R138Q, and D160G) on its structure. Podocin is an important glomerular integral membrane protein thought to play a key role in steroid-resistant nephrotic syndrome. Podocin has a hairpin-like structure with 383 amino acids; it is an integral protein homologous to stomatin and acts as a molecular link in a stretch-sensitive system. We modeled the 3D structure of podocin with Modeller and validated it with PROCHECK, obtaining a Ramachandran plot (88.5% in most favored region) together with main chain, side chain, bad contacts, gauche and pooled standard deviation statistics. Further, the protein engineering tool Triton was used to introduce the four variants G92C, P118L, R138Q and D160G into the wild type. Comparison of the energies of the wild-type and mutated podocin structures confirmed that the mutated structures were thermodynamically more stable than the wild type and that biological events therefore favored synthesis of mutated forms of podocin over the wild type. In conclusion, two mutations, G92C (-8179.272 kJ/mol) and P118L (-8136.685 kJ/mol), are more stable and more likely to occur in the podocin structure than the wild-type podocin structure (-8105.622 kJ/mol). Although the difference between the mutated and wild-type structures is small (approximately 74 and 35 kJ/mol), it may play a crucial role in deciding why mutations are favored and occur at the genetic level.
Sborgi, Lorenzo; Ravotti, Francesco; Dandey, Venkata P.; Dick, Mathias S.; Mazur, Adam; Reckel, Sina; Chami, Mohamed; Scherer, Sebastian; Huber, Matthias; Böckmann, Anja; Egelman, Edward H.; Stahlberg, Henning; Broz, Petr; Meier, Beat H.; Hiller, Sebastian
2015-01-01
Inflammasomes are multiprotein complexes that control the innate immune response by activating caspase-1, thus promoting the secretion of cytokines in response to invading pathogens and endogenous triggers. Assembly of inflammasomes is induced by activation of a receptor protein. Many inflammasome receptors require the adapter protein ASC [apoptosis-associated speck-like protein containing a caspase-recruitment domain (CARD)], which consists of two domains, the N-terminal pyrin domain (PYD) and the C-terminal CARD. Upon activation, ASC forms large oligomeric filaments, which facilitate procaspase-1 recruitment. Here, we characterize the structure and filament formation of mouse ASC in vitro at atomic resolution. Information from cryo-electron microscopy and solid-state NMR spectroscopy is combined in a single structure calculation to obtain the atomic-resolution structure of the ASC filament. Perturbations of NMR resonances upon filament formation monitor the specific binding interfaces of ASC-PYD association. Importantly, NMR experiments show the rigidity of the PYD forming the core of the filament as well as the high mobility of the CARD relative to this core. The findings are validated by structure-based mutagenesis experiments in cultured macrophages. The 3D structure of the mouse ASC-PYD filament is highly similar to the recently determined human ASC-PYD filament, suggesting evolutionary conservation of ASC-dependent inflammasome mechanisms. PMID:26464513
Domain analyses of Usher syndrome causing Clarin-1 and GPR98 protein models.
Khan, Sehrish Haider; Javed, Muhammad Rizwan; Qasim, Muhammad; Shahzadi, Samar; Jalil, Asma; Rehman, Shahid Ur
2014-01-01
Usher syndrome is an autosomal recessive disorder that causes hearing loss, Retinitis Pigmentosa (RP) and vestibular dysfunction. It is a clinically and genetically heterogeneous disorder which is clinically divided into three types, i.e. type I, type II and type III. To date, there are about twelve loci and ten identified genes which are associated with Usher syndrome. A mutation in any of these genes, e.g. CDH23, CLRN1, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A and DFNB31, can result in Usher syndrome or non-syndromic deafness. These genes provide instructions for making proteins that play important roles in normal hearing, balance and vision. Studies have shown that protein structures of only seven genes have been determined experimentally and there are still three genes whose structures are unavailable. These genes are Clarin-1, GPR98 and Usherin. In the absence of an experimentally determined structure, homology modeling and threading often provide a useful 3D model of a protein. Therefore, in the current study Clarin-1 and GPR98 proteins have been analyzed for signal peptide, domains and motifs. Clarin-1 protein was found to be without any signal peptide and to consist of a prokar lipoprotein domain. Clarin-1 is classified within the claudin 2 superfamily and consists of twelve motifs. In contrast, GPR98 has a 29-amino-acid signal peptide and is classified within GPCR family 2, having a Concanavalin A-like lectin/glucanase superfamily domain. It was found to consist of GPS and G protein receptor F2 domains and twenty-nine motifs. Their 3D structures have been predicted using the I-TASSER server. The model of Clarin-1 showed only α-helices but no β-sheets, while the model of GPR98 showed both α-helices and β-sheets. The predicted structures were then evaluated and validated by MolProbity and Ramachandran plot. The evaluation of the predicted structures showed 78.9% residues of Clarin-1 and 78.9% residues of GPR98 within favored regions.
The findings of the present study have resulted in three-dimensional structure predictions and a conserved domain analysis, which will be beneficial for better understanding the molecular components, protein-protein interactions, clinical heterogeneity and pathophysiology of Usher syndrome.
Domain analyses of Usher syndrome causing Clarin-1 and GPR98 protein models
Khan, Sehrish Haider; Javed, Muhammad Rizwan; Qasim, Muhammad; Shahzadi, Samar; Jalil, Asma; Rehman, Shahid ur
2014-01-01
Usher syndrome is an autosomal recessive disorder that causes hearing loss, Retinitis Pigmentosa (RP) and vestibular dysfunction. It is a clinically and genetically heterogeneous disorder which is clinically divided into three types, i.e. type I, type II and type III. To date, there are about twelve loci and ten identified genes which are associated with Usher syndrome. A mutation in any of these genes, e.g. CDH23, CLRN1, GPR98, MYO7A, PCDH15, USH1C, USH1G, USH2A and DFNB31, can result in Usher syndrome or non-syndromic deafness. These genes provide instructions for making proteins that play important roles in normal hearing, balance and vision. Studies have shown that protein structures of only seven genes have been determined experimentally and there are still three genes whose structures are unavailable. These genes are Clarin-1, GPR98 and Usherin. In the absence of an experimentally determined structure, homology modeling and threading often provide a useful 3D model of a protein. Therefore, in the current study Clarin-1 and GPR98 proteins have been analyzed for signal peptide, domains and motifs. Clarin-1 protein was found to be without any signal peptide and to consist of a prokar lipoprotein domain. Clarin-1 is classified within the claudin 2 superfamily and consists of twelve motifs. In contrast, GPR98 has a 29-amino-acid signal peptide and is classified within GPCR family 2, having a Concanavalin A-like lectin/glucanase superfamily domain. It was found to consist of GPS and G protein receptor F2 domains and twenty-nine motifs. Their 3D structures have been predicted using the I-TASSER server. The model of Clarin-1 showed only α-helices but no β-sheets, while the model of GPR98 showed both α-helices and β-sheets. The predicted structures were then evaluated and validated by MolProbity and Ramachandran plot. The evaluation of the predicted structures showed 78.9% residues of Clarin-1 and 78.9% residues of GPR98 within favored regions.
The findings of the present study have resulted in three-dimensional structure predictions and a conserved domain analysis, which will be beneficial for better understanding the molecular components, protein-protein interactions, clinical heterogeneity and pathophysiology of Usher syndrome. PMID:25258483
Gupta, Vibha; Gupta, Rakesh K.; Khare, Garima; Salunke, Dinakar M.; Tyagi, Anil K.
2009-01-01
Emergence of tuberculosis as a global health threat has necessitated an urgent search for new antitubercular drugs, entailing determination of the 3-dimensional structures of a large number of mycobacterial proteins for structure-based drug design. The essential requirement of ferritins/bacterioferritins (proteins involved in iron storage and homeostasis) for the survival of several prokaryotic pathogens makes these proteins very attractive targets for structure determination and inhibitor design. Bacterioferritins (Bfrs) differ from ferritins in that they have additional noncovalently bound haem groups. The physiological role of haem in Bfrs is not very clear, but studies indicate that the haem group is involved in mediating release of iron from Bfr by facilitating reduction of the iron core. To further enhance our understanding, we have determined the crystal structure of the selenomethionyl analog of bacterioferritin A (SeMet-BfrA) from Mycobacterium tuberculosis (Mtb). Unexpectedly, the electron density observed in the crystals of SeMet-BfrA at the position analogous to the haem location in bacterioferritins shows a demetallated and degraded product of haem. This unanticipated observation is a consequence of the altered spatial electronic environment around the axial ligands of haem (owing to the modification of Met52 to SeMet52). Furthermore, the structure of Mtb SeMet-BfrA displays a possible lost protein interaction with haem propionates due to formation of a salt bridge between Arg53 and Glu57, which appears to be unique to Mtb BfrA, resulting in slight modulation of the haem binding pocket in this organism. The crystal structure of Mtb SeMet-BfrA provides novel leads to the physiological function of haem in Bfrs. If validated as a drug target, it may also serve as a scaffold for designing specific inhibitors. In addition, this study provides evidence against the general belief that a selenium derivative of a protein represents its true physiological native structure. PMID:19946376
Stability of halophilic proteins: from dipeptide attributes to discrimination classifier.
Zhang, Guangya; Huihua, Ge; Yi, Lin
2013-02-01
Investigating the molecular features responsible for protein halophilicity is of great significance for understanding the structural basis of protein halo-stability and would help to develop a practical strategy for designing halophilic proteins. In this work, we have systematically analyzed the dipeptide composition of halophilic and non-halophilic protein sequences. We observed that the halophilic proteins contained more DA, RA, AD, RR, AP, DD, PD, EA, VG and DV at the expense of LK, IL, II, IA, KK, IS, KA, GK, RK and AI. We identified some macromolecular signatures of halo-adaptation and suggest that the dipeptide composition might contain more information than the amino acid composition. Based on the dipeptide composition, we have developed a machine learning method for classifying halophilic and non-halophilic proteins for the first time. The accuracy of our method was 100.0% for the training dataset and 93.1% for the 10-fold cross-validation. We also discussed the influence of some specific dipeptides on prediction accuracy.
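The dipeptide composition used as the feature set above is straightforward to compute. The following is a minimal sketch, assuming the common definition (the frequency of each of the 400 ordered pairs of adjacent standard residues); the abstract does not spell out the authors' exact normalisation, and the function name and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dipeptide_composition(sequence):
    """Return a dict mapping each of the 400 dipeptides to its frequency.

    Overlapping adjacent pairs are counted; pairs containing a
    non-standard letter (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


# Hypothetical sequence fragment, for illustration only.
features = dipeptide_composition("MDADDRRAPDVGEA")
print(features["DA"], features["DD"], features["LK"])
```

The resulting 400-element vector can be passed to any standard classifier; which learning algorithm the authors used is not stated in the abstract.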
Detailed proteomic analysis on DM: insight into its hypoallergenicity.
Bertino, Enrico; Gastaldi, Daniela; Monti, Giovanna; Baro, Cristina; Fortunato, Donatella; Perono Garoffo, Lorenza; Coscia, Alessandra; Fabris, Claudio; Mussap, Michele; Conti, Amedeo
2010-01-01
Successful therapy in cow milk (CM) protein allergy rests upon completely eliminating CM proteins from the child's diet: it is thus necessary to provide a replacement food. Donkey milk (DM) has recently aroused scientific and clinical interest, above all among paediatric allergologists. A deeper knowledge of proteins in DM is necessary to evaluate the immunological and physiological properties of this natural substitute for cow's milk. The paper offers a detailed comparative analysis of the protein fractions of DM, CM and human milk, following an extensive proteomic study of the casein and whey proteins of DM performed by narrow pH range 2-DE. The detailed protein composition and structural features reported in this study provide insight into the molecular reasons for the hypoallergenicity of DM. Whole DM might constitute a valid substitute for CM in feeding children with CM protein allergy and it might also constitute the basis for formulas suitable for allergic subjects in the first year of life.
Automating the application of smart materials for protein crystallization.
Khurshid, Sahir; Govada, Lata; El-Sharif, Hazim F; Reddy, Subrayal M; Chayen, Naomi E
2015-03-01
The fabrication and validation of the first semi-liquid nonprotein nucleating agent to be administered automatically to crystallization trials is reported. This research builds upon prior demonstration of the suitability of molecularly imprinted polymers (MIPs; known as 'smart materials') for inducing protein crystal growth. Modified MIPs of altered texture suitable for high-throughput trials are demonstrated to improve crystal quality and to increase the probability of success when screening for suitable crystallization conditions. The application of these materials is simple, time-efficient and will provide a potent tool for structural biologists embarking on crystallization trials.
The Protein Data Bank: unifying the archive
Westbrook, John; Feng, Zukang; Jain, Shri; Bhat, T. N.; Thanki, Narmada; Ravichandran, Veerasamy; Gilliland, Gary L.; Bluhm, Wolfgang F.; Weissig, Helge; Greer, Douglas S.; Bourne, Philip E.; Berman, Helen M.
2002-01-01
The Protein Data Bank (PDB; http://www.pdb.org/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the progress that has been made in validating all data in the PDB archive and in releasing a uniform archive for the community. We have now produced a collection of mmCIF data files for the PDB archive (ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/). A utility application that converts the mmCIF data files to the PDB format (called CIFTr) has also been released to provide support for existing software. PMID:11752306
Chang, Yi-Chien; Hu, Zhenjun; Rachlin, John; Anton, Brian P; Kasif, Simon; Roberts, Richard J; Steffen, Martin
2016-01-04
The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, and (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ~3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue.
Khoury, George A; Smadbeck, James; Kieslich, Chris A; Koskosidis, Alexandra J; Guzman, Yannis A; Tamamis, Phanourios; Floudas, Christodoulos A
2017-06-01
Protein structure refinement is the challenging problem of operating on any protein structure prediction to improve its accuracy with respect to the native structure in a blind fashion. Although many approaches have been developed and tested during the last four CASP experiments, a majority of the methods continue to degrade models rather than improve them. Princeton_TIGRESS (Khoury et al., Proteins 2014;82:794-814) was developed previously and utilizes separate sampling and selection stages involving Monte Carlo and molecular dynamics simulations and classification using an SVM predictor. The initial implementation was shown to consistently refine protein structures 76% of the time in our own internal benchmarking on CASP 7-10 targets. In this work, we improved the sampling and selection stages and tested the method in blind predictions during CASP11. We added a decomposition of physics-based and hybrid energy functions, as well as a coordinate-free representation of the protein structure through distance-binning Cα-Cα distances to capture fine-grained movements. We performed parameter estimation to optimize the adjustable SVM parameters to maximize precision while balancing sensitivity and specificity across all cross-validated data sets, finding enrichment in our ability to select models from the populations of similar decoys generated for targets in CASPs 7-10. The MD stage was enhanced such that larger structures could be further refined. Among refinement methods that are currently implemented as web-servers, Princeton_TIGRESS 2.0 demonstrated the most consistent and most substantial net refinement in blind predictions during CASP11. The enhanced refinement protocol Princeton_TIGRESS 2.0 is freely available as a web server at http://atlas.engr.tamu.edu/refinement/. Proteins 2017; 85:1078-1098. © 2017 Wiley Periodicals, Inc.
Plasma proteins predict conversion to dementia from prodromal disease
Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L.; Ashton, Nicholas J.; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon
2014-01-01
Background The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Methods Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Results Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). Conclusions We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. PMID:25012867
Plasma proteins predict conversion to dementia from prodromal disease.
Hye, Abdul; Riddoch-Contreras, Joanna; Baird, Alison L; Ashton, Nicholas J; Bazenet, Chantal; Leung, Rufina; Westman, Eric; Simmons, Andrew; Dobson, Richard; Sattlecker, Martina; Lupton, Michelle; Lunnon, Katie; Keohane, Aoife; Ward, Malcolm; Pike, Ian; Zucht, Hans Dieter; Pepin, Danielle; Zheng, Wei; Tunnicliffe, Alan; Richardson, Jill; Gauthier, Serge; Soininen, Hilkka; Kłoszewska, Iwona; Mecocci, Patrizia; Tsolaki, Magda; Vellas, Bruno; Lovestone, Simon
2014-11-01
The study aimed to validate previously discovered plasma biomarkers associated with AD, using a design based on imaging measures as surrogate for disease severity and assess their prognostic value in predicting conversion to dementia. Three multicenter cohorts of cognitively healthy elderly, mild cognitive impairment (MCI), and AD participants with standardized clinical assessments and structural neuroimaging measures were used. Twenty-six candidate proteins were quantified in 1148 subjects using multiplex (xMAP) assays. Sixteen proteins correlated with disease severity and cognitive decline. Strongest associations were in the MCI group with a panel of 10 proteins predicting progression to AD (accuracy 87%, sensitivity 85%, and specificity 88%). We have identified 10 plasma proteins strongly associated with disease severity and disease progression. Such markers may be useful for patient selection for clinical trials and assessment of patients with predisease subjective memory complaints. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Engineering the entropy-driven free-energy landscape of a dynamic nanoporous protein assembly.
Alberstein, Robert; Suzuki, Yuta; Paesani, Francesco; Tezcan, F Akif
2018-04-30
De novo design and construction of stimuli-responsive protein assemblies that predictably switch between discrete conformational states remains an essential but highly challenging goal in biomolecular design. We previously reported synthetic, two-dimensional protein lattices self-assembled via disulfide bonding interactions, which endows them with a unique capacity to undergo coherent conformational changes without losing crystalline order. Here, we carried out all-atom molecular dynamics simulations to map the free-energy landscape of these lattices, validated this landscape through extensive structural characterization by electron microscopy and established that it is predominantly governed by solvent reorganization entropy. Subsequent redesign of the protein surface with conditionally repulsive electrostatic interactions enabled us to predictably perturb the free-energy landscape and obtain a new protein lattice whose conformational dynamics can be chemically and mechanically toggled between three different states with varying porosities and molecular densities.
Hossain, Mohammad M; Wilson, William C; Faburay, Bonto; Richt, Jürgen; McVey, David S; Rowland, Raymond R
2016-08-01
A multiplex fluorescence microsphere immunoassay (FMIA) was used to detect bovine and ovine IgM and IgG antibodies to several Rift Valley fever virus (RVFV) proteins, including the major surface glycoprotein, Gn; the nonstructural proteins, NSs and NSm; and the nucleoprotein, N. Target antigens were assembled into a multiplex and tested in serum samples from animals infected with wild-type RVFV or given MP12, a modified live virus vaccine. As expected, the N protein was immunodominant and the best target for early detection of infection. Antibody activity against the other targets was also detected. The experimental results demonstrate the capabilities of FMIA for the detection of antibodies to RVFV structural and nonstructural proteins, which can be applied to future development and validation of diagnostic tests that can be used to differentiate vaccinated from infected animals.
Raster-scanning serial protein crystallography using micro- and nano-focused synchrotron beams
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coquelle, Nicolas; CNRS, IBS, 38044 Grenoble; CEA, IBS, 38044 Grenoble
A raster-scanning serial protein crystallography approach is presented that consumes as little as ∼200–700 nl of sedimented crystals. New serial data pre-analysis software, NanoPeakCell, is introduced. High-resolution structural information was obtained from lysozyme microcrystals (20 µm in the largest dimension) using raster-scanning serial protein crystallography on micro- and nano-focused beamlines at the ESRF. Data were collected at room temperature (RT) from crystals sandwiched between two silicon nitride wafers, thereby preventing their drying, while limiting background scattering and sample consumption. In order to identify crystal hits, new multi-processing and GUI-driven Python-based pre-analysis software was developed, named NanoPeakCell, that was able to read data from a variety of crystallographic image formats. Further data processing was carried out using CrystFEL, and the resultant structures were refined to 1.7 Å resolution. The data demonstrate the feasibility of RT raster-scanning serial micro- and nano-protein crystallography at synchrotrons and validate it as an alternative approach for the collection of high-resolution structural data from micro-sized crystals. Advantages of the proposed approach are its thriftiness, its handling-free nature, the reduced amount of sample required, the adjustable hit rate, the high indexing rate and the minimization of background scattering.
Picosecond fluorescence of intact and dissolved PSI-LHCI crystals.
van Oort, Bart; Amunts, Alexey; Borst, Jan Willem; van Hoek, Arie; Nelson, Nathan; van Amerongen, Herbert; Croce, Roberta
2008-12-15
Over the past several years, many crystal structures of photosynthetic pigment-protein complexes have been determined, and these have been used extensively to model spectroscopic results obtained on the same proteins in solution. However, the crystal structure is not necessarily identical to the structure of the protein in solution. Here, we studied picosecond fluorescence of photosystem I light-harvesting complex I (PSI-LHCI), a multisubunit pigment-protein complex that catalyzes the first steps of photosynthesis. The ultrafast fluorescence of PSI-LHCI crystals is identical to that of dissolved crystals, but differs considerably from most kinetics presented in the literature. In contrast to most studies, the data presented here can be modeled quantitatively with only two compartments: PSI core and LHCI. This yields the rate of charge separation from an equilibrated core (22.5 ± 2.5 ps) and rates of excitation energy transfer from LHCI to core (k(LC)) and vice versa (k(CL)). The ratio between these rates, R = k(CL)/k(LC), appears to be wavelength-dependent and scales with the ratio of the absorption spectra of LHCI and core, indicating the validity of a detailed balance relation between both compartments. k(LC) depends slightly but nonsystematically on detection wavelength, averaging (9.4 ± 4.9 ps)⁻¹. R ranges from 0.5 (<690 nm) to approximately 1.3 above 720 nm.
McCue, J; Osborne, D; Dumont, J; Peters, R; Mei, B; Pierce, G F; Kobayashi, K; Euwart, D
2014-01-01
Recombinant factor IX Fc (rFIXFc) fusion protein is the first of a new class of bioengineered long-acting factors approved for the treatment and prevention of bleeding episodes in haemophilia B. The aim of this work was to describe the manufacturing process for rFIXFc, to assess product quality and to evaluate the capacity of the process to remove impurities and viruses. This manufacturing process utilized a transferable and scalable platform approach established for therapeutic antibody manufacturing and adapted for production of the rFIXFc molecule. rFIXFc was produced using a process free of human- and animal-derived raw materials and a host cell line derived from human embryonic kidney (HEK) 293H cells. The process employed multi-step purification and viral clearance processing, including use of a protein A affinity capture chromatography step, which binds to the Fc portion of the rFIXFc molecule with high affinity and specificity, and a 15 nm pore size virus removal nanofilter. Process validation studies were performed to evaluate identity, purity, activity and safety. The manufacturing process produced rFIXFc with consistent product quality and high purity. Impurity clearance validation studies demonstrated robust and reproducible removal of process-related impurities and adventitious viruses. The rFIXFc manufacturing process produces a highly pure product, free of non-human glycan structures. Validation studies demonstrate that this product is produced with consistent quality and purity. In addition, the scalability and transferability of this process are key attributes to ensure consistent and continuous supply of rFIXFc. PMID:24811361
Customizing G Protein-coupled receptor models for structure-based virtual screening.
de Graaf, Chris; Rognan, Didier
2009-01-01
This review will focus on the construction, refinement, and validation of G Protein-coupled receptor models for the purpose of structure-based virtual screening. Practical tips and tricks derived from concrete modeling and virtual screening exercises to overcome the problems and pitfalls associated with the different steps of the receptor modeling workflow will be presented. These examples will include not only rhodopsin-like (class A), but also secretin-like (class B) and glutamate-like (class C) receptors. In addition, the review will present a careful comparative analysis of current crystal structures and their implications for homology modeling. The following themes will be discussed: i) the use of experimental anchors in guiding the modeling procedure; ii) amino acid sequence alignments; iii) ligand binding mode accommodation and binding cavity expansion; iv) proline-induced kinks in transmembrane helices; v) binding mode prediction and virtual screening by receptor-ligand interaction fingerprint scoring; vi) extracellular loop modeling; vii) virtual filtering schemes. Finally, an overview of several successful structure-based screens shows that receptor models, despite structural inaccuracies, can be efficiently used to find novel ligands.
Bergal, Hans Thor; Hopkins, Alex Hunt; Metzner, Sandra Ines; Sousa, Marcelo Carlos
2016-02-02
The β-barrel assembly machine (BAM) mediates folding and insertion of integral β-barrel outer membrane proteins (OMPs) in Gram-negative bacteria. Of the five BAM subunits, only BamA and BamD are essential for cell viability. Here we present the crystal structure of a fusion between BamA POTRA4-5 and BamD from Rhodothermus marinus. The POTRA5 domain binds BamD between its tetratricopeptide repeats 3 and 4. The interface structural elements are conserved in the Escherichia coli proteins, which allowed structure validation by mutagenesis and disulfide crosslinking in E. coli. Furthermore, the interface is consistent with previously reported mutations that impair BamA-BamD binding. The structure serves as a linchpin to generate a BAM model where POTRA domains and BamD form an elongated periplasmic ring adjacent to the membrane with a central cavity approximately 30 × 60 Å wide. We propose that nascent OMPs bind this periplasmic ring prior to insertion and folding by BAM. Copyright © 2016 Elsevier Ltd. All rights reserved.
Gong, Xin; Qian, Hongwu; Shao, Wei; Li, Jingxian; Wu, Jianping; Liu, Jun-Jie; Li, Wenqi; Wang, Hong-Wei; Espenshade, Peter; Yan, Nieng
2016-11-01
Sterol regulatory element-binding protein (SREBP) transcription factors are master regulators of cellular lipid homeostasis in mammals and oxygen-responsive regulators of hypoxic adaptation in fungi. SREBP C-terminus binds to the WD40 domain of SREBP cleavage-activating protein (SCAP), which confers sterol regulation by controlling the ER-to-Golgi transport of the SREBP-SCAP complex and access to the activating proteases in the Golgi. Here, we biochemically and structurally show that the carboxyl terminal domains (CTD) of Sre1 and Scp1, the fission yeast SREBP and SCAP, form a functional 4:4 oligomer and Sre1-CTD forms a dimer of dimers. The crystal structure of Sre1-CTD at 3.5 Å and cryo-EM structure of the complex at 5.4 Å together with in vitro biochemical evidence elucidate three distinct regions in Sre1-CTD required for Scp1 binding, Sre1-CTD dimerization and tetrameric formation. Finally, these structurally identified domains are validated in a cellular context, demonstrating that the proper 4:4 oligomeric complex formation is required for Sre1 activation.
ASD: a comprehensive database of allosteric proteins and modulators
Huang, Zhimin; Zhu, Liang; Cao, Yan; Wu, Geng; Liu, Xinyi; Chen, Yingyi; Wang, Qi; Shi, Ting; Zhao, Yaxue; Wang, Yuefei; Li, Weihua; Li, Yixue; Chen, Haifeng; Chen, Guoqiang; Zhang, Jian
2011-01-01
Allostery is the most direct, rapid and efficient way of regulating protein function, ranging from the control of metabolic mechanisms to signal-transduction pathways. However, an enormous amount of unsystematic allostery information has deterred scientists who could benefit from this field. Here, we present the AlloSteric Database (ASD), the first online database that provides a central resource for the display, search and analysis of structure, function and related annotation for allosteric molecules. Currently, ASD contains 336 allosteric proteins from 101 species and 8095 modulators in three categories (activators, inhibitors and regulators). Proteins are annotated with a detailed description of allostery, biological process and related diseases, and modulators with binding affinity, physicochemical properties and therapeutic area. Integrating the information of allosteric proteins in ASD should allow for the identification of specific allosteric sites of a given subtype among proteins of the same family that can potentially serve as ideal targets for experimental validation. In addition, modulators curated in ASD can be used to investigate potent allosteric targets for the query compound, and also help chemists to implement structure modifications for novel allosteric drug design. Therefore, ASD could be a platform and a starting point for biologists and medicinal chemists for furthering allosteric research. ASD is freely available at http://mdl.shsmu.edu.cn/ASD/. PMID:21051350
A Santos, Jose C; Nassif, Houssam; Page, David; Muggleton, Stephen H; E Sternberg, Michael J
2012-07-11
There is a need for automated methods to learn general features of the interactions of a ligand class with its diverse set of protein receptors. An appropriate machine learning approach is Inductive Logic Programming (ILP), which automatically generates comprehensible rules in addition to prediction. The development of ILP systems which can learn rules of the complexity required for studies on protein structure remains a challenge. In this work we use a new ILP system, ProGolem, and demonstrate its performance on learning features of hexose-protein interactions. The rules induced by ProGolem detect interactions mediated by aromatics and by planar-polar residues, in addition to less common features such as the aromatic sandwich. The rules also reveal a previously unreported dependency for residues cys and leu. They also specify interactions involving aromatic and hydrogen bonding residues. This paper shows that Inductive Logic Programming implemented in ProGolem can derive rules giving structural features of protein/ligand interactions. Several of these rules are consistent with descriptions in the literature. In addition to confirming literature results, ProGolem's model has a 10-fold cross-validated predictive accuracy that is superior, at the 95% confidence level, to another ILP system previously used to study protein/hexose interactions and is comparable with state-of-the-art statistical learners.
Survey of phosphorylation near drug binding sites in the Protein Data Bank (PDB) and their effects.
Smith, Kyle P; Gifford, Kathleen M; Waitzman, Joshua S; Rice, Sarah E
2015-01-01
While it is currently estimated that 40 to 50% of eukaryotic proteins are phosphorylated, little is known about the frequency and local effects of phosphorylation near pharmaceutical inhibitor binding sites. In this study, we investigated how frequently phosphorylation may affect the binding of drug inhibitors to target proteins. We examined the 453 non-redundant structures of soluble mammalian drug target proteins bound to inhibitors currently available in the Protein Data Bank (PDB). We cross-referenced these structures with phosphorylation data available from the PhosphoSitePlus database. Three hundred twenty-two of 453 (71%) of drug targets have evidence of phosphorylation that has been validated by multiple methods or labs. For 132 of 453 (29%) of those, the phosphorylation site is within 12 Å of the small molecule-binding site, where it would likely alter small molecule binding affinity. We propose a framework for distinguishing between drug-phosphorylation site interactions that are likely to alter the efficacy of drugs versus those that are not. In addition we highlight examples of well-established drug targets, such as estrogen receptor alpha, for which phosphorylation may affect drug affinity and clinical efficacy. Our data suggest that phosphorylation may affect drug binding and efficacy for a significant fraction of drug target proteins. © 2014 Wiley Periodicals, Inc.
hEIDI: An Intuitive Application Tool To Organize and Treat Large-Scale Proteomics Data.
Hesse, Anne-Marie; Dupierris, Véronique; Adam, Claire; Court, Magali; Barthe, Damien; Emadali, Anouk; Masselon, Christophe; Ferro, Myriam; Bruley, Christophe
2016-10-07
Advances in high-throughput proteomics have led to a rapid increase in the number, size, and complexity of the associated data sets. Managing and extracting reliable information from such large series of data sets require the use of dedicated software organized in a consistent pipeline to reduce, validate, exploit, and ultimately export data. The compilation of multiple mass-spectrometry-based identification and quantification results obtained in the context of a large-scale project represents a real challenge for developers of bioinformatics solutions. In response to this challenge, we developed a dedicated software suite called hEIDI to manage and combine both identifications and semiquantitative data related to multiple LC-MS/MS analyses. This paper describes how, through a user-friendly interface, hEIDI can be used to compile analyses and retrieve lists of nonredundant protein groups. Moreover, hEIDI allows direct comparison of series of analyses, on the basis of protein groups, while ensuring consistent protein inference and also computing spectral counts. hEIDI ensures that validated results are compliant with MIAPE guidelines as all information related to samples and results is stored in appropriate databases. Thanks to the database structure, validated results generated within hEIDI can be easily exported in the PRIDE XML format for subsequent publication. hEIDI can be downloaded from http://biodev.extra.cea.fr/docs/heidi .
Duffy, Bryan C; Liu, Shuang; Martin, Gregory S; Wang, Ruifang; Hsia, Ming Min; Zhao, He; Guo, Cheng; Ellis, Michael; Quinn, John F; Kharenko, Olesya A; Norek, Karen; Gesner, Emily M; Young, Peter R; McLure, Kevin G; Wagner, Gregory S; Lakshminarasimhan, Damodharan; White, Andre; Suto, Robert K; Hansen, Henrik C; Kitchen, Douglas B
2015-07-15
Bromodomains are key transcriptional regulators that are thought to be druggable epigenetic targets for cancer, inflammation, diabetes and cardiovascular therapeutics. Of particular importance is the first of two bromodomains in bromodomain containing 4 protein (BRD4(1)). Protein-ligand docking in BRD4(1) was used to purchase a small, focused screening set of compounds possessing a large variety of core structures. Within this set, a small number of weak hits each contained a dihydroquinoxalinone ring system. We purchased other analogs with this ring system and further validated the new hit series and obtained improvement in binding inhibition. Limited exploration by new analog synthesis showed that the binding inhibition in a FRET assay could be improved to the low μM level making this new core a potential hit-to-lead series. Additionally, the predicted geometries of the initial hit and an improved analog were confirmed by X-ray co-crystallography with BRD4(1). Copyright © 2015 Elsevier Ltd. All rights reserved.
Joshi, Prashant; Gupta, Mehak; Vishwakarma, Ram A; Kumar, Ajay; Bharate, Sandip B
2017-06-01
Glycogen synthase kinase 3β (GSK-3β) is a widely investigated molecular target for numerous diseases including Alzheimer's disease, cancer, and diabetes mellitus. The present study aimed to discover new scaffolds for GSK-3β inhibition through a protein structure-guided virtual screening approach. Given the availability of a large number of GSK-3β crystal structures with varying degrees of RMSD in the protein backbone and RMSF in side chain geometry, appropriate crystal structures were selected based on the characteristic ROC curve and percentage enrichment of actives. The validated docking protocol was employed to screen a library of 50,000 small molecules using molecular docking and binding affinity calculations. Based on the GLIDE docking score, Prime MMGB/SA binding affinity, and interaction pattern analysis, the top 50 ligands were selected for GSK-3β inhibition. (Z)-2-(3-chlorobenzylidene)-3,4-dihydro-N-(2-methoxyethyl)-3-oxo-2H-benzo[b][1,4]oxazine-6-carboxamide (F389-0663, 7) was identified as a potent inhibitor of GSK-3β with an IC50 value of 1.6 μM. GSK-3β inhibition activity was then investigated in a cell-based assay. The treatment of neuroblastoma N2a cells with 12.5 μM of F389-0663 resulted in a significant increase in GSK-3β Ser9 levels, which is indicative of the GSK-3β inhibitory activity of a compound. Molecular dynamics simulations were carried out to understand the interactions of F389-0663 with the GSK-3β protein. © 2016 John Wiley & Sons A/S.
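The "percentage enrichment of actives" criterion mentioned above is commonly quantified as an enrichment factor: the hit rate among the top-ranked fraction of a docked library divided by the hit rate of the whole library. The following is a minimal sketch of that general definition, not the authors' protocol; the function name `enrichment_factor`, the toy ranking and the 20% cutoff are all assumptions made for illustration.

```python
def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for a ranked screening list (illustrative sketch).

    ranked_labels: 1 for a known active, 0 for a decoy, ordered from
                   best to worst docking score (hypothetical input).
    fraction:      top fraction of the list to inspect, e.g. 0.2 for 20%.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    # Number of compounds in the inspected top slice (at least one).
    n_top = max(1, round(n_total * fraction))
    hits_in_top = sum(ranked_labels[:n_top])
    # Hit rate in the top slice relative to the hit rate of the full list.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy ranking of ten compounds, three of which are known actives.
toy_ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(toy_ranking, 0.2)
print(f"EF(top 20%) = {ef_top20:.2f}")
```

A value above 1 means the ranking places actives near the top more often than random selection would; comparing this value across candidate crystal structures is one plausible way to choose a docking template.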
González-Díaz, Humberto; Muíño, Laura; Anadón, Ana M; Romaris, Fernanda; Prado-Prado, Francisco J; Munteanu, Cristian R; Dorado, Julián; Sierra, Alejandro Pazos; Mezo, Mercedes; González-Warleta, Marta; Gárate, Teresa; Ubeira, Florencio M
2011-06-01
Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance. On the other hand, many 3D structures in Protein Data Bank (PDB) remain without function annotation. We need theoretical models to quickly predict biologically relevant Parasite Self Proteins (PSP), which are expressed differentially in a given parasite and are dissimilar to proteins expressed in other parasites and have a high probability to become new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15 341 training and validation cases. The model combines protein residue networks, Markov Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network calculated with the MARCH-INSIDE software. We implemented this model in a new web-server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . This server is easy to use by non-experts in Bioinformatics who can carry out automatic online upload and prediction with 3D structures deposited at PDB (mode 1). We can also study outcomes of Peptide Mass Fingerprinting (PMFs) and MS/MS for query proteins with unknown 3D structures (mode 2). We illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present on 10 Anisakis simplex allergens (Ani s 1 to Ani s 10). 
In doing so, we combined electrophoresis (1DE), MALDI-TOF Mass Spectroscopy, and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to generate 3D structures and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSP proteins in 16 additional species including parasite hosts, fungi pathogens, disease transmission vectors, and biotechnologically relevant organisms.
Akula, Nagaraju; Pattabiraman, Nagarajan
2005-06-01
Membrane proteins play a major role in a number of biological processes such as signaling pathways. The determination of the three-dimensional structure of these proteins is increasingly important for our understanding of their structure-function relationships. Due to the difficulty in isolating membrane proteins for X-ray diffraction studies, computational techniques are being developed to generate the 3D structures of TM domains. Here, we present a systematic search method for the identification of energetically favorable and tightly packed transmembrane parallel alpha-helices. The first step in our systematic search method is the generation of 3D models for pairs of parallel helix bundles with all possible orientations, followed by an energy-based filter to eliminate structures with severe non-bonded contacts. Then, an RMS-based filter was used to cluster these structures into families. Furthermore, these dimers were energy minimized using a molecular mechanics force field. Finally, we identified the tightly packed parallel alpha-helices by using an interface surface area. To validate our search method, we compared our predicted GlycophorinA dimer structures with the reported NMR structures. With our search method, we are able to reproduce NMR structures of GPA with 0.9 Å RMSD. In addition, by considering the reported mutational data on GxxxG motif interactions, twenty percent of our predicted dimers are within 2.0 Å RMSD. The dimers obtained from our method were used to generate parallel trimeric and tetrameric TM structures of GPA, and we found that GPA might exist only in a dimer form, as reported earlier.
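The RMSD values quoted above compare predicted and reference coordinates atom by atom. As a minimal sketch of the quantity itself, the code below computes the root-mean-square deviation between two equally sized, already superimposed coordinate sets. It assumes the structures have been aligned beforehand (no optimal superposition is performed), and the function name and the toy coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Each argument is a list of (x, y, z) tuples in angstroms, with atoms
    in the same order. The structures are assumed to be superimposed
    already; this sketch does not perform any fitting.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(squared_sum / len(coords_a))


# Toy example: two atoms, each displaced by 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))
```

In practice an optimal rigid-body superposition is usually applied first, so the value computed here is an upper bound on the fitted RMSD for the same atom pairing.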
Zhang, Wenting; Zheng, Wenjie; Toh, Yukimatsu; Betancourt-Solis, Miguel A; Tu, Jiagang; Fan, Yanlin; Vakharia, Vikram N; Liu, Jun; McNew, James A; Jin, Meilin; Tao, Yizhi J
2017-08-08
Many enveloped viruses encode a matrix protein. In the influenza A virus, the matrix protein M1 polymerizes into a rigid protein layer underneath the viral envelope to help enforce the shape and structural integrity of intact viruses. The influenza virus M1 is also known to mediate virus budding as well as the nuclear export of the viral nucleocapsids and their subsequent packaging into nascent viral particles. Despite extensive studies on the influenza A virus M1 (FLUA-M1), only crystal structures of its N-terminal domain are available. Here we report the crystal structure of the full-length M1 from another orthomyxovirus that infects fish, the infectious salmon anemia virus (ISAV). The structure of ISAV-M1 assumes the shape of an elbow, with its N domain closely resembling that of the FLUA-M1. The C domain, which is connected to the N domain through a flexible linker, is made of four α-helices packed as a tight bundle. In the crystal, ISAV-M1 monomers form infinite 2D arrays with a network of interactions involving both the N and C domains. Results from liposome flotation assays indicated that ISAV-M1 binds membrane via electrostatic interactions that are primarily mediated by a positively charged surface loop from the N domain. Cryoelectron tomography reconstruction of intact ISA virions identified a matrix protein layer adjacent to the inner leaflet of the viral membrane. The physical dimensions of the virion-associated matrix layer are consistent with the 2D ISAV-M1 crystal lattice, suggesting that the crystal lattice is a valid model for studying M1-M1, M1-membrane, and M1-RNP interactions in the virion.
Predicting protein-binding RNA nucleotides with consideration of binding partners.
Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook
2015-06-01
In recent years several computational methods have been developed to predict RNA-binding sites in proteins. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in proteins, the problem of predicting protein-binding sites in RNA has received little attention, mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and a Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross-validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, an NPV of 84.2% and an MCC of 0.48 in independent testing. For comparison, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10-fold cross-validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, an NPV of 92.2% and an MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, an NPV of 85.2% and an MCC of 0.45. 
In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in most performance measures. To the best of our knowledge, this is the first sequence-based prediction of protein-binding nucleotides in RNA which considers the binding partner of RNA. The new model will provide valuable information for designing biochemical experiments to find putative protein-binding sites in RNA with unknown structure. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
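The performance measures quoted in this abstract (sensitivity, specificity, PPV, NPV and MCC) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions only; it is not the authors' code, the function name `binary_metrics` is hypothetical, and the counts in the usage note are made-up illustration values, not data from the study.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "ppv": _ratio(tp, tp + fp),           # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

For example, `binary_metrics(tp=80, fp=20, tn=70, fn=30)` gives a PPV of 0.8 and an NPV of 0.7. Because MCC uses all four cells, it is less sensitive to class imbalance than sensitivity or specificity taken alone.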
Drung, Binia; Scholz, Christoph; Barbosa, Valéria A; Nazari, Azadeh; Sarragiotto, Maria H; Schmidt, Boris
2014-10-15
DYRK1A has been associated with Down's syndrome and neurodegenerative diseases; it is therefore an important target for novel pharmacological interventions. We combined a ligand-based pharmacophore design with a structure-based protein/ligand docking using the software MOE in order to evaluate the underlying structure/activity relationship. Based on this knowledge we synthesized several novel β-carboline derivatives to validate the theoretical model. Furthermore, we identified a modified lead structure as a potent DYRK1A inhibitor (IC50=130 nM) with significant selectivity against MAO-A, DYRK2, DYRK3, DYRK4 and CLK2. Copyright © 2014 Elsevier Ltd. All rights reserved.
Tandon, Gitanjali; Jaiswal, Sarika; Iquebal, M A; Kumar, Sunil; Kaur, Sukhdeep; Rai, Anil; Kumar, Dinesh
2015-01-01
Biotic stress is a major cause of heavy loss in grape productivity. In order to develop biotic stress-resistant grape varieties, the key defense genes along with their pathways have to be deciphered. In angiosperm plants, lipase-like protein phytoalexin deficient 4 (PAD4) is well known to be essential for systemic resistance against biotic stress. PAD4 functions together with its interacting partner protein enhanced disease susceptibility 1 (EDS1) to promote salicylic acid (SA)-dependent and SA-independent defense pathways. The existence and structure of the key systemic resistance proteins EDS1 and PAD4 are not known in grapes. Before SA pathway studies are undertaken in grape, molecular evidence of the EDS1:PAD4 complex must be established. To establish this, the EDS1 protein sequence was retrieved from NCBI, a homologous PAD4 protein was generated using Arabidopsis thaliana as template, and conserved domains were confirmed. In this study, computational methods were used to model EDS1 and PAD4 and to simulate their interactions. Since no structural details of the proteins were available, homology modeling was employed to construct three-dimensional structures. Further, molecular dynamics simulations were performed to study the dynamic behavior of EDS1 and PAD4. The modeled proteins were validated and subjected to molecular docking analysis. Molecular evidence of a stable EDS1:PAD4 complex in grape, supporting the SA defense pathway in response to biotic stress, is reported in this study. If SA defense pathway genes are explored, then markers of the genes involved can play a pivotal role in grape variety development, especially against biotic stress, leading to higher productivity.
Marrero-Ponce, Yovani; Contreras-Torres, Ernesto; García-Jacas, César R; Barigye, Stephen J; Cubillán, Néstor; Alvarado, Ysaías J
2015-06-07
In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in the ℝ(n) space on the coulombic matrix. For the calculation of these descriptors, macromolecular vectors belonging to ℝ(n) space, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalization approaches for the calculation of inter-amino acidic residue spatial distances based on Minkowski metrics are proposed. The simple- and double-stochastic schemes were defined as approaches to normalize the coulombic matrix. The local-fragment indices for both amino acid-types and amino acid-groups are presented in order to permit characterizing fragments of interest in proteins. In addition, with the objective of taking into account specific interactions among amino acids in global or local indices, geometric and topological cut-offs are defined. To assess the utility of global and local indices, a classification model for the prediction of the four major protein structural classes was built with the Linear Discriminant Analysis (LDA) technique. The developed LDA-model correctly classifies 92.6% and 92.7% of the proteins on the training and test sets, respectively. The obtained model showed high values of the generalized square correlation coefficient (GC(2)) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability and the high predictive power of the proposed model. The performance of the LDA-model demonstrates the capability of the proposed indices not only to codify relevant biochemical information related to the structural classes of proteins, but also to yield suitable interpretability. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. Copyright © 2015 Elsevier Ltd. All rights reserved.
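The abstract above generalizes inter-residue spatial distances through Minkowski metrics. As a minimal sketch of the general Minkowski distance only (not the authors' descriptors), the function below reduces to the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The function name and the coordinates in the usage note are hypothetical.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two coordinate tuples.

    a, b: sequences of equal length (e.g. x, y, z of two residue centres).
    p:    order of the metric; p >= 1 is required for a true metric.
    """
    if len(a) != len(b):
        raise ValueError("coordinate vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    # Sum the p-th powers of the absolute coordinate differences,
    # then take the p-th root.
    total = sum(abs(x - y) ** p for x, y in zip(a, b))
    return total ** (1.0 / p)
```

With the hypothetical points (0, 0, 0) and (3, 4, 0), the call with `p=2` yields the Euclidean distance 5, and `p=1` yields the Manhattan distance 7.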
Automating the application of smart materials for protein crystallization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khurshid, Sahir; Govada, Lata; EL-Sharif, Hazim F.
2015-03-01
The first semi-liquid, non-protein nucleating agent for automated protein crystallization trials is described. This ‘smart material’ is demonstrated to induce crystal growth and will provide a simple, cost-effective tool for scientists in academia and industry. The fabrication and validation of the first semi-liquid non-protein nucleating agent to be administered automatically to crystallization trials is reported. This research builds upon prior demonstration of the suitability of molecularly imprinted polymers (MIPs; known as ‘smart materials’) for inducing protein crystal growth. Modified MIPs of altered texture suitable for high-throughput trials are demonstrated to improve crystal quality and to increase the probability of success when screening for suitable crystallization conditions. The application of these materials is simple, time-efficient and will provide a potent tool for structural biologists embarking on crystallization trials.
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
Luo, Heng; Zhang, Ping; Cao, Xi Hang; Du, Dizheng; Ye, Hao; Huang, Hui; Li, Can; Qin, Shengying; Wan, Chunling; Shi, Leming; He, Lin; Yang, Lun
2016-11-02
The cost of developing a new drug has increased sharply over the past years. To ensure a reasonable return-on-investment, it is useful for drug discovery researchers in both industry and academia to identify all the possible indications for early pipeline molecules. For the first time, we propose the term computational "drug candidate positioning" or "drug positioning", to describe the above process. It is distinct from drug repositioning, which identifies new uses for existing drugs and maximizes their value. Since many therapeutic effects are mediated by unexpected drug-protein interactions, it is reasonable to analyze the chemical-protein interactome (CPI) profiles to predict indications. Here we introduce the server DPDR-CPI, which can make real-time predictions based only on the structure of the small molecule. When a user submits a molecule, the server will dock it across 611 human proteins, generating a CPI profile of features that can be used for predictions. It can suggest the likelihood of relevance of the input molecule towards ~1,000 human diseases with top predictions listed. DPDR-CPI achieved an overall AUROC of 0.78 during 10-fold cross-validations and AUROC of 0.76 for the independent validation. The server is freely accessible via http://cpi.bio-x.cn/dpdr/.
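The AUROC values reported above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes AUROC directly from that pairwise definition (ties count as one half). It is an illustration of the metric, not the DPDR-CPI implementation; the function name and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve from the pairwise-ranking definition.

    scores: predicted scores, higher meaning "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0      # positive ranked above negative
            elif pos_score == neg_score:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))
```

For the made-up input `auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])`, three of the four positive/negative pairs are ranked correctly, giving 0.75. The double loop is quadratic in the number of examples; a rank-based formulation would be preferable for large datasets.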
Di Scala, Coralie; Baier, Carlos J; Evans, Luke S; Williamson, Philip T F; Fantini, Jacques; Barrantes, Francisco J
2017-01-01
Cholesterol is a ubiquitous neutral lipid, which finely tunes the activity of a wide range of membrane proteins, including neurotransmitter and hormone receptors and ion channels. Given the scarcity of available X-ray crystallographic structures and the even fewer in which cholesterol sites have been directly visualized, application of in silico computational methods remains a valid alternative for the detection and thermodynamic characterization of cholesterol-specific sites in functionally important membrane proteins. The membrane-embedded segments of the paradigm neurotransmitter receptor for acetylcholine display a series of cholesterol consensus domains (which we have coined "CARC"). The CARC motif exhibits a preference for the outer membrane leaflet and its mirror motif, CRAC, for the inner one. Some membrane proteins possess the double CARC-CRAC sequences within the same transmembrane domain. In addition to in silico molecular modeling, the affinity, concentration dependence, and specificity of the cholesterol-recognition motif-protein interaction have recently found experimental validation in other biophysical approaches like monolayer techniques and nuclear magnetic resonance spectroscopy. From the combined studies, it becomes apparent that the CARC motif is now more firmly established as a high-affinity cholesterol-binding domain for membrane-bound receptors and remarkably conserved along phylogenetic evolution. © 2017 Elsevier Inc. All rights reserved.
Cleveland, Sean B.; Davies, John; McClure, Marcella A.
2011-01-01
The goal of this bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein of the order Mononegavirales. In the combined analysis of 63 representative nucleoprotein (N) sequences from four viral families (Bornaviridae, Filoviridae, Rhabdoviridae, and Paramyxoviridae) we predict the regions of protein disorder, intra-residue contact and co-evolving residues. Correlations between location and conservation of predicted regions illustrate a strong division between families while highlighting conservation within individual families. These results suggest the conserved regions among the nucleoproteins, specifically within Rhabdoviridae and Paramyxoviridae, but also generally among all members of the order, reflect an evolutionary advantage in maintaining these sites for the viral nucleoprotein as part of the transcription/replication machinery. Results indicate conservation for disorder in the C-terminus region of the representative proteins that is important for interacting with the phosphoprotein and the large subunit polymerase during transcription and replication. Additionally, the C-terminus region of the protein preceding the disordered region is predicted to be important for interacting with the encapsidated genome. Portions of the N-terminus are responsible for N∶N stability and interactions identified by the presence or lack of co-evolving intra-protein contact predictions. The validation of these prediction results by current structural information illustrates the benefits of the Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline as a method for quickly characterizing proteins and providing the most likely residues and regions necessary to target for disruption in viruses that have little structural information available. PMID:21559282
Doss, C. George Priya; NagaSundaram, N.
2012-01-01
Background Elucidating the molecular dynamic behavior of Protein-DNA complexes upon mutation is crucial in current genomics. The molecular dynamics approach reveals the changes upon incorporation of variants that dictate the structure and function of Protein-DNA complexes. Deleterious mutations in APE1 protein modify the physicochemical properties of amino acids that affect the protein stability and dynamic behavior. Further, these mutations disrupt the binding sites and prevent the protein from forming complexes with its interacting DNA. Principal Findings In this study, we developed a rapid and cost-effective method to analyze variants in the APE1 gene that are associated with disease susceptibility and evaluated their impacts on APE1-DNA complex dynamic behavior. Initially, two different in silico approaches were used to identify deleterious variants in the APE1 gene. Deleterious scores that overlap in these approaches were taken into consideration and, on that basis, two nsSNPs with IDs rs61730854 (I64T) and rs1803120 (P311S) were taken further for structural analysis. Significance Different parameters such as RMSD, RMSF, salt bridges, H-bonds and SASA applied in the molecular dynamics study reveal that the predicted deleterious variants I64T and P311S alter the structure as well as affect the stability of APE1-DNA interacting functions. This study addresses such new methods for validating functional polymorphisms of human APE1, which is critically involved in causing a deficit in repair capacity that in turn leads to genetic instability and carcinogenesis. PMID:22384055
Strategies for carbohydrate model building, refinement and validation
2017-01-01
Sugars are the most stereochemically intricate family of biomolecules and present substantial challenges to anyone trying to understand their nomenclature, reactions or branched structures. Current crystallographic programs provide an abstraction layer allowing inexpert structural biologists to build complete protein or nucleic acid model components automatically either from scratch or with little manual intervention. This is, however, still not generally true for sugars. The need for carbohydrate-specific building and validation tools has been highlighted a number of times in the past, concomitantly with the introduction of a new generation of experimental methods that have been ramping up the production of protein–sugar complexes and glycoproteins for the past decade. While some incipient advances have been made to address these demands, correctly modelling and refining carbohydrates remains a challenge. This article will address many of the typical difficulties that a structural biologist may face when dealing with carbohydrates, with an emphasis on problem solving in the resolution range where X-ray crystallography and cryo-electron microscopy are expected to overlap in the next decade. PMID:28177313
Molecular dynamics-based refinement and validation for sub-5 Å cryo-electron microscopy maps.
Singharoy, Abhishek; Teo, Ivan; McGreevy, Ryan; Stone, John E; Zhao, Jianhua; Schulten, Klaus
2016-07-07
Two structure determination methods, based on the molecular dynamics flexible fitting (MDFF) paradigm, are presented that resolve sub-5 Å cryo-electron microscopy (EM) maps with either single structures or ensembles of such structures. The methods, denoted cascade MDFF and resolution exchange MDFF, sequentially re-refine a search model against a series of maps of progressively higher resolutions, which ends with the original experimental resolution. Application of sequential re-refinement enables MDFF to achieve a radius of convergence of ~25 Å demonstrated with the accurate modeling of β-galactosidase and TRPV1 proteins at 3.2 Å and 3.4 Å resolution, respectively. The MDFF refinements uniquely offer map-model validation and B-factor determination criteria based on the inherent dynamics of the macromolecules studied, captured by means of local root mean square fluctuations. The MDFF tools described are available to researchers through an easy-to-use and cost-effective cloud computing resource on Amazon Web Services.
Natural-product-derived fragments for fragment-based ligand discovery
NASA Astrophysics Data System (ADS)
Over, Björn; Wetzel, Stefan; Grütter, Christian; Nakai, Yasushi; Renner, Steffen; Rauh, Daniel; Waldmann, Herbert
2013-01-01
Fragment-based ligand and drug discovery predominantly employs sp2-rich compounds covering well-explored regions of chemical space. Despite the ease with which such fragments can be coupled, this focus on flat compounds is widely cited as contributing to the attrition rate of the drug discovery process. In contrast, biologically validated natural products are rich in stereogenic centres and populate areas of chemical space not occupied by average synthetic molecules. Here, we have analysed more than 180,000 natural product structures to arrive at 2,000 clusters of natural-product-derived fragments with high structural diversity, which resemble natural scaffolds and are rich in sp3-configured centres. The structures of the cluster centres differ from previously explored fragment libraries, but for nearly half of the clusters representative members are commercially available. We validate their usefulness for the discovery of novel ligand and inhibitor types by means of protein X-ray crystallography and the identification of novel stabilizers of inactive conformations of p38α MAP kinase and of inhibitors of several phosphatases.
Structural biology data archiving - where we are and what lies ahead.
Kleywegt, Gerard J; Velankar, Sameer; Patwardhan, Ardan
2018-05-10
For almost 50 years, structural biology has endeavoured to conserve and share its experimental data and their interpretations (usually, atomistic models) through global public archives such as the Protein Data Bank, Electron Microscopy Data Bank and Biological Magnetic Resonance Data Bank (BMRB). These archives are treasure troves of freely accessible data that document our quest for molecular or atomic understanding of biological function and processes in health and disease. They have prepared the field to tackle new archiving challenges as more and more (combinations of) techniques are being utilized to elucidate structure at ever increasing length scales. Furthermore, the field has made substantial efforts to develop validation methods that help users to assess the reliability of structures and to identify the most appropriate data for their needs. In this Review, we present an overview of public data archives in structural biology and discuss the importance of validation for users and producers of structural data. Finally, we sketch our efforts to integrate structural data with bioimaging data and with other sources of biological data. This will make relevant structural information available and more easily discoverable for a wide range of scientists. © 2018 The Authors. FEBS Letters published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.
Genomic selection across multiple breeding cycles in applied bread wheat breeding.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2016-06-01
We evaluated genomic selection across five breeding cycles of bread wheat breeding. Bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has been frequently shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its components traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy using populations from individual cycles using fivefold cross-validation was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken, which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.
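The abstract above contrasts within-cycle cross-validation with cross-validation that uses whole breeding cycles as folds, so that every line in a held-out cycle is predicted from the other cycles only. The sketch below shows that splitting scheme in plain Python. It is an assumption-laden illustration rather than the authors' pipeline: the function name `leave_one_cycle_out` and the cycle labels in the usage note are hypothetical.

```python
def leave_one_cycle_out(cycle_labels):
    """Yield (held_out_cycle, train_indices, test_indices) per cycle.

    cycle_labels: one label per line (e.g. the breeding cycle it
    belongs to). Each distinct cycle is held out exactly once.
    """
    for cycle in sorted(set(cycle_labels)):
        test_indices = [
            i for i, label in enumerate(cycle_labels) if label == cycle
        ]
        train_indices = [
            i for i, label in enumerate(cycle_labels) if label != cycle
        ]
        yield cycle, train_indices, test_indices
```

With the made-up labels `[2010, 2010, 2011, 2012, 2012]`, the first fold holds out cycle 2010 (indices 0 and 1) and trains on indices 2, 3 and 4. Because no line from the held-out cycle appears in the training set, the resulting accuracy estimate avoids the within-cycle bias the abstract describes.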
Wards in the keyway: amino acids with anomalous pK(a)s in calycins.
Eberini, Ivano; Sensi, Cristina; Bovi, Michele; Molinari, Henriette; Galliano, Monica; Bonomi, Franco; Iametti, Stefania; Gianazza, Elisabetta
2012-12-01
As a follow-up to our recent analysis of the electrostatics of bovine β-lactoglobulin (Eberini et al. in Amino Acids 42:2019-2030, 2011), we investigated whether the occurrence in the native structure of calycins (the superfamily to which β-lactoglobulin belongs) of amino acids with anomalous pK(a)s is an infrequent or, on the contrary, a common occurrence, and whether or not a general pattern may be recognized. To this aim, we randomly selected four calycins we had either purified from natural sources or prepared with recombinant DNA technologies during our previous and current structural and functional studies on this family. Their pIs vary over several pH units and their known functions are as diverse as carriers, enzymes, immunomodulators and/or extracellular chaperones. In our survey, we used both in silico prediction methods and in vitro procedures, such as isoelectric focusing, electrophoretic titration curves and spectroscopic techniques. By comparing the results under native conditions (no exposure of the proteins to chaotropic agents) to those after protein unfolding (in the presence of 8 M urea), a shift is observed in the pK(a) of at least one amino acid per protein, which results in a measurable change in pI. Three types of amino acids are involved: Cys, Glu, and His; their position varies along the calycin sequence. Although no common mechanism may thus be recognized, we hypothesize that the 'normalization' of anomalous pK(a)s may be the phenomenon that accompanies, and favors, structural rearrangements such as those involved in ligand binding by these proteins. An interesting, if anecdotal, validation of this view comes from the behavior of human retinol binding protein, for which the pI of the folded and liganded protein is intermediate between those of the folded and unliganded and of the unfolded protein forms.
Likewise, both solid (from crystallography) and solution state (from CD spectroscopy) data confirm that the protein undergoes structural rearrangement upon retinol binding.
Parsy, Christophe; Alexandre, François-René; Brandt, Guillaume; Caillet, Catherine; Cappelle, Sylvie; Chaves, Dominique; Convard, Thierry; Derock, Michel; Gloux, Damien; Griffon, Yann; Lallos, Lisa; Leroy, Frédéric; Liuzzi, Michel; Loi, Anna-Giulia; Moulat, Laure; Musiu, Chiara; Rahali, Houcine; Roques, Virginie; Seifer, Maria; Standring, David; Surleraux, Dominique
2014-09-15
Structural homology between thrombin inhibitors and the early tetrapeptide HCV protease inhibitor led to the bioisosteric replacement of the P2 proline by a 2,4-disubstituted azetidine within the macrocyclic β-strand mimic. Molecular modeling guided the design of the series. This approach was validated by the excellent activity and selectivity in biochemical and cell based assays of this novel series and confirmed by the co-crystal structure of the inhibitor with the NS3/4A protein (PDB code: 4TYD). Copyright © 2014 Elsevier Ltd. All rights reserved.
Computational Prediction of Atomic Structures of Helical Membrane Proteins Aided by EM Maps
Kovacs, Julio A.; Yeager, Mark; Abagyan, Ruben
2007-01-01
Integral membrane proteins pose a major challenge for protein-structure prediction because only ≈100 high-resolution structures are available currently, thereby impeding the development of rules or empirical potentials to predict the packing of transmembrane α-helices. However, when an intermediate-resolution electron microscopy (EM) map is available, it can be used to provide restraints which, in combination with a suitable computational protocol, make structure prediction feasible. In this work we present such a protocol, which proceeds in three stages: 1), generation of an ensemble of α-helices by flexible fitting into each of the density rods in the low-resolution EM map, spanning a range of rotational angles around the main helical axes and translational shifts along the density rods; 2), fast optimization of side chains and scoring of the resulting conformations; and 3), refinement of the lowest-scoring conformations with internal coordinate mechanics, by optimizing the van der Waals, electrostatics, hydrogen bonding, torsional, and solvation energy contributions. In addition, our method implements a penalty term through a so-called tethering map, derived from the EM map, which restrains the positions of the α-helices. The protocol was validated on three test cases: GpA, KcsA, and MscL. PMID:17496035
Improving compound-protein interaction prediction by building up highly credible negative samples.
Liu, Hui; Sun, Jianjiang; Guan, Jihong; Zheng, Jie; Zhou, Shuigeng
2015-06-15
Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. With the availability of an increasing number of validated interactions, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples becomes critical to improving the performance of in silico prediction methods. This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. As most existing computational models assume that similar compounds are likely to interact with similar target proteins and achieve remarkable performance, it is rational to identify potential negative samples based on the converse negative proposition that the proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by the compound and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, amino acid sequences, protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, including bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performances of these models are also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset.
Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases. Supplementary files are available at: http://admis.fudan.edu.cn/negative-cpi/. © The Author 2015. Published by Oxford University Press.
2012-01-01
Background Single nucleotide polymorphism (SNP) validation and large-scale genotyping are required to maximize the use of DNA sequence variation and determine the functional relevance of candidate genes for complex stress tolerance traits through genetic association in rice. We used the bead array platform-based Illumina GoldenGate assay to validate and genotype SNPs in a select set of stress-responsive genes to understand their functional relevance and study the population structure in rice. Results Of the 384 putative SNPs assayed, we successfully validated and genotyped 362 (94.3%). Of these 325 (84.6%) showed polymorphism among the 91 rice genotypes examined. Physical distribution, degree of allele sharing, admixtures and introgression, and amino acid replacement of SNPs in 263 abiotic and 62 biotic stress-responsive genes provided clues for identification and targeted mapping of trait-associated genomic regions. We assessed the functional and adaptive significance of validated SNPs in a set of contrasting drought tolerant upland and sensitive lowland rice genotypes by correlating their allelic variation with amino acid sequence alterations in catalytic domains and three-dimensional secondary protein structure encoded by stress-responsive genes. We found a strong genetic association among SNPs in the nine stress-responsive genes with upland and lowland ecological adaptation. Higher nucleotide diversity was observed in indica accessions compared with other rice sub-populations based on different population genetic parameters. The inferred ancestry of 16% among rice genotypes was derived from admixed populations with the maximum between upland aus and wild Oryza species. Conclusions SNPs validated in biotic and abiotic stress-responsive rice genes can be used in association analyses to identify candidate genes and develop functional markers for stress tolerance in rice. PMID:22921105
Liu, Jin; Chen, Yu; Li, Jing-Ya; Luo, Cheng; Li, Jia; Chen, Kai-Xian; Li, Xu-Wen; Guo, Yue-Wei
2018-03-20
Phidianidines A and B are two novel marine indole alkaloids bearing an uncommon 1,2,4-oxadiazole ring and exhibiting various biological activities. Our previous research showed that the synthesized phidianidine analogs had the potential to inhibit the activity of protein tyrosine phosphatase 1B (PTP1B), a validated target for Type II diabetes, which indicates that these analogs are worth further structural modification. Therefore, in this paper, a series of phidianidine derivatives were designed and rapidly synthesized with a function-oriented synthesis (FOS) strategy. Their inhibitory effects on PTP1B and T-cell protein tyrosine phosphatase (TCPTP) were evaluated, and several compounds displayed significant inhibitory potency and specific selectivity over PTP1B. The structure-activity relationship (SAR) and molecular docking analyses are also described.
Naz, Huma; Shahbaaz, Mohd; Bisetty, Krishna; Islam, Asimul; Ahmad, Faizan; Hassan, Md Imtaiyaz
2016-06-01
Human calcium/calmodulin-dependent protein kinase IV (CAMKIV) is a member of the Ser/Thr protein kinase family. It is regulated by the calcium-calmodulin dependent signal through a secondary messenger, Ca(2+), which leads to the activation of its autoinhibited form. Over-expression of and mutations in CAMKIV, as well as changes in Ca(2+) concentration, are often associated with numerous neurodegenerative diseases and cancers. We have successfully cloned, expressed, and purified a functionally active kinase domain of human CAMKIV. To observe the effect of different pH conditions on the structural and functional properties of CAMKIV, we have used spectroscopic techniques such as circular dichroism (CD), absorbance and fluorescence. We have observed that within the pH range 5.0-11.5, CAMKIV maintained both its secondary and tertiary structures, along with its function, whereas significant aggregation was observed at acidic pH (2.0-4.5). We have also performed ATPase activity assays under different pH conditions and found a significant correlation between the structure and enzymatic activities of CAMKIV. In-silico validations were further carried out by modeling the 3-dimensional structure of CAMKIV and then subjecting it to molecular dynamics (MD) simulations to understand its conformational behavior in explicit water conditions. A strong correlation between spectroscopic observations and the output of molecular dynamics simulation was observed for CAMKIV.
Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.
2003-01-01
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped up in a user-friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR): amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333
Clustering and visualizing similarity networks of membrane proteins.
Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming
2015-08-01
We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
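The threshold-based sparsification mentioned in this abstract can be illustrated with a small sketch. This is not the authors' MSC algorithm, whose details are not given here; it only shows the general idea of dropping all edges longer than a cutoff from a distance matrix and reading the clusters off as connected components. The function name `threshold_clusters`, the toy matrix, and the cutoff of 0.5 are all hypothetical.

```python
from typing import Dict, List, Sequence


def threshold_clusters(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Cluster items by keeping only edges shorter than `threshold`.

    Illustrative sketch: clusters are the connected components of the
    sparsified graph, found with a simple union-find structure.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:  # edge survives sparsification
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical symmetric distance matrix for five chains.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.15, 0.85, 0.9],
    [0.2, 0.15, 0.0, 0.95, 0.9],
    [0.9, 0.85, 0.95, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
clusters = threshold_clusters(toy, threshold=0.5)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

In the toy data the cutoff is picked by hand; the abstract instead describes deriving a characteristic threshold from the edge distribution of the similarity network.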
Homology Modeling of Class A G Protein-Coupled Receptors
Costanzi, Stefano
2012-01-01
G protein-coupled receptors (GPCRs) are a large superfamily of membrane bound signaling proteins that hold great pharmaceutical interest. Since experimentally elucidated structures are available only for a very limited number of receptors, homology modeling has become a widespread technique for the construction of GPCR models intended to study the structure-function relationships of the receptors and aid the discovery and development of ligands capable of modulating their activity. Through this chapter, various aspects involved in the construction of homology models of the serpentine domain of the largest class of GPCRs, known as class A or rhodopsin family, are illustrated. In particular, the chapter provides suggestions, guidelines and critical thoughts on some of the most crucial aspects of GPCR modeling, including: collection of candidate templates and a structure-based alignment of their sequences; identification and alignment of the transmembrane helices of the query receptor to the corresponding domains of the candidate templates; selection of one or more template receptors; election of homology or de novo modeling for the construction of specific extracellular and intracellular domains; construction of the three-dimensional models, with special consideration to extracellular regions, disulfide bridges, and interhelical cavity; validation of the models through controlled virtual screening experiments. PMID:22323225
Hinsen, Konrad; Vaitinadapoule, Aurore; Ostuni, Mariano A; Etchebest, Catherine; Lacapere, Jean-Jacques
2015-02-01
The 18 kDa protein TSPO is a highly conserved transmembrane protein found in bacteria, yeast, animals and plants. TSPO is involved in a wide range of physiological functions, including the transport of several molecules. The atomic structure of monomeric ligand-bound mouse TSPO in detergent has been published recently. A previously published low-resolution structure of Rhodobacter sphaeroides TSPO, obtained from tubular crystals with lipids and observed by cryo-electron microscopy, revealed an oligomeric structure without any ligand. We analyze this electron microscopy density in view of available biochemical and biophysical data, building a matching atomic model for the monomer and then the entire crystal. We compare its intra- and inter-molecular contacts with those predicted by amino acid covariation in TSPO proteins from evolutionary sequence analysis. The arrangement of the five transmembrane helices in a monomer of our model is different from that observed for the mouse TSPO. We analyze possible ligand binding sites for protoporphyrin, for the high-affinity ligand PK 11195, and for cholesterol in TSPO monomers and/or oligomers, and we discuss possible functional implications. Copyright © 2014 Elsevier B.V. All rights reserved.
Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.
2014-01-01
Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
Bouchard, P; Chomilier, J; Ravet, V; Mornon, J P; Viguès, B
2001-01-01
Epiplasmin C is the major protein component of the membrane skeleton in the ciliate Tetrahymena pyriformis. Cloning and analysis of the gene encoding epiplasmin C showed this protein to be a previously unrecognized protein. In particular, epiplasmin C was shown to lack the canonical features of already known epiplasmic proteins in ciliates and flagellates. By means of hydrophobic cluster analysis (HCA), it has been shown that epiplasmin C is constituted of a repeat of 25 domains of 40 residues each. These domains are related and can be grouped in two families called type I and type II. Connections between type I and type II domains follow rules that can be evidenced in the sequence itself, thus reinforcing the validity of the splitting of the domains. Using these repeated domains as queries, significant structural similarities were demonstrated with an extra six heptads shared by nuclear lamins and invertebrate cytoplasmic intermediate filament proteins and deleted in the cytoplasmic intermediate filament protein lineage at the protostome-deuterostome branching in the eukaryotic phylogenetic tree.