Protein Structure Determination using Metagenome sequence data
Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David
2017-01-01
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891
Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana
2016-01-01
With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.
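Illustrative note (not part of the abstract): the Kernel-Adatron rule named above can be sketched in a few lines. This is the plain, bias-free Kernel-Adatron update only; the evolutionary-algorithm component the paper proposes is not reproduced. The RBF kernel, learning rate `eta`, epoch count and toy data are all assumptions chosen for illustration.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def kernel_adatron(points, labels, kernel=rbf_kernel, eta=0.1, epochs=200):
    """Return non-negative multipliers alpha for a bias-free kernel classifier."""
    n = len(points)
    gram = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Margin of sample i under the current multipliers.
            margin = labels[i] * sum(
                alpha[j] * labels[j] * gram[i][j] for j in range(n)
            )
            # Move alpha_i toward a margin of 1, clipping at zero.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - margin))
    return alpha

def predict(x, points, labels, alpha, kernel=rbf_kernel):
    """Sign of the kernel expansion at x (ties resolved as +1)."""
    score = sum(a * y * kernel(x, p) for a, y, p in zip(alpha, labels, points))
    return 1 if score >= 0 else -1

# Hypothetical, well-separated toy data: two points per class.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 3.0), (3.0, 4.0)]
labels = [-1, -1, 1, 1]
alpha = kernel_adatron(points, labels)
predictions = [predict(p, points, labels, alpha) for p in points]
```

The update is a clipped gradient ascent on each multiplier, which is why it is simple to implement compared with a quadratic-programming solver.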
Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang
2007-01-01
The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.
XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data
Schweppe, Devin K.; Zheng, Chunxiang; Chavez, Juan D.; Navare, Arti T.; Wu, Xia; Eng, Jimmy K.; Bruce, James E.
2016-01-01
Motivation: Large-scale chemical cross-linking with mass spectrometry (XL-MS) analyses are quickly becoming a powerful means for high-throughput determination of protein structural information and protein–protein interactions. Recent studies have garnered thousands of cross-linked interactions, yet the field lacks an effective tool to compile experimental data or access the network and structural knowledge for these large scale analyses. We present XLinkDB 2.0 which integrates tools for network analysis, Protein Databank queries, modeling of predicted protein structures and modeling of docked protein structures. The novel, integrated approach of XLinkDB 2.0 enables the holistic analysis of XL-MS protein interaction data without limitation to the cross-linker or analytical system used for the analysis. Availability and Implementation: XLinkDB 2.0 can be found here, including documentation and help: http://xlinkdb.gs.washington.edu/. Contact: jimbruce@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153666
Large-scale model quality assessment for improving protein tertiary structure prediction.
Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin
2015-06-15
Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models and rank a large number of models well. Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It unprecedentedly applied 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/. © The Author 2015. Published by Oxford University Press.
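Illustrative note (not part of the abstract): one simple way to build a consensus ranking from several QA methods is to average the per-method ranks of each model. The abstract does not state how MULTICOM combines its 14 methods, so rank averaging is an assumption here, and the method names and scores below are hypothetical.

```python
def ranks(scores):
    """Rank models by score; the highest score receives rank 1."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, model_index in enumerate(order, start=1):
        result[model_index] = position
    return result

def consensus_ranking(score_table):
    """Average per-method ranks; return model indices, best consensus first."""
    per_method = [ranks(scores) for scores in score_table.values()]
    n_models = len(per_method[0])
    mean_rank = [
        sum(r[m] for r in per_method) / len(per_method) for m in range(n_models)
    ]
    return sorted(range(n_models), key=lambda m: mean_rank[m])

# Hypothetical scores for three models from three QA methods (higher is better).
qa_scores = {
    "qa_a": [0.61, 0.72, 0.40],
    "qa_b": [0.55, 0.70, 0.58],
    "qa_c": [0.80, 0.65, 0.30],
}
best_first = consensus_ranking(qa_scores)
```

In this toy table no single method agrees fully with the others, yet the averaged ranks still produce one ordering, which is the sense in which a consensus can be more consistent than any individual method.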
Goonesekere, Nalin Cw
2009-01-01
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
Eyrich, V A; Standley, D M; Friesner, R A
1999-05-14
We report the tertiary structure predictions for 95 proteins ranging in size from 17 to 160 residues starting from known secondary structure. Predictions are obtained from global minimization of an empirical potential function followed by the application of a refined atomic overlap potential. The minimization strategy employed represents a variant of the Monte Carlo plus minimization scheme of Li and Scheraga applied to a reduced model of the protein chain. For all of the cases except beta-proteins larger than 75 residues, a native-like structure, usually 4-6 Å root-mean-square deviation from the native, is located. For beta-proteins larger than 75 residues, the energy gap between native-like structures and the lowest energy structures produced in the simulation is large, so that low RMSD structures are not generated starting from an unfolded state. This is attributed to the lack of an explicit hydrogen bond term in the potential function, which we hypothesize is necessary to stabilize large assemblies of beta-strands. Copyright 1999 Academic Press.
Schindler, Christina E M; de Vries, Sjoerd J; Zacharias, Martin
2015-02-01
Protein-protein interactions are abundant in the cell but to date structural data for a large number of complexes is lacking. Computational docking methods can complement experiments by providing structural models of complexes based on structures of the individual partners. A major caveat for docking success is accounting for protein flexibility. Especially, interface residues undergo significant conformational changes upon binding. This limits the performance of docking methods that keep partner structures rigid or allow limited flexibility. A new docking refinement approach, iATTRACT, has been developed which combines simultaneous full interface flexibility and rigid body optimizations during docking energy minimization. It employs an atomistic molecular mechanics force field for intermolecular interface interactions and a structure-based force field for intramolecular contributions. The approach was systematically evaluated on a large protein-protein docking benchmark, starting from an enriched decoy set of rigidly docked protein-protein complexes deviating by up to 15 Å from the native structure at the interface. Large improvements in sampling and slight but significant improvements in scoring/discrimination of near native docking solutions were observed. Complexes with initial deviations at the interface of up to 5.5 Å were refined to significantly better agreement with the native structure. Improvements in the fraction of native contacts were especially favorable, yielding increases of up to 70%. © 2014 Wiley Periodicals, Inc.
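Illustrative note (not part of the abstract): the fraction of native contacts mentioned above is commonly computed as the share of native interface residue pairs that a model reproduces. The sketch below assumes contacts are already available as (receptor residue, ligand residue) index pairs; the pairs shown are hypothetical and the contact-distance threshold used to derive them is left out.

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Share of native interface contacts that also occur in the model."""
    native = set(native_contacts)
    if not native:
        raise ValueError("native complex has no interface contacts")
    return len(native & set(model_contacts)) / len(native)

# Hypothetical contact pairs: (receptor residue index, ligand residue index).
native = {(10, 55), (11, 55), (12, 58), (40, 90)}
before_refinement = {(10, 55), (40, 91)}
after_refinement = {(10, 55), (11, 55), (12, 58), (40, 91)}

fnat_before = fraction_native_contacts(native, before_refinement)
fnat_after = fraction_native_contacts(native, after_refinement)
```

Contacts present in the model but absent from the native complex do not lower this value; they would be captured by a separate non-native contact measure.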
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, X.; Wilcox, G.L.
1993-12-31
We have implemented large scale back-propagation neural networks on a 544 node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs backpropagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. The trained network yields predicted structures of some proteins on which it has not been trained given only their sequences. Presentation of the Fourier transform of the sequences accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins from CM-5 time) to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presentation of 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.
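Illustrative note (not part of the abstract): the two throughput figures quoted above imply a cost of roughly seven floating-point operations per connection update. The short calculation below only divides the reported numbers; the variable names are ours.

```python
# Figures as reported in the abstract.
flops_per_second = 0.53e9      # 0.53 Gflops on 512 processors
updates_per_second = 76e6      # 76 million connection updates per second

# Implied floating-point operations spent on each connection update.
flops_per_update = flops_per_second / updates_per_second
```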
Protein Models Docking Benchmark 2
Anishchenko, Ivan; Kundrotas, Petras J.; Tuzikov, Alexander V.; Vakser, Ilya A.
2015-01-01
Structural characterization of protein-protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template-free or template-based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high-resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have pre-defined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model-to-native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model-like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu. PMID:25712716
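Illustrative note (not part of the abstract): the model-to-native Cα RMSD used to define the accuracy levels is the root-mean-square distance between matched Cα atoms. The sketch below assumes the two structures have already been optimally superimposed and have the same residues in the same order; a real pipeline would superimpose first (for example with the Kabsch algorithm). The coordinates are hypothetical.

```python
import math

def ca_rmsd(model, native):
    """RMSD in angstroms between matched, pre-superimposed C-alpha coordinates."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(total / len(model))

# Hypothetical three-residue fragment; every model atom is displaced by 2 angstroms.
native_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_ca = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]
rmsd = ca_rmsd(model_ca, native_ca)
```

Because a uniform shift is exactly what superposition removes, the example exists only to show the arithmetic; after superposition this pair would have an RMSD of zero.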
Protein docking by the interface structure similarity: how much structure is needed?
Sinha, Rohita; Kundrotas, Petras J; Vakser, Ilya A
2012-01-01
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values <12 Å across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 Å, the success rate for the benchmark set of 99 protein complexes did not increase significantly for higher-accuracy models, and decreased for lower-accuracy models. The 12 Å cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.
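Illustrative note (not part of the abstract): a distance cutoff across the interface can be read as keeping every atom of one partner that lies within the cutoff of any atom of the other partner. The abstract does not spell out the exact definition, so that reading is an assumption, as are the coordinates below.

```python
import math

def interface_atoms(chain_a, chain_b, cutoff=12.0):
    """Indices of atoms in each chain within `cutoff` angstroms of the other chain."""
    near_a, near_b = set(), set()
    for i, atom_a in enumerate(chain_a):
        for j, atom_b in enumerate(chain_b):
            if math.dist(atom_a, atom_b) <= cutoff:
                near_a.add(i)
                near_b.add(j)
    return sorted(near_a), sorted(near_b)

# Hypothetical coordinates (angstroms) for two atoms per chain.
chain_a = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
chain_b = [(5.0, 0.0, 0.0), (40.0, 0.0, 0.0)]

at_12 = interface_atoms(chain_a, chain_b, cutoff=12.0)
at_16 = interface_atoms(chain_a, chain_b, cutoff=16.0)
```

Raising the cutoff from 12 to 16 Å pulls a second atom of `chain_a` into the interface, showing how the cutoff controls the size of the structure passed to the alignment step. The pairwise loop is quadratic in atom count; a spatial grid or k-d tree would be the usual choice at scale.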
High-Resolution Protein Structure Determination by Serial Femtosecond Crystallography
Boutet, Sébastien; Lomb, Lukas; Williams, Garth J.; Barends, Thomas R. M.; Aquila, Andrew; Doak, R. Bruce; Weierstall, Uwe; DePonte, Daniel P.; Steinbrener, Jan; Shoeman, Robert L.; Messerschmidt, Marc; Barty, Anton; White, Thomas A.; Kassemeyer, Stephan; Kirian, Richard A.; Seibert, M. Marvin; Montanez, Paul A.; Kenney, Chris; Herbst, Ryan; Hart, Philip; Pines, Jack; Haller, Gunther; Gruner, Sol M.; Philipp, Hugh T.; Tate, Mark W.; Hromalik, Marianne; Koerner, Lucas J.; van Bakel, Niels; Morse, John; Ghonsalves, Wilfred; Arnlund, David; Bogan, Michael J.; Caleman, Carl; Fromme, Raimund; Hampton, Christina Y.; Hunter, Mark S.; Johansson, Linda C.; Katona, Gergely; Kupitz, Christopher; Liang, Mengning; Martin, Andrew V.; Nass, Karol; Redecke, Lars; Stellato, Francesco; Timneanu, Nicusor; Wang, Dingjie; Zatsepin, Nadia A.; Schafer, Donald; Defever, James; Neutze, Richard; Fromme, Petra; Spence, John C. H.; Chapman, Henry N.; Schlichting, Ilme
2013-01-01
Structure determination of proteins and other macromolecules has historically required the growth of high-quality crystals sufficiently large to diffract x-rays efficiently while withstanding radiation damage. We applied serial femtosecond crystallography (SFX) using an x-ray free-electron laser (XFEL) to obtain high-resolution structural information from microcrystals (less than 1 micrometer by 1 micrometer by 3 micrometers) of the well-characterized model protein lysozyme. The agreement with synchrotron data demonstrates the immediate relevance of SFX for analyzing the structure of the large group of difficult-to-crystallize molecules. PMID:22653729
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132
Andreeva, Antonina
2016-06-15
The Structural Classification of Proteins (SCOP) database has facilitated the development of many tools and algorithms and it has been successfully used in protein structure prediction and large-scale genome annotations. During the development of SCOP, numerous exceptions were found to topological rules, along with complex evolutionary scenarios and peculiarities in proteins including the ability to fold into alternative structures. This article reviews cases of structural variations observed for individual proteins and among groups of homologues, knowledge of which is essential for protein structure modelling. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
Keates, Tracy; Cooper, Christopher D O; Savitsky, Pavel; Allerston, Charles K; Phillips, Claire; Hammarström, Martin; Daga, Neha; Berridge, Georgina; Mahajan, Pravin; Burgess-Brown, Nicola A; Müller, Susanne; Gräslund, Susanne; Gileadi, Opher
2012-06-15
The generation of affinity reagents to large numbers of human proteins depends on the ability to express the target proteins as high-quality antigens. The Structural Genomics Consortium (SGC) focuses on the production and structure determination of human proteins. In a 7-year period, the SGC has deposited crystal structures of >800 human protein domains, and has additionally expressed and purified a similar number of protein domains that have not yet been crystallised. The targets include a diversity of protein domains, with an attempt to provide high coverage of protein families. The family approach provides an excellent basis for characterising the selectivity of affinity reagents. We present a summary of the approaches used to generate purified human proteins or protein domains, a test case demonstrating the ability to rapidly generate new proteins, and an optimisation study on the modification of >70 proteins by biotinylation in vivo. These results provide a unique synergy between large-scale structural projects and the recent efforts to produce a wide coverage of affinity reagents to the human proteome. Copyright © 2011 Elsevier B.V. All rights reserved. PMID:22027370
2014-01-01
Background Due to rapid sequencing of genomes, there are now millions of deposited protein sequences with no known function. Fast sequence-based comparisons allow detecting close homologs for a protein of interest to transfer functional information from the homologs to the given protein. Sequence-based comparison cannot detect remote homologs, in which evolution has adjusted the sequence while largely preserving structure. Structure-based comparisons can detect remote homologs but most methods for doing so are too expensive to apply at a large scale over structural databases of proteins. Recently, fragment-based structural representations have been proposed that allow fast detection of remote homologs with reasonable accuracy. These representations have also been used to obtain linearly-reducible maps of protein structure space. It has been shown, as additionally supported from analysis in this paper that such maps preserve functional co-localization of the protein structure space. Methods Inspired by a recent application of the Latent Dirichlet Allocation (LDA) model for conducting structural comparisons of proteins, we propose higher-order LDA-obtained topic-based representations of protein structures to provide an alternative route for remote homology detection and organization of the protein structure space in few dimensions. Various techniques based on natural language processing are proposed and employed to aid the analysis of topics in the protein structure domain. Results We show that a topic-based representation is just as effective as a fragment-based one at automated detection of remote homologs and organization of protein structure space. We conduct a detailed analysis of the information content in the topic-based representation, showing that topics have semantic meaning. The fragment-based and topic-based representations are also shown to allow prediction of superfamily membership. 
Conclusions This work opens exciting avenues in designing novel representations to extract information about protein structures, as well as organizing and mining protein structure space with mature text mining tools. PMID:25080993
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osipiuk, J.; Gornicki, P.; Maj, L.
The structure of the YlxR protein of unknown function from Streptococcus pneumoniae was determined to 1.35 Angstroms. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Angstroms from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer α/β sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the α-β plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.
Streptococcus pneumonia YlxR at 1.35 A shows a putative new fold.
Osipiuk, J; Górnicki, P; Maj, L; Dementieva, I; Laskowski, R; Joachimiak, A
2001-11-01
The structure of the YlxR protein of unknown function from Streptococcus pneumoniae was determined to 1.35 Å. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif GRGA(Y/W). The family shows no significant amino-acid sequence similarity with other proteins. Three-wavelength diffraction MAD data were collected to 1.7 Å from orthorhombic crystals using synchrotron radiation and the structure was determined using a semi-automated approach. The YlxR structure resembles a two-layer alpha/beta sandwich with the overall shape of a cylinder and shows no structural homology to proteins of known structure. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the alpha-beta plait superfamily. The distribution of the electrostatic surface potential shows a large positively charged patch on one side of the protein, a feature often found in nucleic acid-binding proteins. Three sulfate ions bind to this positively charged surface. Analysis of potential binding sites uncovered several substantial clefts, with the largest spanning 3/4 of the protein. A similar distribution of binding sites and a large sharply bent cleft are observed in RNA-binding proteins that are unrelated in sequence and structure. It is proposed that YlxR is an RNA-binding protein.
Structure-Based Characterization of Multiprotein Complexes
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.
2014-01-01
Summary Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
Alpha-Helical Protein Networks Are Self-Protective and Flaw-Tolerant
Ackbarow, Theodor; Sen, Dipanjan; Thaulow, Christian; Buehler, Markus J.
2009-01-01
Alpha-helix based protein networks as they appear in intermediate filaments in the cell’s cytoskeleton and the nuclear membrane robustly withstand large deformation of up to several hundred percent strain, despite the presence of structural imperfections or flaws. This performance is not achieved by most synthetic materials, which typically fail at much smaller deformation and show a great sensitivity to the existence of structural flaws. Here we report a series of molecular dynamics simulations with a simple coarse-grained multi-scale model of alpha-helical protein domains, explaining the structural and mechanistic basis for this observed behavior. We find that the characteristic properties of alpha-helix based protein networks are due to the particular nanomechanical properties of their protein constituents, enabling the formation of large dissipative yield regions around structural flaws, effectively protecting the protein network against catastrophic failure. We show that the key for these self protecting properties is a geometric transformation of the crack shape that significantly reduces the stress concentration at corners. Specifically, our analysis demonstrates that the failure strain of alpha-helix based protein networks is insensitive to the presence of structural flaws in the protein network, only marginally affecting their overall strength. Our findings may help to explain the ability of cells to undergo large deformation without catastrophic failure while providing significant mechanical resistance. PMID:19547709
Structure of the uncleaved ectodomain of the paramyxovirus (hPIV3) fusion protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Hsien-Sheng; Paterson, Reay G.; Wen, Xiaolin
2010-03-08
Class I viral fusion proteins share common mechanistic and structural features but little sequence similarity. Structural insights into the protein conformational changes associated with membrane fusion are based largely on studies of the influenza virus hemagglutinin in pre- and postfusion conformations. Here, we present the crystal structure of the secreted, uncleaved ectodomain of the paramyxovirus, human parainfluenza virus 3 fusion (F) protein, a member of the class I viral fusion protein group. The secreted human parainfluenza virus 3 F forms a trimer with distinct head, neck, and stalk regions. Unexpectedly, the structure reveals a six-helix bundle associated with the postfusion form of F, suggesting that the anchor-minus ectodomain adopts a conformation largely similar to the postfusion state. The transmembrane anchor domains of F may therefore profoundly influence the folding energetics that establish and maintain a metastable, prefusion state.
Johann Deisenhofer, Crystallography, and Proteins
research using X-ray crystallography to elucidate for the first time the three-dimensional structure of a large membrane-bound protein molecule. This structure helped explain the process of photosynthesis, by a protein structure determination that relied on complementary features of two different beam lines
Multi-Conformer Ensemble Docking to Difficult Protein Targets
Ellingson, Sally R.; Miao, Yinglong; Baudry, Jerome; ...
2014-09-08
We investigate large-scale ensemble docking using five proteins from the Directory of Useful Decoys (DUD, dud.docking.org) for which docking to crystal structures has proven difficult. Molecular dynamics trajectories are produced for each protein and an ensemble of representative conformational structures extracted from the trajectories. Docking calculations are performed on these selected simulation structures and ensemble-based enrichment factors compared with those obtained using docking in crystal structures of the same protein targets or random selection of compounds. We also found simulation-derived snapshots with improved enrichment factors that increased the chemical diversity of docking hits for four of the five selected proteins. A combination of all the docking results obtained from molecular dynamics simulation followed by selection of top-ranking compounds appears to be an effective strategy for increasing the number and diversity of hits when using docking to screen large libraries of chemicals against difficult protein targets.
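Illustrative note (not part of the abstract): an enrichment factor compares the hit rate among the top-ranked compounds with the hit rate in the whole library, so random selection gives a value near 1. The sketch below uses the commonly quoted definition; the abstract does not give the exact formula or cutoff fraction the authors used, and the ranked list is hypothetical.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Hit rate in the top-ranked fraction divided by the library-wide hit rate."""
    n_total = len(ranked_is_active)
    actives_total = sum(ranked_is_active)
    if n_total == 0 or actives_total == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n_total * top_fraction))
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)

# Hypothetical library of 10 compounds, best docking score first; True marks a known active.
ranked = [True, True, False, True, False, False, False, False, False, False]
ef_top20 = enrichment_factor(ranked, top_fraction=0.2)
```

Here both compounds in the top 20% are actives while 3 of 10 are active overall, so the ranking is about 3.3 times better than random picking.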
Tuncbag, Nurcan; Gursoy, Attila; Nussinov, Ruth; Keskin, Ozlem
2011-08-11
Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. We provide a protocol (termed PRISM, protein interactions by structural matching) for large-scale prediction of protein-protein interactions and assembly of protein complex structures. The method consists of two components: rigid-body structural comparisons of target proteins to known template protein-protein interfaces and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue 'hot spots'. Ultimately, PRISM could help to construct cellular pathways and functional, proteome-scale annotation. PRISM is implemented in Python and runs in a UNIX environment. The program accepts Protein Data Bank-formatted protein structures and is available at http://prism.ccbb.ku.edu.tr/prism_protocol/.
Structure of faustovirus, a large dsDNA virus
Klose, Thomas; Reteno, Dorine G.; Benamar, Samia; ...
2016-05-16
Many viruses protect their genome with a combination of a protein shell with or without a membrane layer. In this paper, we describe the structure of faustovirus, the first DNA virus (to our knowledge) that has been found to use two protein shells to encapsidate and protect its genome. The crystal structure of the major capsid protein, in combination with cryo-electron microscopy structures of two different maturation stages of the virus, shows that the outer virus shell is composed of a double jelly-roll protein that can be found in many double-stranded DNA viruses. The structure of the repeating hexameric unit of the inner shell is different from all other known capsid proteins. In addition to the unique architecture, the region of the genome that encodes the major capsid protein stretches over 17,000 bp and contains a large number of introns and exons. Finally, this complexity might help the virus to rapidly adapt to new environments or hosts.
Uchikoga, Nobuyuki; Hirokawa, Takatsugu
2010-05-11
Protein-protein docking for proteins with large conformational changes was analyzed by using interaction fingerprints, one of the scales for measuring similarities among complex structures, utilized especially for searching near-native protein-ligand or protein-protein complex structures. Here, we have proposed a combined method for analyzing protein-protein docking by taking large conformational changes into consideration. This combined method consists of ensemble soft docking with multiple protein structures, refinement of complexes, and cluster analysis using interaction fingerprints and energy profiles. To test for the applicability of this combined method, various CaM-ligand complexes were reconstructed from the NMR structures of unbound CaM. For the purpose of reconstruction, we used three known CaM-ligands, namely, the CaM-binding peptides of cyclic nucleotide gateway (CNG), CaM kinase kinase (CaMKK) and the plasma membrane Ca2+ ATPase pump (PMCA), and thirty-one structurally diverse CaM conformations. For each ligand, 62000 CaM-ligand complexes were generated in the docking step and the relationship between their energy profiles and structural similarities to the native complex were analyzed using interaction fingerprint and RMSD. Near-native clusters were obtained in the case of CNG and CaMKK. The interaction fingerprint method discriminated near-native structures better than the RMSD method in cluster analysis. We showed that a combined method that includes the interaction fingerprint is very useful for protein-protein docking analysis of certain cases.
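As a rough illustration of how an interaction fingerprint can be compared between a docked pose and the native complex, the sketch below encodes each pose as the set of receptor residues in contact with the ligand and scores similarity with the Tanimoto coefficient. The abstract does not specify the fingerprint definition or the similarity measure, so the 4.5 Å contact cutoff, the set-based encoding, and the function names are all assumptions.

```python
import math


def contact_fingerprint(receptor_atoms, ligand_atoms, cutoff=4.5):
    """Set of receptor residue ids that have an atom within `cutoff` of the ligand.

    receptor_atoms: dict mapping residue id -> list of (x, y, z) coordinates.
    ligand_atoms:   list of (x, y, z) coordinates.
    cutoff:         contact distance in angstroms (assumed value).
    """
    fingerprint = set()
    for residue_id, atoms in receptor_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            fingerprint.add(residue_id)
    return fingerprint


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints (shared contacts / all contacts)."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(union)
```

Unlike RMSD, a score of this kind depends only on which residues make contact, so two poses can differ in overall geometry and still score as similar if they share the same interface.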
Worobec, E A; Martin, N L; McCubbin, W D; Kay, C M; Brayer, G D; Hancock, R E
1988-04-07
A large-scale purification scheme was developed for lipopolysaccharide-free protein P, the phosphate-starvation-inducible outer-membrane porin from Pseudomonas aeruginosa. This highly purified protein P was used to successfully form hexagonal crystals in the presence of n-octyl-beta-glucopyranoside. Amino-acid analysis indicated that protein P had a similar composition to other bacterial outer membrane proteins, containing a high percentage (50%) of hydrophilic residues. The amino-terminal sequence of this protein, although not homologous to either outer membrane protein, PhoE or OmpF, of Escherichia coli, was found to have an analogous protein-folding pattern. Protein P in the native trimer form was capable of maintaining a stable functional trimer after proteinase cleavage. This suggested the existence of a strongly associated tertiary and quaternary structure. Circular dichroism studies confirmed these results in that a large proportion of the protein structure was determined to be beta-sheet and resistant to acid pH and heating in 0.1% sodium dodecyl sulphate.
DeepQA: improving the estimation of single protein model quality with deep belief networks.
Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin
2016-12-05
Protein quality assessment (QA), useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Reaction Trajectory Revealed by a Joint Analysis of Protein Data Bank
Ren, Zhong
2013-01-01
Structural motions along a reaction pathway hold the secret about how a biological macromolecule functions. If each static structure were considered as a snapshot of the protein molecule in action, a large collection of structures would constitute a multidimensional conformational space of an enormous size. Here I present a joint analysis of hundreds of known structures of human hemoglobin in the Protein Data Bank. By applying singular value decomposition to distance matrices of these structures, I demonstrate that this large collection of structural snapshots, derived under a wide range of experimental conditions, is arranged in order along a reaction pathway. The structural motions along this extensive trajectory, including several helical transformations, arrive at a reverse engineered mechanism of the cooperative machinery (Ren, companion article), and shed light on pathological properties of the abnormal homotetrameric hemoglobins from α-thalassemia. This method of meta-analysis provides a general approach to structural dynamics based on static protein structures in this post genomics era. PMID:24244274
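The idea of ordering static structures by a singular value decomposition of their distance matrices can be sketched as follows. This is a minimal illustration and not the author's procedure. Each structure is reduced to a vector of pairwise distances, the vectors are mean-centered, and only the leading singular component is extracted, using power iteration in place of a full SVD routine. The choice of atoms, any weighting, and the number of components retained are not given in the abstract. The function names are hypothetical.

```python
import math


def distance_vector(coords):
    """Flatten the upper triangle of a structure's pairwise distance matrix."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]


def leading_component(rows, iterations=200):
    """Leading right singular vector of the mean-centered data, plus scores.

    rows: one distance vector per structure, all of equal length.
    Returns (v, scores); scores[i] is the coordinate of structure i along v.
    The overall sign of v (and hence of the scores) is arbitrary.
    """
    n_cols = len(rows[0])
    means = [sum(r[k] for r in rows) / len(rows) for k in range(n_cols)]
    centered = [[r[k] - means[k] for k in range(n_cols)] for r in rows]

    # Start from the largest centered row so the start lies in the row space.
    start = max(centered, key=lambda r: sum(x * x for x in r))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:
        raise ValueError("all structures have identical distance vectors")
    v = [x / norm for x in start]

    for _ in range(iterations):
        # Power iteration on A^T A: v <- normalize(A^T (A v)).
        av = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(av[i] * centered[i][k] for i in range(len(centered)))
             for k in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    scores = [sum(a * b for a, b in zip(row, v)) for row in centered]
    return v, scores
```

Sorting the structures by their score places them along the dominant mode of structural variation. Whether that mode corresponds to a reaction coordinate has to be judged from the data.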
Adamczak, Rafal; Meller, Jarek
2016-12-28
Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust . uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
An Unusual Hydrophobic Core Confers Extreme Flexibility to HEAT Repeat Proteins
Kappel, Christian; Zachariae, Ulrich; Dölker, Nicole; Grubmüller, Helmut
2010-01-01
Alpha-solenoid proteins are suggested to constitute highly flexible macromolecules, whose structural variability and large surface area is instrumental in many important protein-protein binding processes. By equilibrium and nonequilibrium molecular dynamics simulations, we show that importin-β, an archetypical α-solenoid, displays unprecedentedly large and fully reversible elasticity. Our stretching molecular dynamics simulations reveal full elasticity over up to twofold end-to-end extensions compared to its bound state. Despite the absence of any long-range intramolecular contacts, the protein can return to its equilibrium structure to within 3 Å backbone RMSD after the release of mechanical stress. We find that this extreme degree of flexibility is based on an unusually flexible hydrophobic core that differs substantially from that of structurally similar but more rigid globular proteins. In that respect, the core of importin-β resembles molten globules. The elastic behavior is dominated by nonpolar interactions between HEAT repeats, combined with conformational entropic effects. Our results suggest that α-solenoid structures such as importin-β may bridge the molecular gap between completely structured and intrinsically disordered proteins. PMID:20816072
Structure-based characterization of multiprotein complexes.
Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J
2014-07-08
Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Protein complex prediction in large ontology attributed protein-protein interaction networks.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Li, Yanpeng; Xu, Bo
2013-01-01
Protein complexes are important for unraveling the secrets of cellular organization and function. Many computational approaches have been developed to predict protein complexes in protein-protein interaction (PPI) networks. However, most existing approaches focus mainly on the topological structure of PPI networks, and largely ignore the gene ontology (GO) annotation information. In this paper, we constructed ontology attributed PPI networks with PPI data and GO resource. After constructing ontology attributed networks, we proposed a novel approach called CSO (clustering based on network structure and ontology attribute similarity). Structural information and GO attribute information are complementary in ontology attributed networks. CSO can effectively take advantage of the correlation between frequent GO annotation sets and the dense subgraph for protein complex prediction. Our proposed CSO approach was applied to four different yeast PPI data sets and predicted many well-known protein complexes. The experimental results showed that CSO was valuable in predicting protein complexes and achieved state-of-the-art performance.
Valéry, Céline; Deville-Foillard, Stéphanie; Lefebvre, Christelle; Taberner, Nuria; Legrand, Pierre; Meneau, Florian; Meriadec, Cristelle; Delvaux, Camille; Bizien, Thomas; Kasotakis, Emmanouil; Lopez-Iglesias, Carmen; Gall, Andrew; Bressanelli, Stéphane; Le Du, Marie-Hélène; Paternostre, Maïté; Artzner, Franck
2015-01-01
External stimuli are powerful tools that naturally control protein assemblies and functions. For example, during viral entry and exit changes in pH are known to trigger large protein conformational changes. However, the molecular features stabilizing the higher pH structures remain unclear. Here we elucidate the conformational change of a self-assembling peptide that forms either small or large nanotubes dependent on the pH. The sub-angstrom high-pH peptide structure reveals a globular conformation stabilized through a strong histidine-serine H-bond and a tight histidine-aromatic packing. Lowering the pH induces histidine protonation, disrupts these interactions and triggers a large change to an extended β-sheet-based conformation. Re-visiting available structures of proteins with pH-dependent conformations reveals both histidine-containing aromatic pockets and histidine-serine proximity as key motifs in higher pH structures. The mechanism discovered in this study may thus be generally used by pH-dependent proteins and opens new prospects in the field of nanomaterials. PMID:26190377
DWARF – a data warehouse system for analyzing protein families
Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen
2006-01-01
Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows a novel understanding of complex biological systems to be gained. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion: DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
Host nuclear proteins expressed in simian virus 40-transformed and -infected cells.
Melero, J A; Tur, S; Carroll, R B
1980-01-01
Two new families of host proteins (Mr, 48,000 and 55,000), in addition to the viral large (T) and small tumor antigens, are precipitable, with anti-T antiserum, from cells transformed or infected by the DNA tumor virus simian virus 40 (SV40). Rabbit anti-mouse 48,000 protein antiserum reacts specifically with SV40-infected or -transformed mouse cells to give nuclear staining indistinguishable from T-antigen staining but does not react with SV40-transformed human cells which nevertheless have structurally analogous 48,000 proteins, nor does it give nuclear fluorescence with untransformed mouse cells. Comparison of the partial proteolytic digests of the 48,000 proteins from cultured cells of various mammalian species shows that they are structurally related but not related to the 55,000 or large T-antigen proteins. The 55,000 proteins from the various mammalian species were also structurally related. PMID:6244576
Nannenga, Brent L; Iadanza, Matthew G; Vollmar, Breanna S; Gonen, Tamir
2013-01-01
Electron cryomicroscopy, or cryoEM, is an emerging technique for studying the three-dimensional structures of proteins and large macromolecular machines. Electron crystallography is a branch of cryoEM in which structures of proteins can be studied at resolutions that rival those achieved by X-ray crystallography. Electron crystallography employs two-dimensional crystals of a membrane protein embedded within a lipid bilayer. The key to a successful electron crystallographic experiment is the crystallization, or reconstitution, of the protein of interest. This unit describes ways in which protein can be expressed, purified, and reconstituted into well-ordered two-dimensional crystals. A protocol is also provided for negative stain electron microscopy as a tool for screening crystallization trials. When large and well-ordered crystals are obtained, the structures of both protein and its surrounding membrane can be determined to atomic resolution.
Functional Advantages of Conserved Intrinsic Disorder in RNA-Binding Proteins.
Varadi, Mihaly; Zsolyomi, Fruzsina; Guharoy, Mainak; Tompa, Peter
2015-01-01
Proteins form large macromolecular assemblies with RNA that govern essential molecular processes. RNA-binding proteins have often been associated with conformational flexibility, yet the extent and functional implications of their intrinsic disorder have never been fully assessed. Here, through large-scale analysis of comprehensive protein sequence and structure datasets we demonstrate the prevalence of intrinsic structural disorder in RNA-binding proteins and domains. We addressed their functionality through a quantitative description of the evolutionary conservation of disordered segments involved in binding, and investigated the structural implications of flexibility in terms of conformational stability and interface formation. We conclude that the functional role of intrinsically disordered protein segments in RNA-binding is two-fold: first, these regions establish extended, conserved electrostatic interfaces with RNAs via induced fit. Second, conformational flexibility enables them to target different RNA partners, providing multi-functionality, while also ensuring specificity. These findings emphasize the functional importance of intrinsically disordered regions in RNA-binding proteins.
Okazaki, Kei-ichi; Koga, Nobuyasu; Takada, Shoji; Onuchic, Jose N.; Wolynes, Peter G.
2006-01-01
Biomolecules often undergo large-amplitude motions when they bind or release other molecules. Unlike macroscopic machines, these biomolecular machines can partially disassemble (unfold) and then reassemble (fold) during such transitions. Here we put forward a minimal structure-based model, the “multiple-basin model,” that can directly be used for molecular dynamics simulation of even very large biomolecular systems so long as the endpoints of the conformational change are known. We investigate the model by simulating large-scale motions of four proteins: glutamine-binding protein, S100A6, dihydrofolate reductase, and HIV-1 protease. The mechanisms of conformational transition depend on the protein basin topologies and change with temperature near the folding transition. The conformational transition rate varies linearly with driving force over a fairly large range. This linearity appears to be a consequence of partial unfolding during the conformational transition. PMID:16877541
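One way to picture a potential with two basins built from two single-basin, structure-based potentials is to couple them through a 2×2 matrix and take its lower eigenvalue. The abstract does not state the functional form, so this construction is an assumption made for illustration: `v1` and `v2` stand for the single-basin energies of a conformation, `coupling` sets the smoothing near the crossing, and `delta_v` shifts the relative stability of the second basin. All names are hypothetical.

```python
import math


def multiple_basin_energy(v1, v2, coupling, delta_v=0.0):
    """Lower eigenvalue of [[v1, coupling], [coupling, v2 + delta_v]].

    v1, v2:    energies of the same conformation in the two single-basin models.
    coupling:  off-diagonal term; larger values lower the barrier between basins.
    delta_v:   offset added to the second basin to tune relative stability.
    """
    mean = 0.5 * (v1 + v2 + delta_v)
    half_gap = 0.5 * (v1 - v2 - delta_v)
    return mean - math.sqrt(half_gap ** 2 + coupling ** 2)
```

With zero coupling the expression reduces to the lower of the two basin energies. A non-zero coupling smooths the cusp where the two surfaces cross, which gives continuous forces for molecular dynamics.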
NASA Astrophysics Data System (ADS)
Fang, F.; Szleifer, I.
2003-07-01
The competitive adsorption of proteins of different sizes and charges is studied using a molecular theory. The theory enables the study of charged systems explicitly including the size, shape, and charge distributions in all the molecular species in the mixture. Thus, this approach goes beyond the commonly used Poisson-Boltzmann approximation. The adsorption isotherms of the protein mixtures are studied for mixtures of two proteins of different size and charge. The amount of proteins adsorbed and the fraction of each protein are calculated as a function of the bulk composition of the solution and the amount of salt in the system. It is found that the total amount of proteins adsorbed is a monotonically decreasing function of the fraction of large proteins in the bulk solution and, for fixed protein composition, of the salt concentration. However, the composition of the adsorbed layer is a complicated function of the bulk composition and solution ionic strength. The structure of the adsorbed layer depends upon the bulk composition and salt concentration. In general, there are multilayers adsorbed due to the long-range character of the electrostatic interactions. When the composition of large proteins in bulk is in very large excess it is found that the structure of the adsorbed multilayer is such that the layer in contact with the surface is composed of a mixture of large and small proteins. However, the second and third layers are almost exclusively composed of large proteins. The theory is also generalized to study the time-dependent adsorption. The approach is based on separation of time scales into fast modes for the ions from the salt and the solvent and slow modes for the proteins. The dynamic equations are written for the slow modes, while the fast ones are obtained from the condition of equilibrium constrained to the distribution of proteins given by the slow modes.
Two different processes are presented. The first is the adsorption from a homogeneous solution to a charged surface at low salt concentration and with a large excess of the large proteins in bulk. The second process is the kinetics of structural and adsorption change induced by changing the salt concentration of the bulk solution from low to high. The first process shows a large overshoot of the large proteins on the surface due to their excess in solution, followed by a surface replacement by the smaller molecules. The second process shows a very fast desorption of the large proteins followed by adsorption at later stages. This process is found to be driven by large electrostatic repulsions induced by the fast ions from the salt approaching the surface. The relevance of the theoretical predictions to experimental systems and possible directions for improvements of the theory are discussed.
Park, Hahnbeom; Bradley, Philip; Greisen, Per; Liu, Yuan; Mulligan, Vikram Khipple; Kim, David E.; Baker, David; DiMaio, Frank
2017-01-01
Most biomolecular modeling energy functions for structure prediction, sequence design, and molecular docking have been parameterized using existing macromolecular structural data; this contrasts with molecular mechanics force fields, which are largely optimized using small-molecule data. In this study, we describe an integrated method that enables optimization of a biomolecular modeling energy function simultaneously against small-molecule thermodynamic data and high-resolution macromolecular structural data. We use this approach to develop a next-generation Rosetta energy function that utilizes a new anisotropic implicit solvation model, and an improved electrostatics and Lennard-Jones model, illustrating how energy functions can be considerably improved in their ability to describe large-scale energy landscapes by incorporating both small-molecule and macromolecule data. The energy function improves performance in a wide range of protein structure prediction challenges, including monomeric structure prediction, protein-protein and protein-ligand docking, protein sequence design, and prediction of the free energy changes by mutation, while reasonably recapitulating small-molecule thermodynamic properties. PMID:27766851
Automated multi-dimensional purification of tagged proteins.
Sigrell, Jill A; Eklund, Pär; Galin, Markus; Hedkvist, Lotta; Liljedahl, Pia; Johansson, Christine Markeland; Pless, Thomas; Torstenson, Karin
2003-01-01
The capacity for high throughput purification (HTP) is essential in fields such as structural genomics where large numbers of protein samples are routinely characterized in, for example, studies of structural determination, functionality and drug development. Proteins required for such analysis must be pure and homogeneous and available in relatively large amounts. The AKTA 3D system is a powerful automated protein purification system, which minimizes preparation, run-time and repetitive manual tasks. It has the capacity to purify up to 6 different His6- or GST-tagged proteins per day and can produce 1-50 mg protein per run at >90% purity. The success of automated protein purification increases with careful experimental planning. Protocol, columns and buffers need to be chosen with the final application area for the purified protein in mind.
Energy Landscape of All-Atom Protein-Protein Interactions Revealed by Multiscale Enhanced Sampling
Moritsugu, Kei; Terada, Tohru; Kidera, Akinori
2014-01-01
Protein-protein interactions are regulated by a subtle balance of complicated atomic interactions and solvation at the interface. To understand such an elusive phenomenon, it is necessary to thoroughly survey the large configurational space from the stable complex structure to the dissociated states using the all-atom model in explicit solvent and to delineate the energy landscape of protein-protein interactions. In this study, we carried out a multiscale enhanced sampling (MSES) simulation of the formation of a barnase-barstar complex, which is a protein complex characterized by extraordinarily tight and fast binding, to determine the energy landscape of atomistic protein-protein interactions. The MSES adopts a multicopy and multiscale scheme to enable enhanced sampling of the all-atom model of large proteins including explicit solvent. During the 100-ns MSES simulation of the barnase-barstar system, we observed the association-dissociation processes of the atomistic protein complex in solution several times, which contained not only the native complex structure but also fully non-native configurations. The sampled distributions suggest that a large variety of non-native states went downhill to the stable complex structure, like a fast folding on a funnel-like potential. This funnel landscape is attributed to dominant configurations in the early stage of the association process characterized by near-native orientations, which will accelerate the native inter-molecular interactions. These configurations are guided mostly by the shape complementarity between barnase and barstar, and lead to the fast formation of the final complex structure along the downhill energy landscape. PMID:25340714
Eichmann, Cédric; Orts, Julien; Tzitzilonis, Christos; Vögeli, Beat; Smrt, Sean; Lorieau, Justin; Riek, Roland
2014-12-11
The interaction between membrane proteins and lipids or lipid mimetics such as detergents is key for the three-dimensional structure and dynamics of membrane proteins. In NMR-based structural studies of membrane proteins, qualitative analysis of intermolecular nuclear Overhauser enhancements (NOEs) or paramagnetic resonance enhancement are used in general to identify the transmembrane segments of a membrane protein. Here, we employed a quantitative characterization of intermolecular NOEs between (1)H of the detergent and (1)H(N) of (2)H-perdeuterated, (15)N-labeled α-helical membrane protein-detergent complexes following the exact NOE (eNOE) approach. Structural considerations suggest that these intermolecular NOEs should show a helical-wheel-type behavior along a transmembrane helix or a membrane-attached helix within a membrane protein as experimentally demonstrated for the complete influenza hemagglutinin fusion domain HAfp23. The partial absence of such a NOE pattern along the amino acid sequence as shown for a truncated variant of HAfp23 and for the Escherichia coli inner membrane protein YidH indicates the presence of large tertiary structure fluctuations such as an opening between helices or the presence of large rotational dynamics of the helices. Detergent-protein NOEs thus appear to be a straightforward probe for a qualitative characterization of structural and dynamical properties of membrane proteins embedded in detergent micelles.
Nonequilibrium stabilization of an RNA/protein droplet emulsion by nuclear actin
NASA Astrophysics Data System (ADS)
Brangwynne, Clifford
2013-03-01
Actin plays a structural role in the cytoplasm. However, actin takes on new functions and structures in the nucleus that are poorly understood. The nuclei of the large oocytes of the frog X. laevis specifically accumulate actin to reach high concentrations; however, it remains unclear if this actin polymerizes into a network, and what, if any, structural role such an actin network might play. Here, we use microrheological and confocal imaging techniques to probe the local architecture and mechanics of the nucleus. Our data show that actin forms a weak network that spatially organizes the nucleus by kinetically stabilizing embedded liquid-like RNA/protein bodies, which are important for cell growth. In actin-disrupted nuclei, this RNA/protein droplet emulsion is destabilized, leading to homotypic coalescence into single large droplets. Our data provide intriguing new insights into why large cell nuclei require an actin-based structural scaffold.
FT-IR Study Reveals Intrinsically Disordered Nature of Heat Shock Protein 90
NASA Astrophysics Data System (ADS)
Xie, Aihua; Neto, David; Balch, Maurie; Hendriks, Johnny; Causey, Oliver; Deng, Junpeng; Matts, Robert
Heat shock protein 90 (Hsp90) is a highly conserved chaperone protein that enables the proper folding of a large number of structurally diverse proteins (a.k.a. clients) in the crowded cytosolic environment and plays a key role in regulating the heat shock response. A long-standing open question is how Hsp90 accommodates the structural diversity of a large cohort of client proteins. We report an ATR FTIR study of the structural properties of the Hsp90 C-terminal domain (CTD) and their temperature dependences. Effects of temperature on Hsp90 structure are dissected into the C-terminal domain (CTD) and the N-terminal/middle domain (NTMD). One of our major findings reveals that within a narrow temperature window across the physiological temperatures (35 to 45 C), Hsp90CTD exhibits significant increases in protein aggregation and increases in unordered structures. Despite the intrinsically disordered nature of Hsp90CTD, it retains a protected hydrophobic core at 40 C. Implications of these results will be discussed in the light of the structural dynamics and client diversity of Hsp90. AX is grateful for grant support from OCAST HR10-078 and NSF MRI DBI1338097.
Rysavy, Steven J; Beck, David A C; Daggett, Valerie
2014-11-01
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. © 2014 The Protein Society.
Protein family clustering for structural genomics.
Yan, Yongpan; Moult, John
2005-10-28
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
NASA Astrophysics Data System (ADS)
Santos, Marlus Alves Dos; Teixeira, Francesco Brugnera; Moreira, Heline Hellen Teixeira; Rodrigues, Adele Aud; Machado, Fabrício Castro; Clemente, Tatiana Mordente; Brigido, Paula Cristina; Silva, Rebecca Tavares E.; Purcino, Cecílio; Gomes, Rafael Gonçalves Barbosa; Bahia, Diana; Mortara, Renato Arruda; Munte, Claudia Elisabeth; Horjales, Eduardo; da Silva, Claudio Vieira
2014-03-01
Structural studies of proteins normally require large quantities of pure material that can only be obtained through heterologous expression systems and recombinant techniques. In these procedures, large amounts of expressed protein are often found in the insoluble fraction, making protein purification from the soluble fraction inefficient, laborious, and costly. Usually, protein refolding is avoided due to a lack of experimental assays that can validate correct folding and that can compare the conformational population to that of the soluble fraction. Herein, we propose a validation method using simple and rapid 1D 1H nuclear magnetic resonance (NMR) spectra that can efficiently compare protein samples, including individual information on the environment of each proton in the structure.
A thermodynamic definition of protein domains.
Porter, Lauren L; Rose, George D
2012-06-12
Protein domains are conspicuous structural units in globular proteins, and their identification has been a topic of intense biochemical interest dating back to the earliest crystal structures. Numerous disparate domain identification algorithms have been proposed, all involving some combination of visual intuition and/or structure-based decomposition. Instead, we present a rigorous, thermodynamically-based approach that redefines domains as cooperative chain segments. In greater detail, most small proteins fold with high cooperativity, meaning that the equilibrium population is dominated by completely folded and completely unfolded molecules, with a negligible subpopulation of partially folded intermediates. Here, we redefine structural domains in thermodynamic terms as cooperative folding units, based on m-values, which measure the cooperativity of a protein or its substructures. In our analysis, a domain is equated to a contiguous segment of the folded protein whose m-value is largely unaffected when that segment is excised from its parent structure. Defined in this way, a domain is a self-contained cooperative unit; i.e., its cooperativity depends primarily upon intrasegment interactions, not intersegment interactions. Implementing this concept computationally, the domains in a large representative set of proteins were identified; all exhibit consistency with experimental findings. Specifically, our domain divisions correspond to the experimentally determined equilibrium folding intermediates in a set of nine proteins. The approach was also proofed against a representative set of 71 additional proteins, again with confirmatory results. Our reframed interpretation of a protein domain transforms an indeterminate structural phenomenon into a quantifiable molecular property grounded in solution thermodynamics.
Lessons on RNA Silencing Mechanisms in Plants from Eukaryotic Argonaute Structures
Poulsen, Christian; Vaucheret, Hervé; Brodersen, Peter
2013-01-01
RNA silencing refers to a collection of gene regulatory mechanisms that use small RNAs for sequence specific repression. These mechanisms rely on ARGONAUTE (AGO) proteins that directly bind small RNAs and thereby constitute the central component of the RNA-induced silencing complex (RISC). AGO protein function has been probed extensively by mutational analyses, particularly in plants where large allelic series of several AGO proteins have been isolated. Structures of entire human and yeast AGO proteins have only very recently been obtained, and they allow more precise analyses of functional consequences of mutations obtained by forward genetics. To a large extent, these analyses support current models of regions of particular functional importance of AGO proteins. Interestingly, they also identify previously unrecognized parts of AGO proteins with profound structural and functional importance and provide the first hints at structural elements that have important functions specific to individual AGO family members. A particularly important outcome of the analysis concerns the evidence for existence of Gly-Trp (GW) repeat interactors of AGO proteins acting in the plant microRNA pathway. The parallel analysis of AGO structures and plant AGO mutations also suggests that such interactions with GW proteins may be a determinant of whether an endonucleolytically competent RISC is formed. PMID:23303917
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.
Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke
2014-01-01
The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE(csc). The server is available at http://kiharalab.org/3d-surfer/.
Solution structure and interactions of the Escherichia coli cell division activator protein CedA.
Chen, Ho An; Simpson, Peter; Huyton, Trevor; Roper, David; Matthews, Stephen
2005-05-10
CedA is a protein that is postulated to be involved in the regulation of cell division in Escherichia coli and related organisms; however, little biological data about its possible mode of action are available. Here we present a three-dimensional structure of this protein as determined by NMR spectroscopy. The protein is made up of four antiparallel beta-strands, an alpha-helix, and a large unstructured stretch of residues at the N-terminus. It shows structural similarity to a family of DNA-binding proteins which interact with dsDNA via a three-stranded beta-sheet, suggesting that CedA may be a DNA-binding protein. The putative binding surface of CedA is predominantly positively charged with a number of basic residues surrounding a groove largely dominated by aromatic residues. NMR chemical shift perturbations and gel-shift experiments performed with CedA confirm that the protein binds dsDNA, and its interaction is mediated primarily via the beta-sheet.
2000-05-05
This computer graphic depicts the relative complexity of crystallizing large proteins in order to study their structures through x-ray crystallography. Insulin is a vital protein whose structure has several subtle points that scientists are still trying to determine. Large molecules such as insulin are complex, with structures that are comparatively difficult to understand. For comparison, a sugar molecule (which many people have grown as hard crystals in science class) and a water molecule are shown. These images were produced with the Macmolecule program. Photo credit: NASA/Marshall Space Flight Center (MSFC)
Protein homology model refinement by large-scale energy optimization.
Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E; DiMaio, Frank; Baker, David
2018-03-20
Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.
Sinz, Andrea
2018-05-28
Structural mass spectrometry (MS) is gaining increasing importance for deriving valuable three-dimensional structural information on proteins and protein complexes, and it complements existing techniques, such as NMR spectroscopy and X-ray crystallography. Structural MS unites different MS-based techniques, such as hydrogen/deuterium exchange, native MS, ion-mobility MS, protein footprinting, and chemical cross-linking/MS, and it allows fundamental questions in structural biology to be addressed. In this Minireview, I will focus on the cross-linking/MS strategy. This method not only delivers tertiary structural information on proteins, but is also increasingly being used to decipher protein interaction networks, both in vitro and in vivo. Cross-linking/MS is currently one of the most promising MS-based approaches to derive structural information on very large and transient protein assemblies and intrinsically disordered proteins. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
Wang, Nanyi; Wang, Lirong; Xie, Xiang-Qun
2017-11-27
Molecular docking is widely applied to computer-aided drug design and has become relatively mature in recent decades. Application of docking in modeling varies from single lead compound optimization to large-scale virtual screening. The performance of molecular docking is highly dependent on the protein structures selected. It is especially challenging for large-scale target prediction research when multiple structures are available for a single target. Therefore, we have established ProSelection, a docking preferred-protein selection algorithm, in order to generate the proper structure subset(s). By the ProSelection algorithm, protein structures of "weak selectors" are filtered out whereas structures of "strong selectors" are kept. Specifically, a structure that shows good statistical performance in distinguishing active ligands from inactive ligands is defined as a strong selector. In this study, 249 protein structures of 14 autophagy-related targets are investigated. Surflex-dock was used as the docking engine to distinguish active and inactive compounds against these protein structures. Both the t test and the Mann-Whitney U test were used to distinguish the strong from the weak selectors based on the normality of the docking score distribution. The suggested docking score threshold for active ligands (SDA) was generated for each strong selector structure according to the receiver operating characteristic (ROC) curve. The performance of ProSelection was further validated by predicting the potential off-targets of 43 U.S. Food and Drug Administration approved small molecule antineoplastic drugs. Overall, ProSelection will accelerate the computational work in protein structure selection and could be a useful tool for molecular docking, target prediction, and protein-chemical database establishment research.
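The rank-based comparison of active and inactive docking scores mentioned in this abstract can be illustrated with a minimal sketch. It is not the ProSelection implementation: the function names are hypothetical, the score direction is left as a parameter because the abstract does not state it, and no p-value is computed. The sketch only shows the Mann-Whitney U statistic and its standard relation to the area under the ROC curve, AUC = U / (n_active * n_inactive).

```python
def mann_whitney_u(active_scores, inactive_scores, higher_is_better=True):
    """U statistic for the active group.

    Counts (active, inactive) pairs in which the active ligand scores better;
    ties contribute 0.5. The score direction is an assumption of the caller.
    """
    u = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a == i:
                u += 0.5
            elif (a > i) == higher_is_better:
                u += 1.0
    return u


def roc_auc(active_scores, inactive_scores, higher_is_better=True):
    """Area under the ROC curve, obtained from U by normalising with n1 * n2."""
    n_pairs = len(active_scores) * len(inactive_scores)
    if n_pairs == 0:
        raise ValueError("both score lists must be non-empty")
    return mann_whitney_u(active_scores, inactive_scores, higher_is_better) / n_pairs


if __name__ == "__main__":
    # Made-up docking scores for one hypothetical protein structure.
    actives = [8.0, 7.5, 6.0]
    inactives = [5.0, 6.5, 4.0]
    print(roc_auc(actives, inactives))
```

An AUC near 0.5 would indicate that a structure does not separate the two ligand sets, which is the intuition behind a "weak selector"; where exactly to draw the line is a choice the abstract does not specify.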
NASA Astrophysics Data System (ADS)
Finkelstein, A. V.; Galzitskaya, O. V.
2004-04-01
Protein physics is grounded on three fundamental experimental facts: a protein, this long heteropolymer, has a well defined compact three-dimensional structure; this structure can spontaneously arise from the unfolded protein chain in an appropriate environment; and this structure is separated from the unfolded state of the chain by the “all-or-none” phase transition, which ensures robustness of protein structure and therefore of its action. The aim of this review is to consider the modern understanding of the physical principles of self-organization of protein structures and to give an overview of such important features of this process as finding the unique protein structure among zillions of alternatives, nucleation of the folding process and metastable folding intermediates. Towards this end we will consider the main experimental facts and simple, mostly phenomenological theoretical models. We will concentrate on relatively small (single-domain) water-soluble globular proteins (whose structure and especially folding are much better studied and understood than those of large or membrane and fibrous proteins) and consider kinetic and structural aspects of the transition of initially unfolded protein chains into their final solid (“native”) 3D structures.
Protein flexibility in the light of structural alphabets
Craveur, Pierrick; Joseph, Agnel P.; Esque, Jeremy; Narwani, Tarun J.; Noël, Floriane; Shinada, Nicolas; Goguet, Matthieu; Leonard, Sylvain; Poulain, Pierre; Bertrand, Olivier; Faure, Guilhem; Rebehmed, Joseph; Ghozlane, Amine; Swapna, Lakshmipuram S.; Bhaskara, Ramachandra M.; Barnoud, Jonathan; Téletchéa, Stéphane; Jallu, Vincent; Cerny, Jiri; Schneider, Bohdan; Etchebest, Catherine; Srinivasan, Narayanaswamy; Gelly, Jean-Christophe; de Brevern, Alexandre G.
2015-01-01
Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. More precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SAs description. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., Molecular Dynamic simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large set of structures in complexes, to identify order/disorder transition. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases. PMID:26075209
Collier, James H; Lesk, Arthur M; Garcia de la Banda, Maria; Konagurthu, Arun S
2012-07-01
Searching for well-fitting 3D oligopeptide fragments within a large collection of protein structures is an important task central to many analyses involving protein structures. This article reports a new web server, Super, dedicated to the task of rapidly screening the protein data bank (PDB) to identify all fragments that superpose with a query under a prespecified threshold of root-mean-square deviation (RMSD). Super relies on efficiently computing a mathematical bound on the commonly used structural similarity measure, RMSD of superposition. This allows the server to filter out a large proportion of fragments that are unrelated to the query; >99% of the total number of fragments in some cases. For a typical query, Super scans the current PDB containing over 80,500 structures (with ∼40 million potential oligopeptide fragments to match) in under a minute. Super web server is freely accessible from: http://lcb.infotech.monash.edu.au/super.
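The idea of discarding fragments with a cheap bound before computing a full superposition can be sketched as follows. The abstract does not say which bound Super uses, so this example substitutes a simple, well-known one as an assumption: for two equal-length point sets, the RMSD after optimal superposition is never smaller than the absolute difference of their radii of gyration (this follows from the Cauchy-Schwarz inequality once both sets are centred). All function names are hypothetical.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of 3D points from their centroid."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    total = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in points)
    return math.sqrt(total / n)


def rmsd_lower_bound(frag_a, frag_b):
    """Lower bound on the optimally superposed RMSD of two equal-length fragments."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same number of atoms")
    return abs(radius_of_gyration(frag_a) - radius_of_gyration(frag_b))


def prefilter(query, candidates, threshold):
    """Keep only candidates whose lower bound does not already exceed the threshold.

    Survivors still need a full superposition (e.g. the Kabsch algorithm) to
    confirm that their true RMSD is under the threshold.
    """
    return [frag for frag in candidates if rmsd_lower_bound(query, frag) <= threshold]


if __name__ == "__main__":
    query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    rotated = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    print(len(prefilter(query, [rotated, stretched], threshold=1.0)))
```

Because the bound costs only a few arithmetic operations per fragment, it can be evaluated over a very large fragment collection, leaving the expensive superposition for the small set of survivors.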
Kinjo, Akira R.; Bekker, Gert-Jan; Suzuki, Hirofumi; Tsuchiya, Yuko; Kawabata, Takeshi; Ikegawa, Yasuyo; Nakamura, Haruki
2017-01-01
The Protein Data Bank Japan (PDBj, http://pdbj.org), a member of the worldwide Protein Data Bank (wwPDB), accepts and processes the deposited data of experimentally determined macromolecular structures. While maintaining the archive in collaboration with other wwPDB partners, PDBj also provides a wide range of services and tools for analyzing structures and functions of proteins. We herein outline the updated web user interfaces together with RESTful web services and the backend relational database that support the former. To enhance the interoperability of the PDB data, we have previously developed PDB/RDF, PDB data in the Resource Description Framework (RDF) format, which is now a wwPDB standard called wwPDB/RDF. We have enhanced the connectivity of the wwPDB/RDF data by incorporating various external data resources. Services for searching, comparing and analyzing the ever-increasing large structures determined by hybrid methods are also described. PMID:27789697
Computational design of chimeric protein libraries for directed evolution.
Silberg, Jonathan J; Nguyen, Peter Q; Stevenson, Taylor
2010-01-01
The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly related proteins are recombined, the fraction of variants having a disrupted structure increases. One way to enrich the fraction of folded and potentially interesting chimeras in these libraries is to use computational algorithms to anticipate which structural elements can be swapped without disturbing the integrity of a protein's structure. Herein, we describe how the algorithm Schema uses the sequences and structures of the parent proteins recombined to predict the structural disruption of chimeras, and we outline how dynamic programming can be used to find libraries with a range of amino acid substitution levels that are enriched in variants with low Schema disruption.
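The disruption score described in this abstract can be illustrated with a small sketch. It follows the commonly cited definition of Schema disruption (a contact counts as broken when the residue pair it joins in the chimera does not occur at the same two positions in any parent), but it is only an illustration: the function names, the toy sequences and the contact list are hypothetical, and the dynamic-programming library search mentioned in the abstract is not implemented here.

```python
def make_chimera(parents, crossovers, block_parents):
    """Assemble a chimera from aligned, equal-length parent sequences.

    crossovers: sorted cut positions splitting the alignment into blocks.
    block_parents: index of the parent that contributes each block.
    """
    bounds = [0] + list(crossovers) + [len(parents[0])]
    if len(block_parents) != len(bounds) - 1:
        raise ValueError("need exactly one parent index per block")
    return "".join(
        parents[p][bounds[k]:bounds[k + 1]] for k, p in enumerate(block_parents)
    )


def schema_disruption(chimera, parents, contacts):
    """Count structural contacts (i, j) that the chimera breaks.

    A contact is considered preserved if some parent has the same residue
    pair at positions i and j; otherwise it adds 1 to the disruption.
    """
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((parent[i], parent[j]) == pair for parent in parents):
            broken += 1
    return broken


if __name__ == "__main__":
    parents = ["ACDE", "FGHI"]        # toy aligned parent sequences
    contacts = [(0, 3), (1, 2)]       # toy residue-residue contacts
    chimera = make_chimera(parents, crossovers=[2], block_parents=[0, 1])
    print(chimera, schema_disruption(chimera, parents, contacts))
```

In a real application the contact list would be derived from a parent crystal structure with a distance cutoff; the cutoff value is a modelling choice not given in the abstract.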
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structure. However, there is still a large gap between the number of available protein structures and the number of proteins whose function is annotated with high accuracy. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small molecule ligand binding site and ligand structure using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA. We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
Design of structurally distinct proteins using strategies inspired by evolution
Jacobs, T. M.; Williams, B.; Williams, T.; ...
2016-05-06
Natural recombination combines pieces of preexisting proteins to create new tertiary structures and functions. In this paper, we describe a computational protocol, called SEWING, which is inspired by this process and builds new proteins from connected or disconnected pieces of existing structures. Helical proteins designed with SEWING contain structural features absent from other de novo designed proteins and, in some cases, remain folded at more than 100°C. High-resolution structures of the designed proteins CA01 and DA05R1 were solved by x-ray crystallography (2.2 angstrom resolution) and nuclear magnetic resonance, respectively, and there was excellent agreement with the design models. Finally, this method provides a new strategy to rapidly create large numbers of diverse and designable protein scaffolds.
Free-Energy Landscape of Protein-Ligand Interactions Coupled with Protein Structural Changes.
Moritsugu, Kei; Terada, Tohru; Kidera, Akinori
2017-02-02
Protein-ligand interactions are frequently coupled with protein structural changes. Focusing on the coupling, we present the free-energy surface (FES) of the ligand-binding process for glutamine-binding protein (GlnBP) and its ligand, glutamine, in which glutamine binding accompanies large-scale domain closure. All-atom simulations were performed in explicit solvents by multiscale enhanced sampling (MSES), which adopts a multicopy and multiscale scheme to achieve enhanced sampling of systems with a large number of degrees of freedom. The structural ensemble derived from the MSES simulation yielded the FES of the coupling, described in terms of both the ligand's and protein's degrees of freedom at atomic resolution, and revealed the tight coupling between the two degrees of freedom. The derived FES led to the determination of definite structural states, which suggested the dominant pathways of glutamine binding to GlnBP: first, glutamine migrates via diffusion to form a dominant encounter complex with Arg75 on the large domain of GlnBP, through strong polar interactions. Subsequently, the closing motion of GlnBP occurs to form ligand interactions with the small domain, finally completing the native-specific complex structure. The formation of hydrogen bonds between glutamine and the small domain is considered to be a rate-limiting step, inducing desolvation of the protein-ligand interface to form the specific native complex. The key interactions to attain high specificity for glutamine, the "door keeper" existing between the two domains (Asp10-Lys115) and the "hydrophobic sandwich" formed between the ligand glutamine and Phe13/Phe50, have been successfully mapped on the pathway derived from the FES.
Rysavy, Steven J; Beck, David AC; Daggett, Valerie
2014-01-01
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25–75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments. PMID:25142412
Expanding protein universe and its origin from the biological Big Bang.
Dokholyan, Nikolay V; Shakhnovich, Boris; Shakhnovich, Eugene I
2002-10-29
The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.
Pan, Joshua; Meyers, Robin M; Michel, Brittany C; Mashtalir, Nazar; Sizemore, Ann E; Wells, Jonathan N; Cassel, Seth H; Vazquez, Francisca; Weir, Barbara A; Hahn, William C; Marsh, Joseph A; Tsherniak, Aviad; Kadoch, Cigall
2018-05-23
Protein complexes are assemblies of subunits that have co-evolved to execute one or many coordinated functions in the cellular environment. Functional annotation of mammalian protein complexes is critical to understanding biological processes, as well as disease mechanisms. Here, we used genetic co-essentiality derived from genome-scale RNAi- and CRISPR-Cas9-based fitness screens performed across hundreds of human cancer cell lines to assign measures of functional similarity. From these measures, we systematically built and characterized functional similarity networks that recapitulate known structural and functional features of well-studied protein complexes and resolve novel functional modules within complexes lacking structural resolution, such as the mammalian SWI/SNF complex. Finally, by integrating functional networks with large protein-protein interaction networks, we discovered novel protein complexes involving recently evolved genes of unknown function. Taken together, these findings demonstrate the utility of genetic perturbation screens alone, and in combination with large-scale biophysical data, to enhance our understanding of mammalian protein complexes in normal and disease states. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
MDC1: The art of keeping things in focus.
Jungmichel, Stephanie; Stucki, Manuel
2010-08-01
Chromatin structure is important for the recognition and repair of DNA damage. Many DNA damage response proteins accumulate in large chromatin domains flanking sites of DNA double-strand breaks. The assembly of these structures (usually termed DNA damage foci) is primarily regulated by MDC1, a large nuclear mediator/adaptor protein that is composed of several distinct structural and functional domains. Here, we summarize the latest discoveries about the mechanisms by which MDC1 mediates DNA damage foci formation, and we review the considerable efforts made to understand the functional implications of these structures.
The Origin and Early Evolution of Membrane Proteins
NASA Technical Reports Server (NTRS)
Pohorille, Andrew; Schweighofer, Karl; Wilson, Michael A.
2005-01-01
Membrane proteins mediate functions that are essential to all cells. These functions include transport of ions, nutrients and waste products across cell walls, capture of energy and its transduction into a form usable in chemical reactions, transmission of environmental signals to the interior of the cell, cellular growth and cell volume regulation. In the absence of membrane proteins, ancestors of cells (protocells) would have had only very limited capabilities to communicate with their environment. Thus, it is not surprising that membrane proteins are quite common even in the simplest prokaryotic cells. Considering that contemporary membrane channels are large and complex, both structurally and functionally, the question arises of how their presumably much simpler ancestors could have emerged, performed functions and diversified in early protobiological evolution. Remarkably, despite their overall complexity, structural motifs in membrane proteins are quite simple, with α-helices being most common. This suggests that these proteins might have evolved from simple building blocks. To explain how these blocks could have organized into functional structures, we performed large-scale, accurate computer simulations of folding peptides at a water-membrane interface, their insertion into the membrane, self-assembly into higher-order structures and function. The results of these simulations, combined with analysis of structural and functional experimental data, led to the first integrated view of the origin and early evolution of membrane proteins.
Benchmark data sets for structure-based computational target prediction.
Schomburg, Karen T; Rarey, Matthias
2014-08-25
Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking still face many challenges, the greatest of which is probably the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, rendering comparison and demonstration of improvements of methods cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods, i.e., a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies and a large data set consisting of 7992 protein structures and 72 drug-like ligands allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and any information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable of measuring early recognition and enrichment, such as AUC, BEDROC, and NSLR, are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used for method evaluation of our new inverse screening method iRAISE. The small data set reveals the method's capability and limitations to selectively distinguish between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves excellent or good enrichment in 55% of cases, a median AUC of 0.67, and RMSDs below 2.0 Å in 74% of cases, and it was able to predict the first true target in the top 2% of the protein data set of about 8000 structures in 59 out of 72 cases.
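As an illustration of the AUC metric named in this abstract, here is a minimal sketch of a rank-based ROC AUC. It assumes that a higher score means "more likely a true target"; the function name `roc_auc` and the toy scores are hypothetical and are not taken from the paper or from iRAISE.

```python
def roc_auc(scores, labels):
    """Probability that a random true target outscores a random decoy.

    Ties between a target and a decoy count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true target and one decoy")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy ranking (illustrative values only): two true targets, two decoys.
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every true target ranked above every decoy. Early-recognition metrics such as BEDROC additionally weight the top of the list more heavily, which this sketch does not do.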
How large B-factors can be in protein crystal structures.
Carugo, Oliviero
2018-02-23
Protein crystal structures are potentially over-interpreted since they are routinely refined without any restraint on the upper limit of atomic B-factors. Consequently, some of their atoms, undetected in the electron density maps, are allowed to reach extremely large B-factors, even above 100 square Angstroms, and their final positions are purely speculative and not based on any experimental evidence. A strategy to define B-factor upper limits is described here, based on the analysis of protein crystal structures deposited in the Protein Data Bank prior to 2008, when the tendency to allow B-factors to inflate arbitrarily was limited. This B-factor upper limit (B_max) is determined by extrapolating the relationship between crystal structure average B-factor and percentage of crystal volume occupied by solvent (pcVol) to pcVol = 100%, when, ab absurdo, the crystal contains only liquid solvent, the structure of which is, by definition, undetectable in electron density maps. It is thus possible to highlight structures with average B-factors larger than B_max, which should be considered with caution by the users of the information deposited in the Protein Data Bank, in order to avoid scientifically deleterious over-interpretations.
There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order".
Nielsen, Jakob T; Mulder, Frans A A
2016-01-01
The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder, but still, largely segregate into two classes; disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence shows a complex and asymmetric distribution, that is highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.
Advances in Homology Protein Structure Modeling
Xiang, Zhexin
2007-01-01
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function. PMID:16787261
Yeh, Chun-Ting; Brunette, T J; Baker, David; McIntosh-Smith, Simon; Parmeggiani, Fabio
2018-02-01
Computational protein design methods have enabled the design of novel protein structures, but they are often still limited to small proteins and symmetric systems. To expand the size of designable proteins while controlling the overall structure, we developed Elfin, a genetic algorithm for the design of novel proteins with custom shapes using structural building blocks derived from experimentally verified repeat proteins. By combining building blocks with compatible interfaces, it is possible to rapidly build non-symmetric large structures (>1000 amino acids) that match three-dimensional geometric descriptions provided by the user. A run time of about 20min on a laptop computer for a 3000 amino acid structure makes Elfin accessible to users with limited computational resources. Protein structures with controlled geometry will allow the systematic study of the effect of spatial arrangement of enzymes and signaling molecules, and provide new scaffolds for functional nanomaterials. Copyright © 2017 Elsevier Inc. All rights reserved.
Structural basis of viral invasion: lessons from paramyxovirus F
Lamb, Robert A.; Jardetzky, Theodore S.
2007-01-01
The structures of glycoproteins that mediate enveloped virus entry into cells have revealed dramatic structural changes that accompany membrane fusion and provided mechanistic insights into this process. The group of class I viral fusion proteins includes the influenza hemagglutinin, paramyxovirus F, HIV env and other mechanistically related fusogens, but these proteins are unrelated in sequence and exhibit clearly distinct structural features. Recently determined crystal structures of the paramyxovirus F protein in two conformations, representing prefusion and postfusion states, reveal a novel protein architecture that undergoes large-scale, irreversible refolding during membrane fusion, extending our understanding of this diverse group of membrane fusion machines. PMID:17870467
Relative Sizes of Organic Molecules
NASA Technical Reports Server (NTRS)
2000-01-01
This computer graphic depicts the relative complexity of crystallizing large proteins in order to study their structures through x-ray crystallography. Insulin is a vital protein whose structure has several subtle points that scientists are still trying to determine. Large molecules such as insulin are complex, with structures that are comparatively difficult to understand. For comparison, a sugar molecule (which many people have grown as hard crystals in science class) and a water molecule are shown. These images were produced with the Macmolecule program. Photo credit: NASA/Marshall Space Flight Center (MSFC)
MEGADOCK: An All-to-All Protein-Protein Interaction Prediction System Using Tertiary Structure Data
Ohue, Masahito; Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ishida, Takashi; Akiyama, Yutaka
2014-01-01
The elucidation of protein-protein interaction (PPI) networks is important for understanding cellular structure and function and structure-based drug design. However, the development of an effective method to conduct exhaustive PPI screening represents a computational challenge. We have been investigating a protein docking approach based on shape complementarity and physicochemical properties. We describe here the development of the protein-protein docking software package “MEGADOCK” that samples an extremely large number of protein dockings at high speed. MEGADOCK reduces the calculation time required for docking by using several techniques such as a novel scoring function called the real Pairwise Shape Complementarity (rPSC) score. We showed that MEGADOCK is capable of exhaustive PPI screening by completing docking calculations 7.5 times faster than the conventional docking software, ZDOCK, while maintaining an acceptable level of accuracy. When MEGADOCK was applied to a subset of a general benchmark dataset to predict 120 relevant interacting pairs from 120 x 120 = 14,400 combinations of proteins, an F-measure value of 0.231 was obtained. Further, we showed that MEGADOCK can be applied to a large-scale protein-protein interaction-screening problem with accuracy better than random. When our approach is combined with parallel high-performance computing systems, it is now feasible to search and analyze protein-protein interactions while taking into account three-dimensional structures at the interactome scale. MEGADOCK is freely available at http://www.bi.cs.titech.ac.jp/megadock. PMID:23855673
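The F-measure reported in this abstract is the harmonic mean of precision and recall. The sketch below shows how such a value is computed for an all-to-all screen; the function name `f_measure` and the counts passed to it are hypothetical and are not MEGADOCK results.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (the F1 score)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2.0 * precision * recall / (precision + recall)


# Size of the all-to-all search space described in the abstract.
n_candidate_pairs = 120 * 120  # 14,400 combinations, 120 of them relevant

# Hypothetical outcome: 120 pairs predicted, 30 of them correct,
# so 90 false positives and 90 relevant pairs missed.
example_f = f_measure(true_positives=30, false_positives=90, false_negatives=90)
```

Because only 120 of the 14,400 candidate pairs are relevant, plain accuracy would be dominated by true negatives; the F-measure ignores them, which is one reason it suits this kind of screening benchmark.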
Chemical cross-linking and native mass spectrometry: A fruitful combination for structural biology
Sinz, Andrea; Arlt, Christian; Chorev, Dror; Sharon, Michal
2015-01-01
Mass spectrometry (MS) is becoming increasingly popular in the field of structural biology for analyzing protein three-dimensional-structures and for mapping protein–protein interactions. In this review, the specific contributions of chemical crosslinking and native MS are outlined to reveal the structural features of proteins and protein assemblies. Both strategies are illustrated based on the examples of the tetrameric tumor suppressor protein p53 and multisubunit vinculin-Arp2/3 hybrid complexes. We describe the distinct advantages and limitations of each technique and highlight synergistic effects when both techniques are combined. Integrating both methods is especially useful for characterizing large protein assemblies and for capturing transient interactions. We also point out the future directions we foresee for a combination of in vivo crosslinking and native MS for structural investigation of intact protein assemblies. PMID:25970732
Lazim, Raudah; Mei, Ye; Zhang, Dawei
2012-03-01
Replica exchange molecular dynamics (REMD) simulation provides an efficient conformational sampling tool for the study of protein folding. In this study, we explore the mechanism directing the structure variation from α/4β-fold protein to 3α-fold protein after mutation by conducting REMD simulation on 42 replicas with temperatures ranging from 270 K to 710 K. The simulation began from a protein possessing the primary structure of GA88 but the tertiary structure of GB88, two G proteins with "high sequence identity." Despite the large Cα-root mean square deviation (RMSD) of the folded protein (4.34 Å at 270 K and 4.75 Å at 304 K), a variation in tertiary structure was observed. Together with the analysis of secondary structure assignment, cluster analysis and principal component analysis, this provides insight into the folding and unfolding pathways of the 3α-fold protein and the α/4β-fold protein, respectively, paving the way toward an understanding of the events that occur during conformational variation.
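The abstract gives the number of replicas and the temperature range but not how the temperatures were spaced. A common choice in REMD is a geometric ladder, in which neighbouring temperatures share a constant ratio; the sketch below builds one under that assumption. The function name `geometric_ladder` is hypothetical, and the resulting temperatures are not claimed to be those used in the study.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures with a constant ratio between neighbouring replicas."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


# Assumed spacing for the 42 replicas between 270 K and 710 K.
ladder = geometric_ladder(270.0, 710.0, 42)

for index, temperature in enumerate(ladder[:6]):
    print(f"replica {index}: {temperature:.1f} K")
```

A constant ratio is often used because it tends to give roughly uniform exchange acceptance between neighbours when the heat capacity is approximately constant; other spacing schemes exist.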
PBxplore: a tool to analyze local protein structure and deformability with Protein Blocks
Craveur, Pierrick; Joseph, Agnel Praveen; Jallu, Vincent
2017-01-01
This paper describes the development and application of a suite of tools, called PBxplore, to analyze the dynamics and deformability of protein structures using Protein Blocks (PBs). Proteins are highly dynamic macromolecules, and a classical way to analyze their inherent flexibility is to perform molecular dynamics simulations. The advantage of using small structural prototypes such as PBs is that they give a good approximation of the local structure of the protein backbone. More importantly, by reducing the conformational complexity of protein structures, PBs allow analysis of local protein deformability, which cannot be done with other methods, and they have been used efficiently in different applications. PBxplore is able to process large amounts of data such as those produced by molecular dynamics simulations. It produces frequencies, entropy and information logo outputs as text and graphics. PBxplore is available at https://github.com/pierrepo/PBxplore and is released under the open-source MIT license. PMID:29177113
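To make the "frequencies" and "entropy" outputs concrete, the following sketch computes them for a single sequence position from a column of Protein Block letters, one letter per simulation frame. It does not use the PBxplore API; the function name `pb_column_stats` and the toy columns are hypothetical, and reporting the exponential of the Shannon entropy as an "equivalent number of PBs" is an assumption about the kind of summary meant.

```python
import math
from collections import Counter


def pb_column_stats(column):
    """Frequencies, Shannon entropy (nats) and exp(entropy) for one position.

    `column` is a string with one Protein Block letter per frame.
    """
    counts = Counter(column)
    total = len(column)
    freqs = {pb: count / total for pb, count in counts.items()}
    entropy = -sum(f * math.log(f) for f in freqs.values())
    return freqs, entropy, math.exp(entropy)


# Toy columns: a rigid position (always block 'm') and a position that
# alternates equally between blocks 'm' and 'd'.
rigid_freqs, rigid_entropy, rigid_neq = pb_column_stats("mmmmmmmm")
flex_freqs, flex_entropy, flex_neq = pb_column_stats("mmmmdddd")
```

With this definition a position that always adopts the same block scores 1, and a position that samples k blocks equally often scores k, so the value reads directly as a local deformability index.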
Neutron protein crystallography: A complementary tool for locating hydrogens in proteins.
O'Dell, William B; Bodenheimer, Annette M; Meilleur, Flora
2016-07-15
Neutron protein crystallography is a powerful tool for investigating protein chemistry because it directly locates hydrogen atom positions in a protein structure. The visibility of hydrogen and deuterium atoms arises from the strong interaction of neutrons with the nuclei of these isotopes. Positions can be unambiguously assigned from diffraction at resolutions typical of protein crystals. Neutrons have the additional benefit to structural biology of not inducing radiation damage in protein crystals. The same crystal could be measured multiple times for parametric studies. Here, we review the basic principles of neutron protein crystallography. The information that can be gained from a neutron structure is presented in balance with practical considerations. Methods to produce isotopically-substituted proteins and to grow large crystals are provided in the context of neutron structures reported in the literature. Available instruments for data collection and software for data processing and structure refinement are described along with technique-specific strategies including joint X-ray/neutron structure refinement. Examples are given to illustrate, ultimately, the unique scientific value of neutron protein crystal structures. Copyright © 2015 Elsevier Inc. All rights reserved.
Principles of assembly reveal a periodic table of protein complexes.
Ahnert, Sebastian E; Marsh, Joseph A; Hernández, Helena; Robinson, Carol V; Teichmann, Sarah A
2015-12-11
Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering. Copyright © 2015, American Association for the Advancement of Science.
Investigating homology between proteins using energetic profiles.
Wrabl, James O; Hilser, Vincent J
2010-03-26
Accumulated experimental observations demonstrate that protein stability is often preserved upon conservative point mutation. In contrast, less is known about the effects of large sequence or structure changes on the stability of a particular fold. Almost completely unknown is the degree to which stability of different regions of a protein is generally preserved throughout evolution. In this work, these questions are addressed through thermodynamic analysis of a large representative sample of protein fold space based on remote, yet accepted, homology. More than 3,000 proteins were computationally analyzed using the structural-thermodynamic algorithm COREX/BEST. Estimated position-specific stability (i.e., local Gibbs free energy of folding) and its component enthalpy and entropy were quantitatively compared between all proteins in the sample according to all-vs.-all pairwise structural alignment. It was discovered that the local stabilities of homologous pairs were significantly more correlated than those of non-homologous pairs, indicating that local stability was indeed generally conserved throughout evolution. However, the position-specific enthalpy and entropy underlying stability were less correlated, suggesting that the overall regional stability of a protein was more important than the thermodynamic mechanism utilized to achieve that stability. Finally, two different types of statistically exceptional evolutionary structure-thermodynamic relationships were noted. First, many homologous proteins contained regions of similar thermodynamics despite localized structure change, suggesting a thermodynamic mechanism enabling evolutionary fold change. Second, some homologous proteins with extremely similar structures nonetheless exhibited different local stabilities, a phenomenon previously observed experimentally in this laboratory. These two observations, in conjunction with the principal conclusion that homologous proteins generally conserved local stability, may provide guidance for a future thermodynamically informed classification of protein homology.
How the folding rates of two- and multistate proteins depend on the amino acid properties.
Huang, Jitao T; Huang, Wei; Huang, Shanran R; Li, Xin
2014-10-01
Proteins fold by either a two-state or a multistate kinetic mechanism. We observe that amino acids play different roles in the different mechanisms. Many residues that readily form regular secondary structures (α helices, β sheets and turns) can promote the two-state folding reactions of small proteins. Most hydrophilic residues can speed up the multistate folding reactions of large proteins. Folding rates of large proteins are equally responsive to the flexibility of partial amino acids. Other properties of amino acids (including volume, polarity, accessible surface, exposure degree, isoelectric point, and phase transfer energy) contribute little to the folding kinetics of the proteins. Cysteine is a special residue: it triggers the two-state folding reaction but inhibits the multistate folding reaction. These findings not only provide new insight into protein structure prediction, but could also be used to direct point mutations that change the folding rate. © 2014 Wiley Periodicals, Inc.
G Protein-Coupled Receptor Rhodopsin: A Prospectus
Filipek, Sławomir; Stenkamp, Ronald E.; Teller, David C.; Palczewski, Krzysztof
2006-01-01
Rhodopsin is a retinal photoreceptor protein of bipartite structure consisting of the transmembrane protein opsin and a light-sensitive chromophore 11-cis-retinal, linked to opsin via a protonated Schiff base. Studies on rhodopsin have unveiled many structural and functional features that are common to a large and pharmacologically important group of proteins from the G protein-coupled receptor (GPCR) superfamily, of which rhodopsin is the best-studied member. In this work, we focus on structural features of rhodopsin as revealed by many biochemical and structural investigations. In particular, the high-resolution structure of bovine rhodopsin provides a template for understanding how GPCRs work. We describe the sensitivity and complexity of rhodopsin that lead to its important role in vision. PMID:12471166
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras
2014-03-11
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
Automation of NMR structure determination of proteins.
Altieri, Amanda S; Byrd, R Andrew
2004-10-01
The automation of protein structure determination using NMR is coming of age. The tedious processes of resonance assignment, followed by assignment of NOE (nuclear Overhauser enhancement) interactions (now intertwined with structure calculation), assembly of input files for structure calculation, intermediate analyses of incorrect assignments and bad input data, and finally structure validation are all being automated with sophisticated software tools. The robustness of the different approaches continues to deal with problems of completeness and uniqueness; nevertheless, the future is very bright for automation of NMR structure generation to approach the levels found in X-ray crystallography. Currently, near completely automated structure determination is possible for small proteins, and the prospect for medium-sized and large proteins is good. Copyright 2004 Elsevier Ltd.
SAIL--stereo-array isotope labeling.
Kainosho, Masatsune; Güntert, Peter
2009-11-01
Optimal stereospecific and regiospecific labeling of proteins with stable isotopes enhances the nuclear magnetic resonance (NMR) method for the determination of the three-dimensional protein structures in solution. Stereo-array isotope labeling (SAIL) offers sharpened lines, spectral simplification without loss of information and the ability to rapidly collect and automatically evaluate the structural restraints required to solve a high-quality solution structure for proteins up to twice as large as before. This review gives an overview of stable isotope labeling methods for NMR spectroscopy with proteins and provides an in-depth treatment of the SAIL technology.
Ikeya, Teppei; Terauchi, Tsutomu; Güntert, Peter; Kainosho, Masatsune
2006-07-01
Recently we have developed the stereo-array isotope labeling (SAIL) technique to overcome the conventional molecular size limitation in NMR protein structure determination by employing complete stereo- and regiospecific patterns of stable isotopes. SAIL sharpens signals and simplifies spectra without the loss of requisite structural information, thus making large classes of proteins newly accessible to detailed solution structure determination. The automated structure calculation program CYANA can efficiently analyze SAIL-NOESY spectra and calculate structures without manual analysis. Nevertheless, the original SAIL method might not be capable of determining the structures of proteins larger than 50 kDa or membrane proteins, for which the spectra are characterized by many broadened and overlapped peaks. Here we have carried out simulations of new SAIL patterns optimized for minimal relaxation and overlap, to evaluate the combined use of SAIL and CYANA for solving the structures of larger proteins and membrane proteins. The modified approach reduces the number of peaks to nearly half of that observed with uniform labeling, while still yielding well-defined structures and is expected to enable NMR structure determinations of these challenging systems.
Lappala, Anna; Nishima, Wataru; Miner, Jacob; Fenimore, Paul; Fischer, Will; Hraber, Peter; Zhang, Ming; McMahon, Benjamin; Tung, Chang-Shung
2018-05-10
Membrane fusion proteins are responsible for viral entry into host cells—a crucial first step in viral infection. These proteins undergo large conformational changes from pre-fusion to fusion-initiation structures, and, despite differences in viral genomes and disease etiology, many fusion proteins are arranged as trimers. Structural information for both pre-fusion and fusion-initiation states is critical for understanding virus neutralization by the host immune system. In the case of Ebola virus glycoprotein (EBOV GP) and Zika virus envelope protein (ZIKV E), pre-fusion state structures have been identified experimentally, but only partial structures of fusion-initiation states have been described. While the fusion-initiation structure is in an energetically unfavorable state that is difficult to solve experimentally, the existing structural information combined with computational approaches enabled the modeling of fusion-initiation state structures of both proteins. These structural models provide an improved understanding of four different neutralizing antibodies in the prevention of viral host entry.
Measuring and comparing structural fluctuation patterns in large protein datasets.
Fuglebakk, Edvin; Echave, Julián; Reuter, Nathalie
2012-10-01
The function of a protein depends not only on its structure but also on its dynamics. This is at the basis of a large body of experimental and theoretical work on protein dynamics. Further insight into the dynamics-function relationship can be gained by studying the evolutionary divergence of protein motions. To investigate this, we need appropriate comparative dynamics methods. The most used dynamical similarity score is the correlation between the root mean square fluctuations (RMSF) of aligned residues. Despite its usefulness, RMSF is in general less evolutionarily conserved than the native structure. A fundamental issue is whether RMSF is not as conserved as structure because dynamics is less conserved or because RMSF is not the best property to use to study its conservation. We performed a systematic assessment of several scores that quantify the (dis)similarity between protein fluctuation patterns. We show that the best scores perform as well as or better than structural dissimilarity, as assessed by their consistency with the SCOP classification. We conclude that to uncover the full extent of the evolutionary conservation of protein fluctuation patterns, it is important to measure the directions of fluctuations and their correlations between sites.
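The baseline score described in this abstract, the correlation between the RMSF profiles of aligned residues, can be sketched as follows. The sketch assumes that the frames of each protein are already superposed and that residue i of one protein is aligned to residue i of the other; the helper names (`rmsf`, `pearson`, `toy_frames`) and the toy coordinates are hypothetical.

```python
import math


def rmsf(frames):
    """Per-residue root mean square fluctuation.

    `frames` is a list of snapshots; each snapshot is a list of (x, y, z)
    tuples, one per residue, already superposed on a common reference.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    profile = []
    for i in range(n_residues):
        mean = [sum(f[i][d] for f in frames) / n_frames for d in range(3)]
        msd = sum(
            sum((f[i][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        ) / n_frames
        profile.append(math.sqrt(msd))
    return profile


def pearson(a, b):
    """Pearson correlation between two equally long profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def toy_frames(amplitudes):
    """Two snapshots in which residue i moves +/- amplitudes[i] along x."""
    plus = [(3.8 * i + a, 0.0, 0.0) for i, a in enumerate(amplitudes)]
    minus = [(3.8 * i - a, 0.0, 0.0) for i, a in enumerate(amplitudes)]
    return [plus, minus]


rmsf_a = rmsf(toy_frames([0.5, 1.0, 2.0]))
rmsf_b = rmsf(toy_frames([1.0, 2.0, 4.0]))
similarity = pearson(rmsf_a, rmsf_b)
```

Because RMSF keeps only the magnitude of each residue's motion, two proteins whose residues move in different directions can still score highly here, which is the limitation the abstract's conclusion about fluctuation directions and inter-site correlations points to.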
Al Nasr, Kamal; Ranjan, Desh; Zubair, Mohammad; Chen, Lin; He, Jing
2014-01-01
Electron cryomicroscopy is becoming a major experimental technique in solving the structures of large molecular assemblies. More and more three-dimensional images have been obtained at medium resolutions between 5 and 10 Å. At this resolution range, major α-helices can be detected as cylindrical sticks and β-sheets can be detected as plane-like regions. A critical question in de novo modeling from cryo-EM images is to determine the match between the secondary structures detected in the image and those on the protein sequence. We formulate this matching problem as a constrained graph problem and present an O(Δ^2 N^2 2^N) algorithm for this NP-hard problem. The algorithm incorporates the dynamic programming approach into a constrained K-shortest path algorithm. Our method, DP-TOSS, has been tested using α-proteins with up to 33 helices and α-β proteins with up to five helices and 12 β-strands. The correct match was ranked within the top 35 for 19 of the 20 α-proteins and all nine α-β proteins tested. The results demonstrate that DP-TOSS improves accuracy, time and memory space in deriving the topologies of the secondary structure elements for proteins with a large number of secondary structures and a complex skeleton.
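To see why the matching problem in this abstract needs more than brute force, the following sketch counts and enumerates candidate topologies under a simplifying assumption: each of n sequence segments is assigned to exactly one of n detected sticks, and each stick can be traversed in two directions. This is only an illustration of the search space, not the DP-TOSS algorithm, and the function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial

def count_topologies(n):
    """Number of assignments of n sequence segments to n sticks,
    with two possible directions per stick: n! * 2**n."""
    return factorial(n) * 2 ** n

def enumerate_topologies(n):
    """Yield every topology as a tuple of (stick_index, direction) pairs,
    one pair per sequence segment in sequence order.
    Only practical for very small n."""
    for order in permutations(range(n)):
        for directions in product((+1, -1), repeat=n):
            yield tuple(zip(order, directions))
```

Under this assumption the count grows as n! · 2^n, so exhaustive enumeration is feasible only for a handful of secondary structure elements.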
ERIC Educational Resources Information Center
Hodis, Eran; Prilusky, Jaime; Sussman, Joel L.
2010-01-01
Protein structures are hard to represent on paper. They are large, complex, and three-dimensional (3D)--four-dimensional if conformational changes count! Unlike most of their substrates, which can easily be drawn out in full chemical formula, drawing every atom in a protein would usually be a mess. Simplifications like showing only the surface of…
Hydrophobic Collapse of Ubiquitin Generates Rapid Protein-Water Motions.
Wirtz, Hanna; Schäfer, Sarah; Hoberg, Claudius; Reid, Korey M; Leitner, David M; Havenith, Martina
2018-06-04
We report time-resolved measurements of the coupled protein-water modes of solvated ubiquitin during protein folding. Kinetic terahertz absorption (KITA) spectroscopy serves as a label-free technique for monitoring large scale conformational changes and folding of proteins subsequent to a sudden T-jump. We report here KITA measurements at an unprecedented time resolution of 500 ns, a resolution 2 orders of magnitude better than those of any previous KITA measurements, which reveal the coupled ubiquitin-solvent dynamics even in the initial phase of hydrophobic collapse. Complementary equilibrium experiments and molecular simulations of ubiquitin solutions are performed to clarify non-equilibrium contributions and reveal the molecular picture upon a change in structure, respectively. On the basis of our results, we propose that in the case of ubiquitin a rapid (<500 ns) initial phase of the hydrophobic collapse from the elongated protein to a molten globule structure precedes secondary structure formation. We find that these very first steps, including large-amplitude changes within the unfolded manifold, are accompanied by a rapid (<500 ns) pronounced change of the coupled protein-solvent response. The KITA response upon secondary structure formation exhibits an opposite sign, which indicates a distinct effect on the solvent-exposed surface.
Membrane protein properties revealed through data-rich electrostatics calculations
Guerriero, Christopher J.; Brodsky, Jeffrey L.; Grabe, Michael
2015-01-01
SUMMARY The electrostatic properties of membrane proteins often reveal many of their key biophysical characteristics, such as ion channel selectivity and the stability of charged membrane-spanning segments. The Poisson-Boltzmann (PB) equation is the gold standard for calculating protein electrostatics, and the software APBSmem enables the solution of the PB equation in the presence of a membrane. Here, we describe significant advances to APBSmem including: full automation of system setup, per-residue energy decomposition, incorporation of PDB2PQR, calculation of membrane induced pKa shifts, calculation of non-polar energies, and command-line scripting for large scale calculations. We highlight these new features with calculations carried out on a number of membrane proteins, including the recently solved structure of the ion channel TRPV1 and a large survey of 1,614 membrane proteins of known structure. This survey provides a comprehensive list of residues with large electrostatic penalties for being embedded in the membrane potentially revealing interesting functional information. PMID:26118532
Membrane Protein Properties Revealed through Data-Rich Electrostatics Calculations.
Marcoline, Frank V; Bethel, Neville; Guerriero, Christopher J; Brodsky, Jeffrey L; Grabe, Michael
2015-08-04
The electrostatic properties of membrane proteins often reveal many of their key biophysical characteristics, such as ion channel selectivity and the stability of charged membrane-spanning segments. The Poisson-Boltzmann (PB) equation is the gold standard for calculating protein electrostatics, and the software APBSmem enables the solution of the PB equation in the presence of a membrane. Here, we describe significant advances to APBSmem, including full automation of system setup, per-residue energy decomposition, incorporation of PDB2PQR, calculation of membrane-induced pKa shifts, calculation of non-polar energies, and command-line scripting for large-scale calculations. We highlight these new features with calculations carried out on a number of membrane proteins, including the recently solved structure of the ion channel TRPV1 and a large survey of 1,614 membrane proteins of known structure. This survey provides a comprehensive list of residues with large electrostatic penalties for being embedded in the membrane, potentially revealing interesting functional information. Copyright © 2015 Elsevier Ltd. All rights reserved.
2014-01-01
Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively. PMID:24742328
NASA Astrophysics Data System (ADS)
Li, Huilin; Nguyen, Hong Hanh; Ogorzalek Loo, Rachel R.; Campuzano, Iain D. G.; Loo, Joseph A.
2018-02-01
Mass spectrometry (MS) has become a crucial technique for the analysis of protein complexes. Native MS has traditionally examined protein subunit arrangements, while proteomics MS has focused on sequence identification. These two techniques are usually performed separately without taking advantage of the synergies between them. Here we describe the development of an integrated native MS and top-down proteomics method using Fourier-transform ion cyclotron resonance (FTICR) to analyse macromolecular protein complexes in a single experiment. We address previous concerns of employing FTICR MS to measure large macromolecular complexes by demonstrating the detection of complexes up to 1.8 MDa, and we demonstrate the efficacy of this technique for direct acquirement of sequence to higher-order structural information with several large complexes. We then summarize the unique functionalities of different activation/dissociation techniques. The platform expands the ability of MS to integrate proteomics and structural biology to provide insights into protein structure, function and regulation.
Super: a web server to rapidly screen superposable oligopeptide fragments from the protein data bank
Collier, James H.; Lesk, Arthur M.; Garcia de la Banda, Maria; Konagurthu, Arun S.
2012-01-01
Searching for well-fitting 3D oligopeptide fragments within a large collection of protein structures is an important task central to many analyses involving protein structures. This article reports a new web server, Super, dedicated to the task of rapidly screening the protein data bank (PDB) to identify all fragments that superpose with a query under a prespecified threshold of root-mean-square deviation (RMSD). Super relies on efficiently computing a mathematical bound on the commonly used structural similarity measure, RMSD of superposition. This allows the server to filter out a large proportion of fragments that are unrelated to the query; >99% of the total number of fragments in some cases. For a typical query, Super scans the current PDB containing over 80 500 structures (with ∼40 million potential oligopeptide fragments to match) in under a minute. Super web server is freely accessible from: http://lcb.infotech.monash.edu.au/super. PMID:22638586
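The filter-then-superpose idea in this abstract can be sketched as follows. The abstract does not state which bound Super computes, so the sketch substitutes a simple one that is easy to verify: the RMSD after optimal superposition is never smaller than the difference between the two fragments' radii of gyration. The exact RMSD is computed with the standard Kabsch (SVD) method. All function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def _centered(coords):
    """Return coordinates as an (n, 3) array with the centroid at the origin."""
    xyz = np.asarray(coords, dtype=float)
    return xyz - xyz.mean(axis=0)

def radius_of_gyration(coords):
    """Root mean square distance of the points from their centroid."""
    c = _centered(coords)
    return float(np.sqrt((c ** 2).sum(axis=1).mean()))

def rmsd_lower_bound(frag_a, frag_b):
    """Cheap lower bound on the superposition RMSD (illustrative only,
    not the bound used by the Super server)."""
    return abs(radius_of_gyration(frag_a) - radius_of_gyration(frag_b))

def superposition_rmsd(frag_a, frag_b):
    """Minimum RMSD over rigid-body superpositions (Kabsch method)."""
    a, b = _centered(frag_a), _centered(frag_b)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best fit would be a reflection.
    if np.linalg.det(u @ vt) < 0:
        s[-1] = -s[-1]
    msd = ((a ** 2).sum() + (b ** 2).sum() - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))

def screen(query, fragments, threshold):
    """Return (index, rmsd) for fragments within the RMSD threshold,
    skipping the SVD whenever the lower bound already exceeds it."""
    hits = []
    for index, frag in enumerate(fragments):
        if rmsd_lower_bound(query, frag) > threshold:
            continue
        value = superposition_rmsd(query, frag)
        if value <= threshold:
            hits.append((index, value))
    return hits
```

The design point is that the bound costs far less than a full superposition, so fragments it rejects never reach the SVD step.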
Structure of human Niemann–Pick C1 protein
Li, Xiaochun; Wang, Jiawei; Coutavas, Elias; Shi, Hang; Hao, Qi; Blobel, Günter
2016-01-01
Niemann–Pick C1 protein (NPC1) is a late-endosomal membrane protein involved in trafficking of LDL-derived cholesterol, Niemann–Pick disease type C, and Ebola virus infection. NPC1 contains 13 transmembrane segments (TMs), five of which are thought to represent a “sterol-sensing domain” (SSD). Although present also in other key regulatory proteins of cholesterol biosynthesis, uptake, and signaling, the structure and mechanism of action of the SSD are unknown. Here we report a crystal structure of a large fragment of human NPC1 at 3.6 Å resolution, which reveals internal twofold pseudosymmetry along TM 2–13 and two structurally homologous domains that protrude 60 Å into the endosomal lumen. Strikingly, NPC1's SSD forms a cavity that is accessible from both the luminal bilayer leaflet and the endosomal lumen; computational modeling suggests that this cavity is large enough to accommodate one cholesterol molecule. We propose a model for NPC1 function in cholesterol sensing and transport. PMID:27307437
Proteins as sponges: a statistical journey along protein structure organization principles.
Paola, Luisa Di; Paci, Paola; Santoni, Daniele; Ruvo, Micol De; Giuliani, Alessandro
2012-02-27
The analysis of a large database of protein structures by means of topological and shape indexes inspired by complex network and fractal analysis shed light on some organizational principles of proteins. Proteins appear much more similar to "fractal" sponges than to closely packed spheres, casting doubts on the tenability of the hydrophobic core concept. Principal component analysis highlighted three main order parameters shaping the protein universe: (1) "size", with the consequent generation of progressively less dense and more empty structures at an increasing number of residues, (2) "microscopic structuring", linked to the existence of a spectrum going from the prevalence of heterologous (different hydrophobicity) to the prevalence of homologous (similar hydrophobicity) contacts, and (3) "fractal shape", organizing the protein data set along a continuum going from approximately linear to very intermingled structures. Perhaps the time has come for seriously taking into consideration the real relevance of time-honored principles like the hydrophobic core and hydrophobic effect.
Origins of structure in globular proteins.
Chan, H S; Dill, K A
1990-01-01
The principal forces of protein folding--hydrophobicity and conformational entropy--are nonspecific. A long-standing puzzle has, therefore, been: What forces drive the formation of the specific internal architectures in globular proteins? We find that any self-avoiding flexible polymer molecule will develop large amounts of secondary structure, helices and parallel and antiparallel sheets, as it is driven to increasing compactness by any force of attraction among the chain monomers. Thus structure formation arises from the severity of steric constraints in compact polymers. This steric principle of organization can account for why short helices are stable in globular proteins, why there are parallel and anti-parallel sheets in proteins, and why weakly unfolded proteins have some secondary structure. On this basis, it should be possible to construct copolymers, not necessarily using amino acids, that can collapse to maximum compactness in incompatible solvents and that should then have structural organization resembling that of proteins. Images PMID:2385597
Structure of synaptophysin: a hexameric MARVEL-domain channel protein.
Arthur, Christopher P; Stowell, Michael H B
2007-06-01
Synaptophysin I (SypI) is an archetypal member of the MARVEL-domain family of integral membrane proteins and one of the first synaptic vesicle proteins to be identified and cloned. Most all MARVEL-domain proteins are involved in membrane apposition and vesicle-trafficking events, but their precise role in these processes is unclear. We have purified mammalian SypI and determined its three-dimensional (3D) structure by using electron microscopy and single-particle 3D reconstruction. The hexameric structure resembles an open basket with a large pore and tenuous interactions within the cytosolic domain. The structure suggests a model for Synaptophysin's role in fusion and recycling that is regulated by known interactions with the SNARE machinery. This 3D structure of a MARVEL-domain protein provides a structural foundation for understanding the role of these important proteins in a variety of biological processes.
Mitchell, Carter A; Shi, Ce; Aldrich, Courtney C; Gulick, Andrew M
2012-04-17
Many bacteria use large modular enzymes for the synthesis of polyketide and peptide natural products. These multidomain enzymes contain integrated carrier domains that deliver bound substrates to multiple catalytic domains, requiring coordination of these chemical steps. Nonribosomal peptide synthetases (NRPSs) load amino acids onto carrier domains through the activity of an upstream adenylation domain. Our lab recently determined the structure of an engineered two-domain NRPS containing fused adenylation and carrier domains. This structure adopted a domain-swapped dimer that illustrated the interface between these two domains. To continue our investigation, we now examine PA1221, a natural two-domain protein from Pseudomonas aeruginosa. We have determined the amino acid specificity of this new enzyme and used domain specific mutations to demonstrate that loading the downstream carrier domain within a single protein molecule occurs more quickly than loading of a nonfused carrier domain intermolecularly. Finally, we have determined crystal structures of both apo- and holo-PA1221 proteins, the latter using a valine-adenosine vinylsulfonamide inhibitor that traps the adenylation domain-carrier domain interaction. The protein adopts an interface similar to that seen with the prior adenylation domain-carrier protein construct. A comparison of these structures with previous structures of multidomain NRPSs suggests that a large conformational change within the NRPS adenylation domains guides the carrier domain into the active site for thioester formation.
FitzGerald, Paul; Sun, Ning; Shibata, Brad; Hess, John F
2016-01-01
The differentiated lens fiber cell assembles a filamentous cytoskeletal structure referred to as the beaded filament (BF). The BF requires CP49 (bfsp2) and filensin (bfsp1) for assembly, both of which are highly divergent members of the large intermediate filament (IF) family of proteins. Thus far, these two proteins have been reported only in the differentiated lens fiber cell. For this reason, both proteins have been considered robust markers of fiber cell differentiation. We report here that both proteins are also expressed in the mouse lens epithelium, but only after 5 weeks of age. Localization of CP49 was achieved with immunocytochemical probing of wild-type, CP49 knockout, filensin knockout, and vimentin knockout mice, in sections and in the explanted lens epithelium, at the light microscope and electron microscope levels. The relationship between CP49 and other cytoskeletal elements was probed using fluorescent phalloidin, as well as with antibodies to vimentin, GFAP, and α-tubulin. The relationship between CP49 and the aggresome was probed with antibodies to γ-tubulin, ubiquitin, and HDAC6. CP49 and filensin were expressed in the mouse lens epithelium, but only after 5 weeks of age. At the light microscope level, these two proteins colocalize to a large tubular structure, approximately 7 × 1 μm, which was typically present at one to two copies per cell. This structure is found in the anterior and anterolateral lens epithelium, including the zone where mitosis occurs. The structure becomes smaller and largely undetectable closer to the equator where the cell exits the cell cycle and commits to fiber cell differentiation. This structure bears some resemblance to the aggresome and is reactive with antibodies to HDAC6, a marker for the aggresome. However, the structure does not colocalize with antibodies to γ-tubulin or ubiquitin, also markers for the aggresome. The structure also colocalizes with actin but appears to largely exclude vimentin and α-tubulin. 
In the CP49 and filensin knockouts, this structure is absent, confirming the identity of CP49 and filensin in this structure, and suggesting a requirement for the physiologic coassembly of CP49 and filensin. CP49 and filensin have been considered robust markers for mouse lens fiber cell differentiation. The data reported here, however, document both proteins in the mouse lens epithelium, but only after 5 weeks of age, when lens epithelial growth and mitotic activity have slowed. Because of this, CP49 and filensin must be considered markers of differentiation for both fiber cells and the lens epithelium in the mouse. In addition, to our knowledge, no other protein has been shown to emerge so late in the development of the mouse lens epithelium, suggesting that lens epithelial differentiation may continue well into post-natal life. If this structure is related to the aggresome, it is a rare, or perhaps unique example of a large, stable aggresome in wild-type tissue.
Atomic structures of corkscrew-forming segments of SOD1 reveal varied oligomer conformations.
Sangwan, Smriti; Sawaya, Michael R; Murray, Kevin A; Hughes, Michael P; Eisenberg, David S
2018-02-17
The aggregation cascade of disease-related amyloidogenic proteins, terminating in insoluble amyloid fibrils, involves intermediate oligomeric states. The structural and biochemical details of these oligomers have been largely unknown. Here we report crystal structures of variants of the cytotoxic oligomer-forming segment residues 28-38 of the ALS-linked protein, SOD1. The crystal structures reveal three different architectures: corkscrew oligomeric structure, nontwisting curved sheet structure and a steric zipper proto-filament structure. Our work highlights the polymorphism of the segment 28-38 of SOD1 and identifies the molecular features of amyloidogenic entities. © 2018 The Protein Society.
Yesselman, Joseph D; Horowitz, Scott; Brooks, Charles L; Trievel, Raymond C
2015-03-01
The propensity of backbone Cα atoms to engage in carbon-oxygen (CH · · · O) hydrogen bonding is well-appreciated in protein structure, but side chain CH · · · O hydrogen bonding remains largely uncharacterized. The extent to which side chain methyl groups in proteins participate in CH · · · O hydrogen bonding is examined through a survey of neutron crystal structures, quantum chemistry calculations, and molecular dynamics simulations. Using these approaches, methyl groups were observed to form stabilizing CH · · · O hydrogen bonds within protein structure that are maintained through protein dynamics and participate in correlated motion. Collectively, these findings illustrate that side chain methyl CH · · · O hydrogen bonding contributes to the energetics of protein structure and folding. © 2014 Wiley Periodicals, Inc.
Networking at the Protein Society symposium.
McKnight, C James; Cordes, Matthew H J
2005-10-01
From the complex behavior of multicomponent signaling networks to the structures of large protein complexes and aggregates, questions once viewed as daunting are now being tackled fearlessly by protein scientists. The 19th Annual Symposium of the Protein Society in Boston highlighted the maturation of systems biology as applied to proteins.
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
Structure-based design of combinatorial mutagenesis libraries.
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-05-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called "Structure-based Optimization of Combinatorial Mutagenesis" (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. © 2015 The Protein Society.
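The phrase "library-averaged energy potentials" can be illustrated with a small sketch. It assumes an energy that decomposes into single-position and pairwise terms, and a library in which every combination of allowed substitutions is equally likely. Under those assumptions the mean energy over the whole library follows from per-position and per-pair averages, without enumerating variants. The energy functions and names here are hypothetical toy stand-ins, not SOCoM's actual potentials.

```python
from itertools import combinations, product

def library_average_energy(library, single, pair):
    """Mean energy over all variants of a combinatorial library.

    library: dict mapping position -> iterable of allowed amino acids.
    single(i, a): one-body energy of amino acid a at position i.
    pair(i, a, j, b): two-body energy between (i, a) and (j, b).
    Cost grows with the number of position pairs, not of variants.
    """
    positions = sorted(library)
    total = 0.0
    for i in positions:
        total += sum(single(i, a) for a in library[i]) / len(library[i])
    for i, j in combinations(positions, 2):
        pair_sum = sum(pair(i, a, j, b) for a in library[i] for b in library[j])
        total += pair_sum / (len(library[i]) * len(library[j]))
    return total

def brute_force_average(library, single, pair):
    """Same quantity by explicit enumeration, for checking small libraries."""
    positions = sorted(library)
    energies = []
    for variant in product(*(library[i] for i in positions)):
        chosen = dict(zip(positions, variant))
        energy = sum(single(i, chosen[i]) for i in positions)
        energy += sum(
            pair(i, chosen[i], j, chosen[j])
            for i, j in combinations(positions, 2)
        )
        energies.append(energy)
    return sum(energies) / len(energies)
```

The two functions agree because the expectation of a sum is the sum of expectations and, in a full combinatorial library, the choices at different positions are independent.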
Integrating protein structural dynamics and evolutionary analysis with Bio3D.
Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J
2014-12-10
Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .
Structure of a designed protein cage that self-assembles into a highly porous cube
Lai, Yen-Ting; Reading, Eamonn; Hura, Greg L.; ...
2014-11-10
Natural proteins can be versatile building blocks for multimeric, self-assembling structures. Yet, creating protein-based assemblies with specific geometries and chemical properties remains challenging. Highly porous materials represent particularly interesting targets for designed assembly. Here we utilize a strategy of fusing two natural protein oligomers using a continuous alpha-helical linker to design a novel protein that self-assembles into a 750 kDa, 225 Å diameter, cube-shaped cage with large openings into a 130 Å diameter inner cavity. A crystal structure of the cage showed atomic level agreement with the designed model, while electron microscopy, native mass spectrometry, and small angle x-ray scattering revealed alternate assembly forms in solution. These studies show that accurate design of large porous assemblies with specific shapes is feasible, while further specificity improvements will likely require limiting flexibility to select against alternative forms. Finally, these results provide a foundation for the design of advanced materials with applications in bionanotechnology, nanomedicine and material sciences.
Direct detection of x-rays for protein crystallography employing a thick, large area CCD
Atac, Muzaffer; McKay, Timothy
1999-01-01
An apparatus and method for directly determining the crystalline structure of a protein crystal. The crystal is irradiated by a finely collimated x-ray beam. The interaction of the x-ray beam with the crystal produces scattered x-rays. These scattered x-rays are detected by means of a large area, thick CCD which is capable of measuring a significant number of scattered x-rays which impact its surface. The CCD is capable of detecting the position of impact of the scattered x-ray on the surface of the CCD and the quantity of scattered x-rays which impact the same cell or pixel. This data is then processed in real time and the processed data is output to produce an image of the structure of the crystal. If this crystal is a protein, the molecular structure of the protein can be determined from the data received.
NASA Technical Reports Server (NTRS)
Arkin, I. T.; Sukharev, S. I.; Blount, P.; Kung, C.; Brunger, A. T.
1998-01-01
In this report, we present structural studies on the large conductance mechanosensitive ion channel (MscL) from E. coli in detergent micelles and lipid vesicles. Both transmission Fourier transform infrared spectroscopy and circular dichroism (CD) spectra indicate that the protein is highly helical in detergents as well as liposomes. The secondary structure of the proteins was shown to be highly resistant towards denaturation (25-95 degrees C) based on an ellipticity thermal profile. Amide H+/D+ exchange was shown to be extensive (ca. 66%), implying that two thirds of the protein are water accessible. MscL, reconstituted in oriented lipid bilayers, was shown to possess a net bilayer orientation using dichroic ratios measured by attenuated total-reflection Fourier transform infrared spectroscopy. Here, we present and discuss this initial set of structural data on this new family of ion-channel proteins.
Sequence co-evolution gives 3D contacts and structures of protein complexes
Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S
2014-01-01
Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
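The idea of reading contacts from correlated sequence changes can be sketched with a deliberately simple score. The abstract does not specify the statistical model behind its evolutionary couplings, so this example substitutes plain mutual information between alignment columns. It is not the authors' method. It assumes a paired alignment in which each row concatenates the sequences of the two proteins from one organism; the function names are hypothetical.

```python
from collections import Counter
from math import log

def column(msa, k):
    """Residues found at alignment column k, one per sequence."""
    return [seq[k] for seq in msa]

def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi

def rank_inter_protein_pairs(msa, len_first):
    """Score every column pair spanning the two proteins, highest first.

    len_first: number of alignment columns belonging to the first protein.
    Returns a list of (score, column_in_first, column_in_second).
    """
    width = len(msa[0])
    scored = []
    for i in range(len_first):
        for j in range(len_first, width):
            score = mutual_information(column(msa, i), column(msa, j))
            scored.append((score, i, j - len_first))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored
```

Mutual information is a local, pairwise statistic, so it cannot separate direct couplings from correlations transmitted through third positions. Ranked pairs from a score like this are candidates to inspect, not predicted contacts.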
xMDFF: molecular dynamics flexible fitting of low-resolution X-ray structures.
McGreevy, Ryan; Singharoy, Abhishek; Li, Qufei; Zhang, Jingfen; Xu, Dong; Perozo, Eduardo; Schulten, Klaus
2014-09-01
X-ray crystallography remains the most dominant method for solving atomic structures. However, for relatively large systems, the availability of only medium-to-low-resolution diffraction data often limits the determination of all-atom details. A new molecular dynamics flexible fitting (MDFF)-based approach, xMDFF, for determining structures from such low-resolution crystallographic data is reported. xMDFF employs a real-space refinement scheme that flexibly fits atomic models into an iteratively updating electron-density map. It addresses significant large-scale deformations of the initial model to fit the low-resolution density, as tested with synthetic low-resolution maps of D-ribose-binding protein. xMDFF has been successfully applied to re-refine six low-resolution protein structures of varying sizes that had already been submitted to the Protein Data Bank. Finally, via systematic refinement of a series of data from 3.6 to 7 Å resolution, xMDFF refinements together with electrophysiology experiments were used to validate the first all-atom structure of the voltage-sensing protein Ci-VSP.
Pulsed Electron Beam Water Radiolysis for Sub-Microsecond Hydroxyl Radical Protein Footprinting
Watson, Caroline; Janik, Ireneusz; Zhuang, Tiandi; Charvátová, Olga; Woods, Robert J.; Sharp, Joshua S.
2009-01-01
Hydroxyl radical footprinting is a valuable technique for studying protein structure, but care must be taken to ensure that the protein does not unfold during the labeling process due to oxidative damage. Footprinting methods based on sub-microsecond laser photolysis of peroxide that complete the labeling process faster than the protein can unfold have been recently described; however, the mere presence of large amounts of hydrogen peroxide can also cause uncontrolled oxidation and minor conformational changes. We have developed a novel method for sub-microsecond hydroxyl radical protein footprinting using a pulsed electron beam from a 2 MeV Van de Graaff electron accelerator to generate a high concentration of hydroxyl radicals by radiolysis of water. The amount of oxidation can be controlled by buffer composition, pulsewidth, dose, and dissolved nitrous oxide gas in the sample. Our results with ubiquitin and β-lactoglobulin A demonstrate that one sub-microsecond electron beam pulse produces extensive protein surface modifications. Highly reactive residues that are buried within the protein structure are not oxidized, indicating that the protein retains its folded structure during the labeling process. Time-resolved spectroscopy indicates that the major part of protein oxidation is complete in a timescale shorter than that of large scale protein motions. PMID:19265387
Normal mode-guided transition pathway generation in proteins
Lee, Byung Ho; Seo, Sangjae; Kim, Min Hyeok; Kim, Youngjin; Jo, Soojin; Choi, Moon-ki; Lee, Hoomin; Choi, Jae Boong
2017-01-01
The biological function of proteins is closely related to their structural motion. For instance, structurally misfolded proteins do not function properly. Although we are able to experimentally obtain structural information on proteins, it is still challenging to capture their dynamics, such as transition processes. Therefore, we need a simulation method to predict the transition pathways of a protein in order to understand and study large functional deformations. Here, we present a new simulation method called normal mode-guided elastic network interpolation (NGENI) that performs normal mode analysis iteratively to predict transition pathways of proteins. To be more specific, NGENI obtains displacement vectors that determine intermediate structures by interpolating the distance between two end-point conformations, similar to a morphing method called elastic network interpolation. However, the displacement vector is regarded as a linear combination of the normal mode vectors of each intermediate structure, in order to enhance the physical sense of the proposed pathways. As a result, we can generate transition pathways that are more reasonable geometrically and thermodynamically. NGENI can generate reasonable pathways for large deformations in proteins not only with all normal modes but also with only a subset of the lowest normal modes. This study shows that global protein transitions are dominated by collective motion, which means that a few of the lowest normal modes play an important role in this process. NGENI has considerable merit in terms of computational cost because it can generate transition pathways using only part of the degrees of freedom, which conventional methods cannot do. PMID:29020017
A cross docking pipeline for improving pose prediction and virtual screening performance
NASA Astrophysics Data System (ADS)
Kumar, Ashutosh; Zhang, Kam Y. J.
2018-01-01
Pose prediction and virtual screening performance of a molecular docking method depend on the choice of protein structures used for docking. Multiple structures for a target protein are often used to take into account the receptor flexibility and problems associated with a single receptor structure. However, the use of multiple receptor structures is computationally expensive when docking a large library of small molecules. Here, we propose a new cross-docking pipeline suitable to dock a large library of molecules while taking advantage of multiple target protein structures. Our method involves the selection of a suitable receptor for each ligand in a screening library utilizing ligand 3D shape similarity with crystallographic ligands. We have prospectively evaluated our method in D3R Grand Challenge 2 and demonstrated that our cross-docking pipeline can achieve similar or better performance than using either single or multiple-receptor structures. Moreover, our method displayed not only decent pose prediction performance but also better virtual screening performance over several other methods.
Cryo-EM structure of the large subunit of the spinach chloroplast ribosome
Ahmed, Tofayel; Yin, Zhan; Bhushan, Shashi
2016-01-01
Protein synthesis in the chloroplast is mediated by the chloroplast ribosome (chloro-ribosome). The overall architecture of the chloro-ribosome is considerably similar to that of the Escherichia coli (E. coli) ribosome, but certain differences are evident. The chloro-ribosome proteins are generally larger because of the presence of chloroplast-specific extensions in their N- and C-termini. The chloro-ribosome harbours six plastid-specific ribosomal proteins (PSRPs); four in the small subunit and two in the large subunit. Deletions and insertions occur throughout the rRNA sequence of the chloro-ribosome (except for the conserved peptidyl transferase center region), but the overall length of the rRNAs does not change significantly compared to E. coli. Although recent advancements in cryo-electron microscopy (cryo-EM) have provided detailed high-resolution structures of ribosomes from many different sources, a high-resolution structure of the chloro-ribosome is still lacking. Here, we present a cryo-EM structure of the large subunit of the chloro-ribosome from spinach (Spinacia oleracea) at an average resolution of 3.5 Å. The high-resolution map enabled us to localize and model chloro-ribosome proteins, chloroplast-specific protein extensions, two PSRPs (PSRP5 and 6) and three rRNA molecules present in the chloro-ribosome. Although comparable to E. coli, the polypeptide tunnel and the tunnel exit site show chloroplast-specific features. PMID:27762343
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardarelli, Lia; Lam, Robert; Tuite, Ashleigh
2010-08-17
The final step in the morphogenesis of long-tailed double-stranded DNA bacteriophages is the joining of the DNA-filled head to the tail. The connector is a specialized structure of the head that serves as the interface for tail attachment and the point of egress for DNA from the head during infection. Here, we report the determination of a 2.1 Å crystal structure of gp6 of bacteriophage HK97. Through structural comparisons, functional studies, and bioinformatic analysis, gp6 has been determined to be a component of the connector of phage HK97 that is evolutionarily related to gp15, a well-characterized connector component of bacteriophage SPP1. Whereas the structure of gp15 was solved in a monomeric form, gp6 crystallized as an oligomeric ring with the dimensions expected for a connector protein. Although this ring is composed of 13 subunits, which does not match the symmetry of the connector within the phage, sequence conservation and modeling of this structure into the cryo-electron microscopy density of the SPP1 connector indicate that this oligomeric structure represents the arrangement of gp6 subunits within the mature phage particle. Through sequence searches and genomic position analysis, we determined that gp6 is a member of a large family of connector proteins that are present in long-tailed phages. We have also identified gp7 of HK97 as a homologue of gp16 of phage SPP1, which is the second component of the connector of this phage. These proteins are members of another large protein family involved in connector assembly.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardarelli, Lia; Lam, Robert; Tuite, Ashleigh
2011-11-23
The final step in the morphogenesis of long-tailed double-stranded DNA bacteriophages is the joining of the DNA-filled head to the tail. The connector is a specialized structure of the head that serves as the interface for tail attachment and the point of egress for DNA from the head during infection. Here, we report the determination of a 2.1 Å crystal structure of gp6 of bacteriophage HK97. Through structural comparisons, functional studies, and bioinformatic analysis, gp6 has been determined to be a component of the connector of phage HK97 that is evolutionarily related to gp15, a well-characterized connector component of bacteriophage SPP1. Whereas the structure of gp15 was solved in a monomeric form, gp6 crystallized as an oligomeric ring with the dimensions expected for a connector protein. Although this ring is composed of 13 subunits, which does not match the symmetry of the connector within the phage, sequence conservation and modeling of this structure into the cryo-electron microscopy density of the SPP1 connector indicate that this oligomeric structure represents the arrangement of gp6 subunits within the mature phage particle. Through sequence searches and genomic position analysis, we determined that gp6 is a member of a large family of connector proteins that are present in long-tailed phages. We have also identified gp7 of HK97 as a homologue of gp16 of phage SPP1, which is the second component of the connector of this phage. These proteins are members of another large protein family involved in connector assembly.
Zhang, Gaihua; Su, Zhen
2012-01-01
Work on protein structure prediction is very useful in biological research. To evaluate the accuracy of prediction methods, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility, such a standard may be unreliable. To investigate the influence of structural flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure were not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations, and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
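The summary statistic quoted in the abstract above, the average over sequences of the maximum pairwise Cα RMSD, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the structures in each group have already been superposed and have matching atom order, and the function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the two structures are already superposed; no fitting is done.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_of_max_rmsd(structures_by_sequence):
    """Average, over sequences, of the largest pairwise RMSD among the
    structures that share that sequence. Sequences with a single structure
    are skipped because they have no pair to compare."""
    maxima = [
        max(rmsd(a, b) for a, b in combinations(structures, 2))
        for structures in structures_by_sequence.values()
        if len(structures) >= 2
    ]
    if not maxima:
        raise ValueError("need at least one sequence with two or more structures")
    return sum(maxima) / len(maxima)


# Toy data: one-atom "structures", made up for illustration only.
toy = {
    "seqA": [[(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)]],
    "seqB": [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]],
}
print(mean_of_max_rmsd(toy))
```

In practice the structures would first be optimally superposed (for example with the Kabsch algorithm) before the deviation is measured.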
Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras
2014-01-01
The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391
A transthyretin-related protein is functionally expressed in Herbaspirillum seropedicae.
Matiollo, Camila; Vernal, Javier; Ecco, Gabriela; Bertoldo, Jean Borges; Razzera, Guilherme; de Souza, Emanuel M; Pedrosa, Fábio O; Terenzi, Hernán
2009-10-02
Transthyretin-related proteins (TRPs) constitute a family of proteins structurally related to transthyretin (TTR) and are found in a large range of bacterial, fungal, plant, invertebrate, and vertebrate species. However, it was recently recognized that both prokaryotic and eukaryotic members of this family are not functionally related to transthyretins. TRPs are in fact involved in the purine catabolic pathway and function as hydroxyisourate hydrolases. An open reading frame encoding a protein similar to the Escherichia coli TRP was identified in Herbaspirillum seropedicae genome (Hs_TRP). It was cloned, overexpressed in E. coli, and purified to homogeneity. Mass spectrometry data confirmed the identity of this protein, and circular dichroism spectrum indicated a predominance of beta-sheet structure, as expected for a TRP. We have demonstrated that Hs_TRP is a 5-hydroxyisourate hydrolase and by site-directed mutagenesis the importance of three conserved catalytic residues for Hs_TRP activity was further confirmed. The production of large quantities of this recombinant protein opens up the possibility of obtaining its 3D-structure and will help further investigations into purine catabolism.
Modeling complexes of modeled proteins.
Anishchenko, Ivan; Kundrotas, Petras J; Vakser, Ilya A
2017-03-01
Structural characterization of proteins is essential for understanding life processes at the molecular level. However, only a fraction of known proteins have experimentally determined structures. This fraction is even smaller for protein-protein complexes. Thus, structural modeling of protein-protein interactions (docking) primarily has to rely on modeled structures of the individual proteins, which typically are less accurate than the experimentally determined ones. Such "double" modeling is the Grand Challenge of structural reconstruction of the interactome. Yet it remains so far largely untested in a systematic way. We present a comprehensive validation of template-based and free docking on a set of 165 complexes, where each protein model has six levels of structural accuracy, from 1 to 6 Å Cα RMSD. Many template-based docking predictions fall into the acceptable quality category, according to the CAPRI criteria, even for highly inaccurate proteins (5-6 Å RMSD), although the number of such models (and, consequently, the docking success rate) drops significantly for models with RMSD > 4 Å. The results show that the existing docking methodologies can be successfully applied to protein models with a broad range of structural accuracy, and that template-based docking is much less sensitive to inaccuracies of protein models than free docking. Proteins 2017; 85:470-478. © 2016 Wiley Periodicals, Inc.
Modularity in protein structures: study on all-alpha proteins.
Khan, Taushif; Ghosh, Indira
2015-01-01
Modularity is known as one of the most important features of the robust and efficient design of proteins. The architecture and topology of proteins play a vital role by providing the robust scaffolds necessary to support an organism's growth and survival under constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method that uses a protein's secondary structures and their arrangements (i.e. patterns) to investigate its structural space. Our results on all-alpha proteins show that the known structural space is highly populated with a limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to occur distinctly in diverse functions, e.g. in enzymatic classes and reactions. In this study, we introduce a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, the analysis indicates that protein function seems to be the driving force that shapes the known structure space.
Robust enzyme design: bioinformatic tools for improved protein stability.
Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas
2015-03-01
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Identification of a new protein in the centrosome-like "atractophore" of Trichomonas vaginalis.
Bricheux, Geneviève; Coffe, Gérard; Brugerolle, Guy
2007-06-01
The human parasite Trichomonas vaginalis has specific structural bodies, atractophores, associated at one end with the kinetosomes and at the other with the spindle during division. A monoclonal antibody specific for a component of this structure was obtained. It recognizes a protein with a predicted molecular mass of 477 kDa. Sequence analysis of this protein shows that P477 belongs to the family of large coiled-coil proteins, sharing a highly versatile protein folding motif adaptable to many biological functions. P477 might act as an anchor to localize cellular activities and components to the Golgi centrosomal region. It may represent a new class of structural proteins, since similar proteins were found in many protozoans.
CABS-flex 2.0: a web server for fast simulations of flexibility of protein structures.
Kuriata, Aleksander; Gierut, Aleksandra Maria; Oleniecki, Tymoteusz; Ciemny, Maciej Pawel; Kolinski, Andrzej; Kurcinski, Mateusz; Kmiecik, Sebastian
2018-05-14
Classical simulations of protein flexibility remain computationally expensive, especially for large proteins. A few years ago, we developed a fast method for predicting protein structure fluctuations that uses a single protein model as the input. The method has been made available as the CABS-flex web server and applied in numerous studies of protein structure-function relationships. Here, we present a major update of the CABS-flex web server to version 2.0. The new features include: extension of the method to significantly larger and multimeric proteins, customizable distance restraints and simulation parameters, contact maps and a new, enhanced web server interface. CABS-flex 2.0 is freely available at http://biocomp.chem.uw.edu.pl/CABSflex2.
Automated structure determination of proteins with the SAIL-FLYA NMR method.
Takeda, Mitsuhiro; Ikeya, Teppei; Güntert, Peter; Kainosho, Masatsune
2007-01-01
The labeling of proteins with stable isotopes enhances the NMR method for the determination of 3D protein structures in solution. Stereo-array isotope labeling (SAIL) provides an optimal stereospecific and regiospecific pattern of stable isotopes that yields sharpened lines, spectral simplification without loss of information, and the ability to collect rapidly and evaluate fully automatically the structural restraints required to solve a high-quality solution structure for proteins up to twice as large as those that can be analyzed using conventional methods. Here, we describe a protocol for the preparation of SAIL proteins by cell-free methods, including the preparation of S30 extract and their automated structure analysis using the FLYA algorithm and the program CYANA. Once efficient cell-free expression of the unlabeled or uniformly labeled target protein has been achieved, the NMR sample preparation of a SAIL protein can be accomplished in 3 d. A fully automated FLYA structure calculation can be completed in 1 d on a powerful computer system.
ERIC Educational Resources Information Center
Bednarski, April E.; Elgin, Sarah C. R.; Pakrasi, Himadri B.
2005-01-01
This inquiry-based lab is designed around genetic diseases with a focus on protein structure and function. To allow students to work on their own investigatory projects, 10 projects on 10 different proteins were developed. Students are grouped in sections of 20 and work in pairs on each of the projects. To begin their investigation, students are…
Folding and Stabilization of Native-Sequence-Reversed Proteins
Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong
2016-01-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins
NASA Astrophysics Data System (ADS)
Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong
2016-04-01
Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
PreSSAPro: a software for the prediction of secondary structure by amino acid properties.
Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M
2007-10-01
PreSSAPro is a software tool, available to the scientific community as a free web service, designed to predict secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in large but not homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
An approach to large scale identification of non-obvious structural similarities between proteins
Cherkasov, Artem; Jones, Steven JM
2004-01-01
Background: A new sequence-independent bioinformatics approach allowing genome-wide search for proteins with similar three-dimensional structures has been developed. By utilizing the numerical output of sequence threading, it establishes putative non-obvious structural similarities between proteins. When applied to a testing set of proteins with known three-dimensional structures, the developed approach was able to recognize structurally similar proteins with high accuracy. Results: The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion: Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
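The core step described above, correlating the threading scores of two sequences against a shared fold library, can be sketched with a plain Pearson correlation. This is an illustrative assumption: the abstract does not say which correlation measure or score normalization was used, and the function names and numbers below are hypothetical. A helper for sensitivity and specificity is included because those are the figures the abstract reports.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical threading scores of two proteins against the same four folds.
scores_protein_a = [12.1, 3.4, 8.8, 1.0]
scores_protein_b = [11.5, 2.9, 9.3, 0.7]

# A high correlation would flag the pair as potentially structurally similar.
print(pearson(scores_protein_a, scores_protein_b))
```

A threshold on the correlation would then decide which pairs are called similar; moving that threshold trades sensitivity against specificity.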
Beebe, Emily T.; Makino, Shin-ichi; Nozawa, Akira; Matsubara, Yuko; Frederick, Ronnie O.; Primm, John G.; Goren, Michael A.; Fox, Brian G.
2010-01-01
The use of the Protemist XE, an automated discontinuous-batch protein synthesis robot, in cell-free translation is reported. The soluble Galdieria sulphuraria protein DCN1 was obtained in greater than 2 mg total synthesis yield per mL of reaction mixture from the Protemist XE, and the structure was subsequently solved by X-ray crystallography using material from one 10 mL synthesis (PDB ID: 3KEV). The Protemist XE was also capable of membrane protein translation. Thus human sigma-1 receptor was translated in the presence of unilamellar liposomes and bacteriorhodopsin was translated directly into detergent micelles in the presence of all-trans-retinal. The versatility, ease of use, and compact size of the Protemist XE robot demonstrate its suitability for large-scale synthesis of many classes of proteins. PMID:20637905
Fast protein tertiary structure retrieval based on global surface shape similarity.
Sael, Lee; Li, Bin; La, David; Fang, Yi; Ramani, Karthik; Rustamov, Raif; Kihara, Daisuke
2008-09-01
Characterization and identification of similar tertiary structures of proteins provide rich information for investigating function and evolution. The importance of structure similarity searches is increasing as structure databases continue to expand, partly due to the structural genomics projects. A crucial drawback of conventional protein structure comparison methods, which compare structures by their main-chain orientation or the spatial arrangement of secondary structure, is that a database search is too slow to be done in real-time. Here we introduce a global surface shape representation by three-dimensional (3D) Zernike descriptors, which represent a protein structure compactly as a series expansion of 3D functions. With this simplified representation, a search against a few thousand structures takes less than a minute. To investigate the agreement between the surface representation defined by the 3D Zernike descriptor and the conventional main-chain based representation, a benchmark was performed against a protein classification generated by the combinatorial extension algorithm. Despite the different representation, the 3D Zernike descriptor retrieved proteins of the same conformation defined by combinatorial extension in 89.6% of the cases within the top five closest structures. The real-time protein structure search by 3D Zernike descriptor will open up new possibilities for large-scale global and local protein surface shape comparison. © 2008 Wiley-Liss, Inc.
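Once each structure is reduced to a fixed-length descriptor vector, retrieval becomes a nearest-neighbour search over those vectors, which is why it can run in real time. The sketch below ranks database entries by Euclidean distance to a query and returns the top five; the distance metric, the function name, the entry identifiers, and the two-component vectors are all assumptions for illustration, and computing the 3D Zernike descriptors themselves is not shown.

```python
import math


def top_k_similar(query, database, k=5):
    """Return the names of the k database entries whose descriptor vectors
    are closest to `query` by Euclidean distance, nearest first."""
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical descriptor vectors; real descriptors would be much longer.
database = {
    "entry_a": [0.0, 0.0],
    "entry_b": [1.0, 0.0],
    "entry_c": [5.0, 5.0],
}

print(top_k_similar([0.9, 0.0], database, k=2))
```

A linear scan like this is adequate for a few thousand entries; larger collections would likely call for an index structure.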
G2S: a web-service for annotating genomic variants on 3D protein structures.
Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong
2018-06-01
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
High throughput platforms for structural genomics of integral membrane proteins.
Mancia, Filippo; Love, James
2011-08-01
Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on ongoing structural genomics initiatives for membrane proteins, with a focus on those implementing techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.
Discrete-continuous duality of protein structure space.
Sadreyev, Ruslan I; Kim, Bong-Hyun; Grishin, Nick V
2009-06-01
Recently, the nature of protein structure space has been widely discussed in the literature. The traditional discrete view of the protein universe as a set of separate folds has been criticized in the light of growing evidence that almost any arrangement of secondary structures is possible and the whole protein space can be traversed through a path of similar structures. Here we argue that the discrete and continuous descriptions are not mutually exclusive, but complementary: the space is largely discrete in an evolutionary sense, but continuous geometrically when purely structural similarities are quantified. Evolutionary connections are mainly confined to separate structural prototypes corresponding to folds as islands of structural stability, with few remaining traceable links between the islands. However, for a geometric similarity measure, it is usually possible to find a reasonable cutoff that yields paths connecting any two structures through intermediates.
Zhou, Ren-Bin; Lu, Hui-Meng; Liu, Jie; Shi, Jian-Yu; Zhu, Jing; Lu, Qin-Qin; Yin, Da-Chuan
2016-01-01
Recombinant expression of proteins has become an indispensable tool in modern day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are reports in the literature that recombinant proteins show some differences in structure and function compared with the native ones. There are now more than 100,000 structures (from both recombinant and native sources) publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate whether any proteins in the RCSB PDB archive have identical sequences but differ in structure. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated 517 pairs of native and recombinant proteins that have identical sequences. There were no pairs of proteins that had the same sequence and a significantly different structural fold, which supports the hypothesis that proteins expressed in a heterologous host usually fold correctly into their native forms. PMID:27517583
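As a small illustration of one of the similarity measures named in this abstract, the sketch below computes the root-mean-square distance between two sets of matched atomic coordinates. It assumes the two structures have already been superposed and have a one-to-one atom correspondence; it does not perform the alignment done by CE, FATCAT or TM-Align, and the coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two matched lists of (x, y, z) points.

    Assumption: the structures are already superposed and atoms are paired
    by index; no optimal rotation or translation is computed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms (hypothetical, for illustration only).
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
recombinant = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
value = rmsd(native, recombinant)
```

Here every atom is displaced by exactly 1 Å, so the RMSD is 1 Å; identical coordinate sets give 0.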
Jiang, Hanlun; Sheong, Fu Kit; Zhu, Lizhe; Gao, Xin; Bernauer, Julie; Huang, Xuhui
2015-07-01
Argonaute (Ago) proteins and microRNAs (miRNAs) are central components in RNA interference, which is a key cellular mechanism for sequence-specific gene silencing. Despite intensive studies, the molecular mechanisms by which Ago recognizes miRNA remain largely elusive. In this study, we propose a two-step mechanism for this molecular recognition: selective binding followed by structural re-arrangement. Our model is based on the results of a combination of Markov State Models (MSMs), large-scale protein-RNA docking, and molecular dynamics (MD) simulations. Using MSMs, we identify an open state of apo human Ago-2 in fast equilibrium with partially open and closed states. Conformations in this open state are distinguished by their largely exposed binding grooves that can geometrically accommodate miRNA as indicated in our protein-RNA docking studies. miRNA may then selectively bind to these open conformations. Upon the initial binding, the complex may perform further structural re-arrangement as shown in our MD simulations and eventually reach the stable binary complex structure. Our results provide novel insights into Ago-miRNA recognition mechanisms and our methodology holds great potential to be widely applied in the studies of other important molecular recognition systems.
Unraveling the meaning of chemical shifts in protein NMR.
Berjanskii, Mark V; Wishart, David S
2017-11-01
Chemical shifts are among the most informative parameters in protein NMR. They provide a wealth of information about protein secondary and tertiary structure, protein flexibility, and protein-ligand binding. In this report, we review the progress in interpreting and utilizing protein chemical shifts that has occurred over the past 25 years, with a particular focus on the large body of work arising from our group and other Canadian NMR laboratories. More specifically, this review focuses on describing, assessing, and providing some historical context for various chemical shift-based methods to: (1) determine protein secondary and super-secondary structure; (2) derive protein torsion angles; (3) assess protein flexibility; (4) predict residue accessible surface area; (5) refine 3D protein structures; (6) determine 3D protein structures and (7) characterize intrinsically disordered proteins. This review also briefly covers some of the methods that we previously developed to predict chemical shifts from 3D protein structures and/or protein sequence data. It is hoped that this review will help to increase awareness of the considerable utility of NMR chemical shifts in structural biology and facilitate more widespread adoption of chemical-shift based methods by NMR spectroscopists, structural biologists, protein biophysicists, and biochemists worldwide. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.
Impact of genetic variation on three dimensional structure and function of proteins
Bhattacharya, Roshni; Rose, Peter W.; Burley, Stephen K.
2017-01-01
The Protein Data Bank (PDB; http://wwpdb.org) was established in 1971 as the first open access digital data resource in biology with seven protein structures as its initial holdings. The global PDB archive now contains more than 126,000 experimentally determined atomic level three-dimensional (3D) structures of biological macromolecules (proteins, DNA, RNA), all of which are freely accessible via the Internet. Knowledge of the 3D structure of the gene product can help in understanding its function and role in disease. Of particular interest in the PDB archive are proteins for which 3D structures of genetic variant proteins have been determined, thus revealing atomic-level structural differences caused by the variation at the DNA level. Herein, we present a systematic and qualitative analysis of such cases. We observe a wide range of structural and functional changes caused by single amino acid differences, including changes in enzyme activity, aggregation propensity, structural stability, binding, and dissociation, some in the context of large assemblies. Structural comparison of wild type and mutated proteins, when both are available, provides insights into atomic-level structural differences caused by the genetic variation. PMID:28296894
Nullspace Sampling with Holonomic Constraints Reveals Molecular Mechanisms of Protein Gαs.
Pachov, Dimitar V; van den Bedem, Henry
2015-07-01
Proteins perform their function or interact with partners by exchanging between conformational substates on a wide range of spatiotemporal scales. Structurally characterizing these exchanges is challenging, both experimentally and computationally. Large, diffusional motions are often on timescales that are difficult to access with molecular dynamics simulations, especially for large proteins and their complexes. The low frequency modes of normal mode analysis (NMA) report on molecular fluctuations associated with biological activity. However, NMA is limited to a second order expansion about a minimum of the potential energy function, which limits opportunities to observe diffusional motions. By contrast, kino-geometric conformational sampling (KGS) permits large perturbations while maintaining the exact geometry of explicit conformational constraints, such as hydrogen bonds. Here, we extend KGS and show that a conformational ensemble of the α subunit Gαs of heterotrimeric stimulatory protein Gs exhibits structural features implicated in its activation pathway. Activation of protein Gs by G protein-coupled receptors (GPCRs) is associated with GDP release and large conformational changes of its α-helical domain. Our method reveals a coupled α-helical domain opening motion while, simultaneously, Gαs helix α5 samples an activated conformation. These motions are moderated in the activated state. The motion centers on a dynamic hub near the nucleotide-binding site of Gαs, and radiates to helix α4. We find that comparative NMA-based ensembles underestimate the amplitudes of the motion. Additionally, the ensembles fall short in predicting the accepted direction of the full activation pathway. 
Taken together, our findings suggest that nullspace sampling with explicit, holonomic constraints yields ensembles that illuminate molecular mechanisms involved in GDP release and protein Gs activation, and further establish conformational coupling between key structural elements of Gαs. PMID:26218073
Chromophore photophysics and dynamics in fluorescent proteins of the GFP family
NASA Astrophysics Data System (ADS)
Nienhaus, Karin; Nienhaus, G. Ulrich
2016-11-01
Proteins of the green fluorescent protein (GFP) family are indispensable for fluorescence imaging experiments in the life sciences, particularly of living specimens. Their essential role as genetically encoded fluorescence markers has motivated many researchers over the last 20 years to further advance and optimize these proteins by using protein engineering. Amino acids can be exchanged by site-specific mutagenesis, starting with naturally occurring proteins as templates. Optical properties of the fluorescent chromophore are strongly tuned by the surrounding protein environment, and a targeted modification of chromophore-protein interactions requires a profound knowledge of the underlying photophysics and photochemistry, which has by now been well established from a large number of structural and spectroscopic experiments and molecular-mechanical and quantum-mechanical computations on many variants of fluorescent proteins. Nevertheless, such rational engineering often does not meet with success and thus is complemented by random mutagenesis and selection based on the optical properties. In this topical review, we present an overview of the key structural and spectroscopic properties of fluorescent proteins. We address protein-chromophore interactions that govern ground state optical properties as well as processes occurring in the electronically excited state. Special emphasis is placed on photoactivation of fluorescent proteins. These light-induced reactions result in large structural changes that drastically alter the fluorescence properties of the protein, which enables some of the most exciting applications, including single particle tracking, pulse chase imaging and super-resolution imaging. We also present a few examples of fluorescent protein application in live-cell imaging experiments.
Large-scale modelling of the divergent spectrin repeats in nesprins: giant modular proteins.
Autore, Flavia; Pfuhl, Mark; Quan, Xueping; Williams, Aisling; Roberts, Roland G; Shanahan, Catherine M; Fraternali, Franca
2013-01-01
Nesprin-1 and nesprin-2 are nuclear envelope (NE) proteins characterized by a common structure of an SR (spectrin repeat) rod domain and a C-terminal transmembrane KASH [Klarsicht-ANC-Syne-homology] domain and display N-terminal actin-binding CH (calponin homology) domains. Mutations in these proteins have been described in Emery-Dreifuss muscular dystrophy and attributed to disruptions of interactions at the NE with the nesprin binding partners lamin A/C and emerin. Evolutionary analysis of the rod domains of the nesprins has shown that they are almost entirely composed of unbroken SR-like structures. We present a bioinformatics approach to accurate definition of the boundaries of each SR by comparison with canonical SR structures, allowing for a large-scale homology modelling of the 74 nesprin-1 and 56 nesprin-2 SRs. The exposed and evolutionarily conserved residues identify important pbs for protein-protein interactions that can guide tailored binding experiments. Most importantly, the bioinformatics analyses and the 3D models have been central to the design of selected constructs for protein expression. 1D NMR and CD spectra have been recorded for the expressed SRs, showing a folded, stable structure with high α-helical content, typical of SRs. Molecular Dynamics simulations have been performed to study the structural and elastic properties of consecutive SRs, revealing insights into the mechanical properties adopted by these modules in the cell.
Hiraki, Masahiko; Kato, Ryuichi; Nagai, Minoru; Satoh, Tadashi; Hirano, Satoshi; Ihara, Kentaro; Kudo, Norio; Nagae, Masamichi; Kobayashi, Masanori; Inoue, Michio; Uejima, Tamami; Oda, Shunichiro; Chavas, Leonard M G; Akutsu, Masato; Yamada, Yusuke; Kawasaki, Masato; Matsugaki, Naohiro; Igarashi, Noriyuki; Suzuki, Mamoru; Wakatsuki, Soichi
2006-09-01
Protein crystallization remains one of the bottlenecks in crystallographic analysis of macromolecules. An automated large-scale protein-crystallization system named PXS has been developed consisting of the following subsystems, which proceed in parallel under unified control software: dispensing precipitants and protein solutions, sealing crystallization plates, carrying robot, incubators, observation system and image-storage server. A sitting-drop crystallization plate specialized for PXS has also been designed and developed. PXS can set up 7680 drops for vapour diffusion per hour, which includes time for replenishing supplies such as disposable tips and crystallization plates. Images of the crystallization drops are automatically recorded according to a preprogrammed schedule and can be viewed by users remotely using web-based browser software. A number of protein crystals were successfully produced and several protein structures could be determined directly from crystals grown by PXS. In other cases, X-ray quality crystals were obtained by further optimization by manual screening based on the conditions found by PXS.
Huang, Wenxi; Liu, Wanting; Jin, Jingjie; Xiao, Qilan; Lu, Ruibin; Chen, Wei; Xiong, Sheng; Zhang, Gong
2018-03-25
Translational pausing coordinates protein synthesis and co-translational folding. It is a common factor that facilitates the correct folding of large, multi-domain proteins. For small proteins, pausing sites rarely occur in the gene body, and the 3'-end pausing sites are essential for the folding of only a fraction of proteins. What determines whether pausing is necessary remains obscure. In this study, we demonstrated that the steady-state structural fluctuation is a predictor of the necessity of pausing-mediated co-translational folding for small proteins. In experiments with 5 model proteins, we found that rigid protein structures do not need 3'-end pausing to fold correctly, whereas flexible structures do. Therefore, rational optimization of translational pausing can improve soluble expression of small proteins with flexible structures, but not of rigid ones. The rigidity of the structure can be quantitatively estimated in silico using molecular dynamics simulation. Nevertheless, we also found that translational pausing optimization increases the fitness of the expression host, and thus benefits recombinant protein production, independent of soluble expression. These results shed light on the structural basis of translational pausing and provide a practical tool for industrial protein fermentation. Copyright © 2017. Published by Elsevier Inc.
Graf, Michael; Arenz, Stefan; Huter, Paul; Dönhöfer, Alexandra; Nováček, Jiří
2017-01-01
Ribosomes are the protein synthesizing machines of the cell. Recent advances in cryo-EM have led to the determination of structures from a variety of species, including bacterial 70S and eukaryotic 80S ribosomes as well as mitoribosomes from eukaryotic mitochondria; however, to date high resolution structures of plastid 70S ribosomes have been lacking. Here we present a cryo-EM structure of the spinach chloroplast 70S ribosome, with an average resolution of 5.4 Å for the small 30S subunit and 3.6 Å for the large 50S ribosomal subunit. The structure reveals the location of the plastid-specific ribosomal proteins (RPs) PSRP1, PSRP4, PSRP5 and PSRP6 as well as the numerous plastid-specific extensions of the RPs. We discover many features by which the plastid-specific extensions stabilize the ribosome via establishing additional interactions with surrounding ribosomal RNA and RPs. Moreover, we identify a large conglomerate of plastid-specific protein mass adjacent to the tunnel exit site that could facilitate interaction of the chloroplast ribosome with the thylakoid membrane and the protein-targeting machinery. Comparing the Escherichia coli 70S ribosome with that of the spinach chloroplast ribosome provides detailed insight into the co-evolution of RP and rRNA. PMID:27986857
An experimental point of view on hydration/solvation in halophilic proteins
Talon, Romain; Coquelle, Nicolas; Madern, Dominique; Girard, Eric
2014-01-01
Protein-solvent interactions govern the behaviors of proteins isolated from extreme halophiles. In this work, we compared the solvent envelopes of two orthologous tetrameric malate dehydrogenases (MalDHs) from halophilic and non-halophilic bacteria. The crystal structure of the MalDH from the non-halophilic bacterium Chloroflexus aurantiacus (Ca MalDH) solved, de novo, at 1.7 Å resolution exhibits numerous water molecules in its solvation shell. We observed that a large number of these water molecules are arranged in pentagonal polygons in the first hydration shell of Ca MalDH. Some of them are clustered in large networks, which cover non-polar amino acid surface. The crystal structure of MalDH from the extreme halophilic bacterium Salinibacter ruber (Sr) solved at 1.55 Å resolution shows that its surface is strongly enriched in acidic amino acids. The structural comparison of these two models is the first direct observation of the relative impact of acidic surface enrichment on the water structure organization between a halophilic protein and its non-adapted counterpart. The data show that surface acidic amino acids disrupt pentagonal water networks in the hydration shell. These crystallographic observations are discussed with respect to halophilic protein behaviors in solution. PMID:24600446
Effect of fullerenol surface chemistry on nanoparticle binding-induced protein misfolding
NASA Astrophysics Data System (ADS)
Radic, Slaven; Nedumpully-Govindan, Praveen; Chen, Ran; Salonen, Emppu; Brown, Jared M.; Ke, Pu Chun; Ding, Feng
2014-06-01
Fullerene and its derivatives with different surface chemistry have great potential in biomedical applications. Accordingly, it is important to delineate the impact of these carbon-based nanoparticles on protein structure, dynamics, and subsequently function. Here, we focused on the effect of hydroxylation -- a common strategy for solubilizing and functionalizing fullerene -- on protein-nanoparticle interactions using a model protein, ubiquitin. We applied a set of complementary computational modeling methods, including docking and molecular dynamics simulations with both explicit and implicit solvent, to illustrate the impact of hydroxylated fullerenes on the structure and dynamics of ubiquitin. We found that all derivatives bound to the model protein. Specifically, the more hydrophilic nanoparticles with a higher number of hydroxyl groups bound to the surface of the protein via hydrogen bonds, which stabilized the protein without inducing large conformational changes in the protein structure. In contrast, fullerene derivatives with a smaller number of hydroxyl groups buried their hydrophobic surface inside the protein, thereby causing protein denaturation. Overall, our results revealed a distinct role of surface chemistry on nanoparticle-protein binding and binding-induced protein misfolding. Electronic supplementary information (ESI) is available: Fluorescence spectra, ITC, CD spectra and other data as described in the text. See DOI: 10.1039/c4nr01544d
Fast large-scale clustering of protein structures using Gauss integrals.
Harder, Tim; Borg, Mikael; Boomsma, Wouter; Røgen, Peter; Hamelryck, Thomas
2012-02-15
Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors--which were introduced by Røgen and co-workers--and subsequently performing K-means clustering. Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50,000 structures, can be clustered within seconds to minutes.
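The second stage of the approach described in this abstract, K-means clustering of fixed-length vectors, can be sketched as follows. This is a minimal illustration, not the Pleiades implementation: it assumes each structure has already been mapped to a vector (the Gauss integral computation is not shown), it seeds the centroids with the first k vectors to keep the example deterministic, and the 2-component toy vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd-style K-means; returns (labels, centroids).

    Assumption: centroids are initialised from the first k vectors, which is
    a simplification chosen only to make the sketch reproducible.
    """
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical stand-ins for per-structure vectors.
toy_vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
labels, centroids = kmeans(toy_vectors, k=2)
```

Each iteration costs time proportional to the number of structures times k, rather than requiring all pairwise structure comparisons, which is one plausible reason a vector-based scheme scales to tens of thousands of structures.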
NASA Astrophysics Data System (ADS)
Wanapun, Duangporn; Wampler, Ronald D.; Begue, Nathan J.; Simpson, Garth J.
2008-03-01
A new method for sensitive determination of protein secondary structure via multi-photon absorption is considered theoretically. Perturbation theory is developed to describe the polarization-dependent two-photon absorption (TPA) of α-helix and β-sheet protein secondary structures. The exciton coupling interactions responsible for relatively weak electronic circular dichroism in one-photon absorption are predicted to give rise to large changes in the TPA cross-section (>200%) for circular versus linear incident polarizations, defined as CLD. The CLD effect in TPA is electric dipole-allowed, which explains the much greater sensitivity. These predictions suggest TPA should be a viable means of sensitively probing protein secondary structure.
Ligand binding by repeat proteins: natural and designed
Grove, Tijana Z; Cortajarena, Aitziber L; Regan, Lynne
2012-01-01
Repeat proteins contain tandem arrays of small structural motifs. As a consequence of this architecture, they adopt non-globular, extended structures that present large, highly specific surfaces for ligand binding. Here we discuss recent advances toward understanding the functional role of this unique modular architecture. We showcase specific examples of natural repeat proteins interacting with diverse ligands and also present examples of designed repeat protein–ligand interactions. PMID:18602006
TALEs from a Spring – Superelasticity of Tal Effector Protein Structures
Flechsig, Holger
2014-01-01
Transcription activator-like effectors (TALEs) are DNA-related proteins that recognise and bind specific target sequences to manipulate gene expression. Recently determined crystal structures show that their common architecture reveals a superhelical overall structure that may undergo drastic conformational changes. To establish a link between structure and dynamics in TALE proteins we have employed coarse-grained elastic-network modelling of currently available structural data and implemented a force-probe setup that allowed us to investigate their mechanical behaviour in computer experiments. Based on the measured force-extension curves we conclude that TALEs exhibit superelastic dynamical properties allowing for large-scale global conformational changes along their helical axis, which represents the soft direction in such proteins. For moderate external forcing the TALE models behave like linear springs, obeying Hooke's law, and the investigated structures can be characterised and compared by a corresponding spring constant. We show that conformational flexibility underlying the large-scale motions is not homogeneously distributed over the TALE structure, but instead soft spot residues around which strain is accumulated and which turn out to represent key agents in the transmission of conformational motions are identified. They correspond to the RVD loop residues that have been experimentally determined to play an eminent role in the binding process of target DNA. PMID:25313859
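In the linear regime mentioned in this abstract, a spring constant can be estimated from a force-extension curve by fitting F = k·x. The sketch below is a plain least-squares fit through the origin. The function name and the input format are assumptions for illustration, and the abstract does not state how the authors performed their fit.

```python
def spring_constant(extensions, forces):
    """Least-squares slope k of F = k * x for a line through the origin.

    Assumption: only data points from the linear (Hookean) part of the
    force-extension curve are passed in; units are whatever the inputs use.
    """
    num = sum(x * f for x, f in zip(extensions, forces))
    den = sum(x * x for x in extensions)
    if den == 0:
        raise ValueError("need at least one non-zero extension")
    return num / den
```

Comparing k across structures is only meaningful if every curve is truncated to its linear part first. Beyond that range the superelastic response described in the abstract would bias the slope.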
Glyakina, Anna V; Pereyaslavets, Leonid B; Galzitskaya, Oxana V
2013-09-01
Despite the large number of publications on three-helix protein folding, no study has been devoted to the influence of handedness on the rate of three-helix protein folding. From the experimental studies, we conclude that left-handed three-helix proteins fold faster than right-handed ones. What may explain this difference? An important question arising in this paper is whether the modeling of protein folding can capture the difference between the folding rates of proteins with similar structures but different folding mechanisms. To answer this question, the folding of eight three-helix proteins (four right-handed and four left-handed), which are similar in size, was modeled using the Monte Carlo and dynamic programming methods. The studies allowed us to determine the order of folding of the secondary-structure elements in these domains and the amino acid residues that are important for folding. The obtained data correlate well with each other and with the experimental data. Structural analysis of these proteins demonstrated that the left-handed domains have fewer contacts per residue and a smaller radius of cross section than the right-handed domains. This may be one explanation of the observed difference. The same tendency is observed for a large dataset consisting of 332 three-helix proteins (238 right- and 94 left-handed). From our analysis, we found that the left-handed three-helix proteins have somewhat less dense packing, which should result in faster folding for some proteins compared with right-handed proteins. Copyright © 2013 Wiley Periodicals, Inc.
Understand protein functions by comparing the similarity of local structural environments.
Chen, Jiawen; Xie, Zhong-Ru; Wu, Yinghao
2017-02-01
The three-dimensional structures of proteins play an essential role in regulating binding between proteins and their partners, offering a direct relationship between structures and functions of proteins. It is widely accepted that the function of a protein can be determined if its structure is similar to other proteins whose functions are known. However, it is also observed that proteins with similar global structures do not necessarily correspond to the same function, while proteins with very different folds can share similar functions. This indicates that function similarity originates from the local structural information of proteins instead of their global shapes. We assume that proteins with similar local environments prefer binding to similar types of molecular targets. To test this assumption, we designed a new structural indicator to define the similarity of local environment between residues in different proteins. This indicator was further used to calculate the probability that a given residue binds to a specific type of structural neighbor, including DNA, RNA, small molecules and proteins. After applying the method to a large-scale non-redundant database of proteins, we show that the positive signal of binding probability calculated from the local structural indicator is statistically meaningful. In summary, our studies suggest that the local environment of residues in a protein is a good indicator for recognizing specific binding partners of the protein. The new method could be a potential addition to a suite of existing template-based approaches for protein function prediction. Copyright © 2016 Elsevier B.V. All rights reserved.
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin
2014-04-15
Background Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387
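The pairwise (consensus) idea behind the clustering-based methods in this abstract can be sketched as follows: a model's global quality score is its average similarity to every other model in the pool. The sketch assumes a precomputed symmetric similarity matrix with values in [0, 1] (for instance from a structural similarity score). The function name and input format are hypothetical, and this is not the MULTICOM code.

```python
def consensus_scores(similarity):
    """Average pairwise similarity of each model to all other models.

    Assumption: similarity[i][j] is a precomputed structural similarity
    between models i and j; the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models in the pool")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
```

The score rewards models that resemble the majority of the pool. That matches the abstract's observation that pairwise methods work well when most models are good and struggle when only a few are.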
Crystal growth of enzymes in low gravity (L-5)
NASA Technical Reports Server (NTRS)
Morita, Yuhei
1993-01-01
Recent developments in protein engineering have expanded the possibilities of studies of enzymes and other proteins. Now such studies are not limited to the elucidation of the relationship between the structure and function of the protein. They also aim at the production of proteins with new and practical functions, based on results obtained during investigation of structure and function. For continuing research in this field, investigation of the tertiary structure of proteins is important. X-ray diffraction of single crystals of protein is usually used for this purpose. The main difficulty is the preparation of the crystals. The theme of the research is to prepare such crystals at very low gravity, with the main purpose being to obtain large single crystals of proteins suitable for x-ray diffraction studies.
The cryo-electron microscopy structure of huntingtin
NASA Astrophysics Data System (ADS)
Guo, Qiang; Bin Huang; Cheng, Jingdong; Seefelder, Manuel; Engler, Tatjana; Pfeifer, Günter; Oeckl, Patrick; Otto, Markus; Moser, Franziska; Maurer, Melanie; Pautsch, Alexander; Baumeister, Wolfgang; Fernández-Busnadiego, Rubén; Kochanek, Stefan
2018-03-01
Huntingtin (HTT) is a large (348 kDa) protein that is essential for embryonic development and is involved in diverse cellular activities such as vesicular transport, endocytosis, autophagy and the regulation of transcription. Although an integrative understanding of the biological functions of HTT is lacking, the large number of identified HTT interactors suggests that it serves as a protein-protein interaction hub. Furthermore, Huntington’s disease is caused by a mutation in the HTT gene, resulting in a pathogenic expansion of a polyglutamine repeat at the amino terminus of HTT. However, only limited structural information regarding HTT is currently available. Here we use cryo-electron microscopy to determine the structure of full-length human HTT in a complex with HTT-associated protein 40 (HAP40; encoded by three F8A genes in humans) to an overall resolution of 4 Å. HTT is largely α-helical and consists of three major domains. The amino- and carboxy-terminal domains contain multiple HEAT (huntingtin, elongation factor 3, protein phosphatase 2A and lipid kinase TOR) repeats arranged in a solenoid fashion. These domains are connected by a smaller bridge domain containing different types of tandem repeats. HAP40 is also largely α-helical and has a tetratricopeptide repeat-like organization. HAP40 binds in a cleft and contacts the three HTT domains by hydrophobic and electrostatic interactions, thereby stabilizing the conformation of HTT. These data rationalize previous biochemical results and pave the way for improved understanding of the diverse cellular functions of HTT.
Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander
2009-11-01
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
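The graph representation described in this abstract (residue-labeled vertices joined by proximity edges) can be sketched as below. A simple distance cutoff between one representative point per residue is assumed here as the proximity criterion. The abstract does not specify the exact rule, and the function name, input format and 8.0 default cutoff are illustrative only.

```python
import math


def contact_graph(residues, cutoff=8.0):
    """Build a labeled residue graph from (name, (x, y, z)) pairs.

    Assumption: two residues are connected when their representative
    points lie within `cutoff` distance units of each other.
    Returns the vertex labels and a set of index pairs (i, j) with i < j.
    """
    labels = [name for name, _ in residues]
    edges = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(residues[i][1], residues[j][1]) <= cutoff:
                edges.add((i, j))
    return labels, edges
```

Subgraph mining and isomorphism queries, the two steps the abstract describes, would then operate on the returned labels and edge set.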
Sun, Ning; Shibata, Brad; Hess, John F.
2016-01-01
Purpose The differentiated lens fiber cell assembles a filamentous cytoskeletal structure referred to as the beaded filament (BF). The BF requires CP49 (bfsp2) and filensin (bfsp1) for assembly, both of which are highly divergent members of the large intermediate filament (IF) family of proteins. Thus far, these two proteins have been reported only in the differentiated lens fiber cell. For this reason, both proteins have been considered robust markers of fiber cell differentiation. We report here that both proteins are also expressed in the mouse lens epithelium, but only after 5 weeks of age. Methods Localization of CP49 was achieved with immunocytochemical probing of wild-type, CP49 knockout, filensin knockout, and vimentin knockout mice, in sections and in the explanted lens epithelium, at the light microscope and electron microscope levels. The relationship between CP49 and other cytoskeletal elements was probed using fluorescent phalloidin, as well as with antibodies to vimentin, GFAP, and α-tubulin. The relationship between CP49 and the aggresome was probed with antibodies to γ-tubulin, ubiquitin, and HDAC6. Results CP49 and filensin were expressed in the mouse lens epithelium, but only after 5 weeks of age. At the light microscope level, these two proteins colocalize to a large tubular structure, approximately 7 × 1 μm, which was typically present at one to two copies per cell. This structure is found in the anterior and anterolateral lens epithelium, including the zone where mitosis occurs. The structure becomes smaller and largely undetectable closer to the equator where the cell exits the cell cycle and commits to fiber cell differentiation. This structure bears some resemblance to the aggresome and is reactive with antibodies to HDAC6, a marker for the aggresome. However, the structure does not colocalize with antibodies to γ-tubulin or ubiquitin, also markers for the aggresome. 
The structure also colocalizes with actin but appears to largely exclude vimentin and α-tubulin. In the CP49 and filensin knockouts, this structure is absent, confirming the identity of CP49 and filensin in this structure, and suggesting a requirement for the physiologic coassembly of CP49 and filensin. Conclusions CP49 and filensin have been considered robust markers for mouse lens fiber cell differentiation. The data reported here, however, document both proteins in the mouse lens epithelium, but only after 5 weeks of age, when lens epithelial growth and mitotic activity have slowed. Because of this, CP49 and filensin must be considered markers of differentiation for both fiber cells and the lens epithelium in the mouse. In addition, to our knowledge, no other protein has been shown to emerge so late in the development of the mouse lens epithelium, suggesting that lens epithelial differentiation may continue well into post-natal life. If this structure is related to the aggresome, it is a rare, or perhaps unique example of a large, stable aggresome in wild-type tissue. PMID:27559293
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
Krestel, Ralf; Ohler, Uwe; Vingron, Martin; Marsico, Annalisa
2017-01-01
Abstract RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image. PMID:28977546
Hierarchical and Helical Self-assembly of ADP-ribosyl Cyclase into Large-scale Protein Microtubes
Liu, Qun; Kriksunov, Irina A.; Wang, Zhongwu; Graeff, Richard; Lee, Hon Cheung; Hao, Quan
2013-01-01
Proteins are macromolecules with characteristic structures and biological functions. It is extremely challenging to obtain protein microtube structures through self-assembly as proteins are very complex and flexible. Here we present a strategy showing how a specific protein, ADP-ribosyl cyclase, helically self-assembles from monomers into hexagonal nanochains and further to highly ordered crystalline microtubes. The structures of protein nanochains and the consequently self-assembled superlattice were determined by X-ray crystallography at 4.5 Å resolution and imaged by Scanning Electron Microscopy. The protein initially forms dimers that have a fixed size of 5.6 nm, and then helically self-assembles into 35.6 nm long hexagonal nanochains. One such nanochain consists of six dimers (12 monomers) that stack in order by a pseudo P61 screw axis. Seven nanochains produce a series of large-scale assemblies, nanorods, forming the building blocks for microrods. A proposed aging process of microrods results in the formation of hollow microstructures. Synthesis and characterization of large-scale self-assembled protein microtubes may pave a new pathway, capable of not only understanding the self-assembly dynamics of biological materials, but also directing design and fabrication of multifunctional nanobuilding blocks with particular applications in biomedical engineering. PMID:18956900
Crystal structure of the YDR533c S. cerevisiae protein, a class II member of the Hsp31 family.
Graille, Marc; Quevillon-Cheruel, Sophie; Leulliot, Nicolas; Zhou, Cong-Zhao; Li de la Sierra Gallay, Ines; Jacquamet, Lilian; Ferrer, Jean-Luc; Liger, Dominique; Poupon, Anne; Janin, Joel; van Tilbeurgh, Herman
2004-05-01
The ORF YDR533c from Saccharomyces cerevisiae codes for a 25.5 kDa protein of unknown biochemical function. Transcriptome analysis of yeast has shown that this gene is activated in response to various stress conditions together with proteins belonging to the heat shock family. In order to clarify its biochemical function, we determined the crystal structure of YDR533c to 1.85 Å resolution by the single anomalous diffraction method. The protein possesses an alpha/beta hydrolase fold and a putative Cys-His-Glu catalytic triad common to a large enzyme family containing proteases, amidotransferases, lipases, and esterases. The protein has strong structural resemblance with the E. coli Hsp31 protein and the intracellular protease I from Pyrococcus horikoshii, which are considered class I and class III members of the Hsp31 family, respectively. Detailed structural analysis strongly suggests that the YDR533c protein crystal structure is the first one of a class II member of the Hsp31 family.
Pulawski, Wojciech; Jamroz, Michal; Kolinski, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2016-11-28
The CABS coarse-grained model is a well-established tool for modeling globular proteins (predicting their structure, dynamics, and interactions). Here we introduce an extension of the CABS representation and force field (CABS-membrane) to the modeling of the effect of the biological membrane environment on the structure of membrane proteins. We validate the CABS-membrane model in folding simulations of 10 short helical membrane proteins not using any knowledge about their structure. The simulations start from random protein conformations placed outside the membrane environment and allow for full flexibility of the modeled proteins during their spontaneous insertion into the membrane. In the resulting trajectories, we have found models close to the experimental membrane structures. We also attempted to select the correctly folded models using simple filtering followed by structural clustering combined with reconstruction to the all-atom representation and all-atom scoring. The CABS-membrane model is a promising approach for further development toward modeling of large protein-membrane systems.
Dynamic protein interaction networks and new structural paradigms in signaling
Csizmok, Veronika; Follis, Ariele Viacava; Kriwacki, Richard W.; Forman-Kay, Julie D.
2017-01-01
Understanding signaling and other complex biological processes requires elucidating the critical roles of intrinsically disordered proteins and regions (IDPs/IDRs), which represent ~30% of the proteome and enable unique regulatory mechanisms. In this review we describe the structural heterogeneity of disordered proteins that underpins these mechanisms and the latest progress in obtaining structural descriptions of ensembles of disordered proteins that are needed for linking structure and dynamics to function. We describe the diverse interactions of IDPs that can have unusual characteristics such as “ultrasensitivity” and “regulated folding and unfolding”. We also summarize the mounting data showing that large-scale assembly and protein phase separation occurs within a variety of signaling complexes and cellular structures. In addition, we discuss efforts to therapeutically target disordered proteins with small molecules. Overall, we interpret the remodeling of disordered state ensembles due to binding and post-translational modifications within an expanded framework for allostery that provides significant insights into how disordered proteins transmit biological information. PMID:26922996
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolan, Kyle T.; Duguid, Erica M.; He, Chuan
2011-11-17
SlyA is a master virulence regulator that controls the transcription of numerous genes in Salmonella enterica. We present here crystal structures of SlyA by itself and bound to a high-affinity DNA operator sequence in the slyA gene. SlyA interacts with DNA through direct recognition of a guanine base by Arg-65, as well as interactions between conserved Arg-86 and the minor groove and a large network of non-base-specific contacts with the sugar phosphate backbone. Our structures, together with an unpublished structure of SlyA bound to the small molecule effector salicylate (Protein Data Bank code 3DEU), reveal that, unlike many other MarR family proteins, SlyA dissociates from DNA without large conformational changes when bound to this effector. We propose that SlyA and other MarR global regulators rely more on indirect readout of DNA sequence to exert control over many genes, in contrast to proteins (such as OhrR) that recognize a single operator.
Extreme disorder in an ultrahigh-affinity protein complex
NASA Astrophysics Data System (ADS)
Borgia, Alessandro; Borgia, Madeleine B.; Bugge, Katrine; Kissling, Vera M.; Heidarsson, Pétur O.; Fernandes, Catarina B.; Sottini, Andrea; Soranno, Andrea; Buholzer, Karin J.; Nettels, Daniel; Kragelund, Birthe B.; Best, Robert B.; Schuler, Benjamin
2018-03-01
Molecular communication in biology is mediated by protein interactions. According to the current paradigm, the specificity and affinity required for these interactions are encoded in the precise complementarity of binding interfaces. Even proteins that are disordered under physiological conditions or that contain large unstructured regions commonly interact with well-structured binding sites on other biomolecules. Here we demonstrate the existence of an unexpected interaction mechanism: the two intrinsically disordered human proteins histone H1 and its nuclear chaperone prothymosin-α associate in a complex with picomolar affinity, but fully retain their structural disorder, long-range flexibility and highly dynamic character. On the basis of closely integrated experiments and molecular simulations, we show that the interaction can be explained by the large opposite net charge of the two proteins, without requiring defined binding sites or interactions between specific individual residues. Proteome-wide sequence analysis suggests that this interaction mechanism may be abundant in eukaryotes.
Structure Prediction and Analysis of DNA Transposon and LINE Retrotransposon Proteins
Abrusán, György; Zhang, Yang; Szilágyi, András
2013-01-01
Despite the considerable amount of research on transposable elements, no large-scale structural analyses of the TE proteome have been performed so far. We predicted the structures of hundreds of proteins from a representative set of DNA and LINE transposable elements and used the obtained structural data to provide the first general structural characterization of TE proteins and to estimate the frequency of TE domestication and horizontal transfer events. We show that 1) ORF1 and Gag proteins of retrotransposons contain high amounts of structural disorder; thus, despite their very low conservation, the presence of disordered regions and probably their chaperone function is conserved. 2) The distribution of SCOP classes in DNA transposons and LINEs indicates that the proteins of DNA transposons are more ancient, containing folds that already existed when the first cellular organisms appeared. 3) DNA transposon proteins have lower contact order than randomly selected reference proteins, indicating rapid folding, most likely to avoid protein aggregation. 4) Structure-based searches for TE homologs indicate that the overall frequency of TE domestication events is low, whereas we found a relatively high number of cases where horizontal transfer, frequently involving parasites, is the most likely explanation for the observed homology. PMID:23530042
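Contact order, used in point 3 of this abstract as a proxy for folding speed, is commonly defined in its relative form as the mean sequence separation of contacting residue pairs divided by chain length: CO = (1 / (L·N)) · Σ |i − j| over the N contacts. The sketch below computes that quantity from a precomputed contact list. How contacts are defined (the distance cutoff and atom selection) is left open here, and the function name is hypothetical.

```python
def relative_contact_order(contacts, length):
    """Relative contact order from a list of contacting residue index pairs.

    Assumption: `contacts` was derived elsewhere from a structure using
    some distance cutoff; `length` is the number of residues in the chain.
    """
    if not contacts or length <= 0:
        raise ValueError("need at least one contact and a positive length")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))
```

Lower values mean contacts are mostly local in sequence, which is the property the abstract links to rapid folding.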
GOSSIP: a method for fast and accurate global alignment of protein structures.
Kifer, I; Nussinov, R; Wolfson, H J
2011-04-01
The database of known protein structures (PDB) is increasing rapidly. This results in a growing need for methods that can cope with the vast amount of structural data. To analyze the accumulating data, it is important to have a fast tool for identifying similar structures and clustering them by structural resemblance. Several excellent tools have been developed for the comparison of protein structures. These usually address the task of local structure alignment, an important yet computationally intensive problem due to its complexity. It is difficult to use such tools for comparing a large number of structures to each other at a reasonable time. Here we present GOSSIP, a novel method for a global all-against-all alignment of any set of protein structures. The method detects similarities between structures down to a certain cutoff (a parameter of the program), hence allowing it to detect similar structures at a much higher speed than local structure alignment methods. GOSSIP compares many structures in times which are several orders of magnitude faster than well-known available structure alignment servers, and it is also faster than a database scanning method. We evaluate GOSSIP both on a dataset of short structural fragments and on two large sequence-diverse structural benchmarks. Our conclusions are that for a threshold of 0.6 and above, the speed of GOSSIP is obtained with no compromise of the accuracy of the alignments or of the number of detected global similarities. A server, as well as an executable for download, are available at http://bioinfo3d.cs.tau.ac.il/gossip/.
Visualization of SV2A conformations in situ by the use of Protein Tomography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Berkley A.; Matagne, Alain; Braennstroem, Annika
2008-10-31
The synaptic vesicle protein 2A (SV2A), the brain-binding site of the anti-epileptic drug levetiracetam (LEV), has been characterized by Protein Tomography™. We identified two major conformations of SV2A in mouse brain tissue: first, a compact, funnel-structure with a pore-like opening towards the cytoplasm; second, a more open, V-shaped structure with a cleft-like opening towards the intravesicular space. The large differences between these conformations suggest a high degree of flexibility and support a valve-like mechanism consistent with the postulated transporter role of SV2A. These two conformations are represented both in samples treated with LEV, and in saline-treated samples, which indicates that LEV binding does not cause a large-scale conformational change of SV2A, or lock a specific conformational state of the protein. This study provides the first direct structural data on SV2A, and supports a transporter function suggested by sequence homology to MFS class of transporter proteins.
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB. The design of fast algorithms capable of querying such databases is therefore becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310
Barradas-Bautista, Didier; Fernández-Recio, Juan
2017-01-01
Next-generation sequencing (NGS) technologies are providing genomic information for an increasing number of healthy individuals and patient populations. Given the large amount of genomic data being generated, understanding the effect of disease-related mutations at the molecular level can help close the gap between genotype and phenotype and thus improve prevention, diagnosis or treatment of a pathological condition. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. Computational methods, such as protein docking, are currently used to complement experimental efforts and could help to build the human structural interactome. Here we have extended the original pyDockNIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify edgetic effects in many nsSNPs that were previously unknown. This can help to annotate and interpret genomic data from large-scale population studies, and to achieve a better understanding of disease at the molecular level. PMID:28841721
Ayuso-Tejedor, Sara; Angarica, Vladimir Espinosa; Bueno, Marta; Campos, Luis A; Abián, Olga; Bernadó, Pau; Sancho, Javier; Jiménez, M Angeles
2010-07-23
Partly unfolded protein conformations close to the native state may play important roles in protein function and in protein misfolding. Structural analyses of such conformations, which are essential for their full physicochemical understanding, are complicated by their characteristic low populations at equilibrium. We stabilize here with a single mutation the equilibrium intermediate of apoflavodoxin thermal unfolding and determine its solution structure by NMR. It consists of a large native region identical with that observed in the X-ray structure of the wild-type protein plus an unfolded region. Small-angle X-ray scattering analysis indicates that the calculated ensemble of structures is consistent with the actual degree of expansion of the intermediate. The unfolded region encompasses discontinuous sequence segments that cluster in the 3D structure of the native protein forming the FMN cofactor binding loops and the binding site of a variety of partner proteins. Analysis of the apoflavodoxin inner interfaces reveals that those becoming destabilized in the intermediate are more polar than other inner interfaces of the protein. Natively folded proteins contain hydrophobic cores formed by the packing of hydrophobic surfaces, while natively unfolded proteins are rich in polar residues. The structure of the apoflavodoxin thermal intermediate suggests that the regions of natively folded proteins that are easily responsive to thermal activation may contain cores of intermediate hydrophobicity. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
Bioinformatics and variability in drug response: a protein structural perspective
Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.
2012-01-01
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919
Soulages, Jose L.; Kim, Kangmin; Arrese, Estela L.; Walters, Christina; Cushman, John C.
2003-01-01
Late embryogenesis abundant (LEA) proteins are members of a large group of hydrophilic, glycine-rich proteins found in plants, algae, fungi, and bacteria known collectively as hydrophilins that are preferentially expressed in response to dehydration or hyperosmotic stress. Group 2 LEA (dehydrins or responsive to abscisic acid) proteins are postulated to stabilize macromolecules against damage by freezing, dehydration, ionic, or osmotic stress. However, the structural and physicochemical properties of group 2 LEA proteins that account for such functions remain unknown. We have analyzed the structural properties of a recombinant form of a soybean (Glycine max) group 2 LEA (rGmDHN1). Differential scanning calorimetry of purified rGmDHN1 demonstrated that the protein does not display a cooperative unfolding transition upon heating. Ultraviolet absorption and circular dichroism spectroscopy revealed that the protein is in a largely hydrated and unstructured conformation in solution. However, ultraviolet absorption and circular dichroism measurements collected at different temperatures showed that the protein exists in equilibrium between two extended conformational states: unordered and left-handed extended helical or poly (l-proline)-type II structures. It is estimated that 27% of the residues of rGmDHN1 adopt a poly (l-proline)-type II-like helical conformation at 12°C. The content of extended helix gradually decreases to 15% as the temperature is increased to 80°C. Studies of the conformation of the protein in solution in the presence of liposomes, trifluoroethanol, and sodium dodecyl sulfate indicated that rGmDHN1 has a very low intrinsic ability to adopt α-helical structure and to interact with phospholipid bilayers through amphipathic α-helices. The ability of the protein to remain in a highly extended conformation at low temperatures could constitute the basis of the functional role of GmDHN1 in the prevention of freezing, desiccation, ionic, or osmotic stress-related damage to macromolecular structures. PMID:12644649
Wang, Jian; Xie, Dong; Lin, Hongfei; Yang, Zhihao; Zhang, Yijia
2012-06-21
Protein complexes are important in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPI data makes identification challenging. A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations, to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on the filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics. The method detects protein complexes from large-scale PPI networks by filtering on GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective for identifying attachment proteins of complexes.
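The filtering step described in this abstract can be illustrated with a minimal Python sketch. The function name `filter_interactions`, the example scores, and the cutoff of 0.4 are hypothetical; the abstract does not state the similarity formula or the threshold actually used.

```python
def filter_interactions(edges, similarity, threshold=0.4):
    """Keep only PPI edges whose GO semantic similarity meets the threshold.

    edges: iterable of (protein_a, protein_b) pairs.
    similarity: dict mapping frozenset({a, b}) to a score in [0, 1].
    threshold: hypothetical cutoff; the source does not give a value.
    """
    kept = []
    for a, b in edges:
        # A pair with no score is treated as unreliable (assumption).
        score = similarity.get(frozenset((a, b)), 0.0)
        if score >= threshold:
            kept.append((a, b))
    return kept


# Toy network with made-up similarity scores.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
similarity = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.2,
    frozenset(("P1", "P4")): 0.5,
}
filtered = filter_interactions(edges, similarity)
```

A complex-detection step such as the cluster-expanding algorithm mentioned in the abstract would then run on `filtered` rather than on the raw edge list.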
Expression and Purification of Rat Glucose Transporter 1 in Pichia pastoris.
Venskutonytė, Raminta; Elbing, Karin; Lindkvist-Petersson, Karin
2018-01-01
Large amounts of pure and homogenous protein are a prerequisite for several biochemical and biophysical analyses, and in particular if aiming at resolving the three-dimensional protein structure. Here we describe the production of the rat glucose transporter 1 (GLUT1), a membrane protein facilitating the transport of glucose in cells. The protein is recombinantly expressed in the yeast Pichia pastoris. It is easily maintained and large-scale protein production in shaker flasks, as commonly performed in academic research laboratories, results in relatively high yields of membrane protein. The purification protocol describes all steps needed to obtain a pure and homogenous GLUT1 protein solution, including cell growth, membrane isolation, and chromatographic purification methods.
Revealing the global map of protein folding space by large-scale simulations
NASA Astrophysics Data System (ADS)
Sinner, Claude; Lutz, Benjamin; Verma, Abhinav; Schug, Alexander
2015-12-01
The full characterization of protein folding is a remarkable long-standing challenge both for experiment and simulation. Working towards a complete understanding of this process, one needs to cover the full diversity of existing folds and identify the general principles driving the process. Here, we want to understand and quantify the diversity in folding routes for a large and representative set of protein topologies covering the full range from all alpha helical topologies towards beta barrels guided by the key question: Does the majority of the observed routes contribute to the folding process or only a particular route? We identified a set of two-state folders among non-homologous proteins with a sequence length of 40-120 residues. For each of these proteins, we ran native-structure based simulations both with homogeneous and heterogeneous contact potentials. For each protein, we simulated dozens of folding transitions in continuous uninterrupted simulations and constructed a large database of kinetic parameters. We investigate folding routes by tracking the formation of tertiary structure interfaces and discuss whether a single specific route exists for a topology or if all routes are equiprobable. These results permit us to characterize the complete folding space for small proteins in terms of folding barrier ΔG‡, number of routes, and the route specificity RT.
NASA Technical Reports Server (NTRS)
Righetti, Pier Giorgio; Casale, Elena; Carter, Daniel; Snyder, Robert S.; Wenisch, Elisabeth; Faupel, Michel
1990-01-01
Recombinant-DNA (deoxyribonucleic acid) (r-DNA) proteins, produced in large quantities for human consumption, are now available in sufficient amounts for crystal growth. Crystallographic analysis is the only method now available for defining the atomic arrangements within complex biological molecules and decoding, e.g., the structure of the active site. Growing protein crystals in microgravity has become an important aspect of biology in space, since crystals that are large enough and of sufficient quality to permit complete structure determinations are usually obtained. However even small amounts of impurities in a protein preparation are anathema for the growth of a regular crystal lattice. A multicompartment electrolyzer with isoelectric, immobiline membranes, able to purify large quantities of r-DNA proteins is described. The electrolyzer consists of a stack of flow cells, delimited by membranes of very precise isoelectric point (pI, consisting of polyacrylamide supported by glass fiber filters containing Immobiline buffers and titrants to uniquely define a pI value) and very high buffering power, able to titrate all proteins tangent or crossing such membranes. By properly selecting the pI values of two membranes delimiting a flow chamber, a single protein can be kept isoelectric in a single flow chamber and thus, be purified to homogeneity (by the most stringent criterion, charge homogeneity).
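The chamber-selection idea in this abstract (a protein stays in the flow chamber whose two bounding membranes bracket its pI) can be sketched in Python as follows. The function name `chamber_for_protein` and the example pI values are hypothetical, and the sketch ignores buffering power, flow and kinetics entirely.

```python
from bisect import bisect_left


def chamber_for_protein(protein_pi, membrane_pis):
    """Return the index of the chamber that would hold the protein, or None.

    Chamber i is assumed to lie between membranes i and i + 1 once the
    membrane pI values are sorted in ascending order. A protein whose pI
    falls outside all membranes, or exactly on a membrane, gets None.
    """
    membranes = sorted(membrane_pis)
    idx = bisect_left(membranes, protein_pi)
    if idx == 0 or idx == len(membranes) or membranes[idx] == protein_pi:
        return None
    return idx - 1


# Three membranes delimit two chambers: (5.0, 6.0) and (6.0, 7.0).
membranes = [5.0, 6.0, 7.0]
target_chamber = chamber_for_protein(6.5, membranes)
```

In this toy setup, narrowing the gap between two adjacent membrane pI values around the target protein's pI is what would exclude contaminants with nearby pI values.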
Accurate high-throughput structure mapping and prediction with transition metal ion FRET
Yu, Xiaozhen; Wu, Xiongwu; Bermejo, Guillermo A.; Brooks, Bernard R.; Taraska, Justin W.
2013-01-01
Mapping the landscape of a protein’s conformational space is essential to understanding its functions and regulation. The limitations of many structural methods have made this process challenging for most proteins. Here, we report that transition metal ion FRET (tmFRET) can be used in a rapid, highly parallel screen, to determine distances from multiple locations within a protein at extremely low concentrations. The distances generated through this screen for the protein Maltose Binding Protein (MBP) match distances from the crystal structure to within a few angstroms. Furthermore, energy transfer accurately detects structural changes during ligand binding. Finally, fluorescence-derived distances can be used to guide molecular simulations to find low energy states. Our results open the door to rapid, accurate mapping and prediction of protein structures at low concentrations, in large complex systems, and in living cells. PMID:23273426
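As a rough illustration of how an energy-transfer measurement maps to a distance, the Python sketch below uses the standard Förster relation E = 1 / (1 + (r/R0)^6). Whether the authors apply exactly this form to tmFRET data is an assumption, and the R0 value in the example is a placeholder, not a figure from the paper.

```python
def fret_efficiency(distance, r0):
    """Transfer efficiency for a donor-acceptor distance (same units as r0)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def fret_distance(efficiency, r0):
    """Invert the Förster relation to recover a distance from an efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Placeholder Förster distance in angstroms (hypothetical value).
R0 = 12.0
distance_at_half_transfer = fret_distance(0.5, R0)
```

Because of the sixth-power dependence, the measurement is most sensitive for distances close to R0, which is why the donor-acceptor pair has to be matched to the length scale of interest.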
Abendroth, Jan; McCormick, Michael S.; Edwards, Thomas E.; Staker, Bart; Loewen, Roderick; Gifford, Martin; Rifkin, Jeff; Mayer, Chad; Guo, Wenjin; Zhang, Yang; Myler, Peter; Kelley, Angela; Analau, Erwin; Hewitt, Stephen Nakazawa; Napuli, Alberto J.; Kuhn, Peter; Ruth, Ronald D.; Stewart, Lance J.
2010-01-01
Structural genomics discovery projects require ready access to both X-ray and NMR instrumentation which support the collection of experimental data needed to solve large numbers of novel protein structures. The most productive X-ray crystal structure determination laboratories make extensive frequent use of tunable synchrotron X-ray light to solve novel structures by anomalous diffraction methods. This requires that frozen cryo-protected crystals be shipped to large government-run synchrotron facilities for data collection. In an effort to eliminate the need to ship crystals for data collection, we have developed the first laboratory-scale synchrotron light source capable of performing many of the state-of-the-art synchrotron applications in X-ray science. This Compact Light Source is a first-in-class device that uses inverse Compton scattering to generate X-rays of sufficient flux, tunable wavelength and beam size to allow high-resolution X-ray diffraction data collection from protein crystals. We report on benchmarking tests of X-ray diffraction data collection with hen egg white lysozyme, and the successful high-resolution X-ray structure determination of the Glycine cleavage system protein H from Mycobacterium tuberculosis using diffraction data collected with the Compact Light Source X-ray beam. PMID:20364333
Automation of large scale transient protein expression in mammalian cells
Zhao, Yuguang; Bishop, Benjamin; Clay, Jordan E.; Lu, Weixian; Jones, Margaret; Daenke, Susan; Siebold, Christian; Stuart, David I.; Yvonne Jones, E.; Radu Aricescu, A.
2011-01-01
Traditional mammalian expression systems rely on the time-consuming generation of stable cell lines; this is difficult to accommodate within a modern structural biology pipeline. Transient transfections are a fast, cost-effective solution, but require skilled cell culture scientists, making man-power a limiting factor in a setting where numerous samples are processed in parallel. Here we report a strategy employing a customised CompacT SelecT cell culture robot allowing the large-scale expression of multiple protein constructs in a transient format. Successful protocols have been designed for automated transient transfection of human embryonic kidney (HEK) 293T and 293S GnTI− cells in various flask formats. Protein yields obtained by this method were similar to those produced manually, with the added benefit of reproducibility, regardless of user. Automation of cell maintenance and transient transfection allows the expression of high quality recombinant protein in a completely sterile environment with limited support from a cell culture scientist. The reduction in human input has the added benefit of enabling continuous cell maintenance and protein production, features of particular importance to structural biology laboratories, which typically use large quantities of pure recombinant proteins, and often require rapid characterisation of a series of modified constructs. This automated method for large scale transient transfection is now offered as a Europe-wide service via the P-cube initiative. PMID:21571074
NASA Astrophysics Data System (ADS)
Wu, Chun; Shea, Joan-Emma
Protein aggregation involves the self-assembly of proteins into large β-sheet-rich complexes. This process can be the result of aberrant protein folding and lead to "amyloidosis," a condition characterized by deposits of protein aggregates known as amyloids on various organs of the body [1]. Amyloid-related diseases include, among others, Alzheimer's disease, Parkinson's disease, Creutzfeldt-Jakob disease, and type II diabetes [2, 3, 4]. In other instances, however, protein aggregation is not a pathological process, but rather a functional one, with aggregates serving as structural scaffolds in a number of organisms [5].
Origins of the protein synthesis cycle
NASA Technical Reports Server (NTRS)
Fox, S. W.
1981-01-01
Largely derived from experiments in molecular evolution, a theory of protein synthesis cycles has been constructed. The sequence begins with ordered thermal proteins resulting from the self-sequencing of mixed amino acids. Ordered thermal proteins then aggregate to cell-like structures. When they contained proteinoids sufficiently rich in lysine, the structures were able to synthesize offspring peptides. Since lysine-rich proteinoid (LRP) also catalyzes the polymerization of nucleoside triphosphate to polynucleotides, the same microspheres containing LRP could have synthesized both original cellular proteins and cellular nucleic acids. The LRP within protocells would have provided proximity advantageous for the origin and evolution of the genetic code.
Structure-based design of combinatorial mutagenesis libraries
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-01-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called “Structure-based Optimization of Combinatorial Mutagenesis” (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. PMID:25611189
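The idea of scoring a library by averaged energies instead of enumerating every variant can be illustrated with a small Python sketch. It covers only additive per-position energies, for which the library average is exactly the sum of per-position means; the data layout, function names and energy values are hypothetical, and this is not the SOCoM implementation.

```python
from itertools import product
from math import prod


def library_size(choices):
    """Number of variants: product of allowed substitutions per position."""
    return prod(len(options) for options in choices.values())


def library_average_energy(choices):
    """Average energy over the library without enumerating variants.

    Valid for additive one-body energies only (assumption of this sketch).
    """
    return sum(sum(e.values()) / len(e) for e in choices.values())


def brute_force_average(choices):
    """Enumerate every variant explicitly; used only to check the shortcut."""
    positions = list(choices)
    totals = [
        sum(choices[pos][aa] for pos, aa in zip(positions, combo))
        for combo in product(*(choices[pos] for pos in positions))
    ]
    return sum(totals) / len(totals)


# Hypothetical library: position -> {amino acid: one-body energy}.
choices = {
    10: {"A": -1.0, "V": -2.0},
    25: {"L": 0.5, "I": 1.5, "M": -0.5},
}
size = library_size(choices)
average = library_average_energy(choices)
```

The shortcut costs time proportional to the number of position-substitution pairs, whereas explicit enumeration grows with the product of the choices per position.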
Mechanistic aspects of protein corona formation: insulin adsorption onto gold nanoparticle surfaces
NASA Astrophysics Data System (ADS)
Grass, Stefan; Treuel, Lennart
2014-02-01
In biological fluids, an adsorption layer of proteins, a "protein corona", forms around nanoparticles (NPs), largely determining their biological identity. In many interactions with NPs, proteins can undergo structural changes. Here, we study the adsorption of insulin onto gold NPs (mean hydrodynamic particle diameter 80 ± 18 nm), focusing on the structural consequences of the adsorption process for the protein. We use surface enhanced Raman scattering (SERS) spectroscopy to study changes in the protein's secondary structure as well as the impact on integrity and conformations of disulfide bonds immediately on the NP surface. A detailed comparison to SERS spectra of cysteine and cystine provides first mechanistic insights into the causes for these conformational changes. Potential biological and toxicological implications of these findings are also discussed.
Lee, Hui Sun; Im, Wonpil
2016-04-01
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G-LoSA. G-LoSA aligns protein local structures in a sequence order independent way and provides a GA-score, a chemical feature-based and size-independent structure similarity score. Our benchmark validation shows the robust performance of G-LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure-centric comparative biology studies. In particular, G-LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G-LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer-aided drug design. We hope that G-LoSA can be a useful computational method for exploring interesting biological problems through large-scale comparison of protein local structures and facilitating drug discovery research and development. G-LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/. © 2016 The Protein Society.
Structural perturbations on huntingtin N17 domain during its folding on 2D-nanomaterials
NASA Astrophysics Data System (ADS)
Zhang, Leili; Feng, Mei; Zhou, Ruhong; Luan, Binquan
2017-09-01
A globular protein’s folded structure in its physiological environment is largely determined by its amino acid sequence. Recently discovered transformer proteins, as well as intrinsically disordered proteins, may adopt the folding-upon-binding mechanism, where their secondary structures are highly dependent on their binding partners. Due to the various applications of nanomaterials in biological sensors and potential wearable devices, it is important to discover possible conformational changes of proteins on nanomaterials. Here, through molecular dynamics simulations, we show that the first 17 residues of the huntingtin protein (HTT-N17) exhibit appreciable differences during folding on 2D-nanomaterials, such as graphene and MoS2 nanosheets. Namely, the protein is disordered on the graphene surface but is helical on the MoS2 surface. Although the amphiphilic environment at the nanosheet-water interface promotes the folding of amphipathic proteins (such as HTT-N17), competition between protein-nanosheet and intra-protein interactions yields very different protein conformations. Therefore, as engineered binding partners, nanomaterials might significantly affect the structures of adsorbed proteins.
A new test set for validating predictions of protein-ligand interaction.
Nissink, J Willem M; Murray, Chris; Hartshorn, Mike; Verdonk, Marcel L; Cole, Jason C; Taylor, Robin
2002-12-01
We present a large test set of protein-ligand complexes for the purpose of validating algorithms that rely on the prediction of protein-ligand interactions. The set consists of 305 complexes with protonation states assigned by manual inspection. The following checks have been carried out to identify unsuitable entries in this set: (1) assessing the involvement of crystallographically related protein units in ligand binding; (2) identification of bad clashes between protein side chains and ligand; and (3) assessment of structural errors, and/or inconsistency of ligand placement with crystal structure electron density. In addition, the set has been pruned to assure diversity in terms of protein-ligand structures, and subsets are supplied for different protein-structure resolution ranges. A classification of the set by protein type is available. As an illustration, validation results are shown for GOLD and SuperStar. GOLD is a program that performs flexible protein-ligand docking, and SuperStar is used for the prediction of favorable interaction sites in proteins. The new CCDC/Astex test set is freely available to the scientific community (http://www.ccdc.cam.ac.uk). Copyright 2002 Wiley-Liss, Inc.
Structural Determination of Biomolecules in Microfluidic Systems
NASA Astrophysics Data System (ADS)
Butler, John C.; Menard, Etienne; Rogers, John A.; Wong, Gerard C. L.
2004-03-01
Supramolecular biological complexes are often too large to be crystallized for structural studies. Here, we explore the use of microfluidic arrays to order a model self-assembled cytoskeletal system. Filamentous actin (F-actin) is a negatively charged protein rod and is a key structural component in the eukaryotic cytoskeleton. In this context, F-actin can self-assemble with actin binding proteins (ABP) in a highly regulated manner to dynamically form structures for a wide range of biomechanical functions. In this work, we will systematically study the action of 3 types of actin binding proteins (a-actinin, fimbrin, cofilin) on the self-assembled structures of F-actin that have been aligned in microfluidic arrays.
Thermodynamic prediction of protein neutrality.
Bloom, Jesse D; Silberg, Jonathan J; Wilke, Claus O; Drummond, D Allan; Adami, Christoph; Arnold, Frances H
2005-01-18
We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.
Thermodynamic prediction of protein neutrality
Bloom, Jesse D.; Silberg, Jonathan J.; Wilke, Claus O.; Drummond, D. Allan; Adami, Christoph; Arnold, Frances H.
2005-01-01
We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 β-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications. PMID:15644440
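The exponential decline described in this abstract can be made concrete with a short Python sketch. The single-parameter form exp(-k·m), the parameter name `decay_rate`, and the through-origin log-linear fit are assumptions made for illustration; the abstract states only that the decline is exponential for large numbers of substitutions and that its severity depends on the structure.

```python
import math


def fraction_retaining_structure(num_substitutions, decay_rate):
    """Assumed form: fraction of variants still folded after m substitutions."""
    return math.exp(-decay_rate * num_substitutions)


def estimate_decay_rate(observations):
    """Least-squares fit of ln(fraction) = -k * m through the origin.

    observations: list of (m, fraction) pairs with m > 0 and fraction > 0.
    """
    numerator = sum(m * math.log(f) for m, f in observations)
    denominator = sum(m * m for m, _ in observations)
    return -numerator / denominator


# Synthetic data generated from a made-up decay rate of 0.3.
synthetic = [(m, fraction_retaining_structure(m, 0.3)) for m in (1, 2, 4, 8)]
fitted_rate = estimate_decay_rate(synthetic)
```

A fit forced through the origin cannot capture the extra robustness to the first few substitutions that the theory predicts for more stable proteins, so real data would likely need an additional offset term.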
Protein Structural Analysis via Mass Spectrometry-Based Proteomics
Artigues, Antonio; Nadeau, Owen W.; Rimmer, Mary Ashley; Villar, Maria T.; Du, Xiuxia; Fenton, Aron W.; Carlson, Gerald M.
2017-01-01
Modern mass spectrometry (MS) technologies have provided a versatile platform that can be combined with a large number of techniques to analyze protein structure and dynamics. These techniques include the three detailed in this chapter: 1) hydrogen/deuterium exchange (HDX), 2) limited proteolysis, and 3) chemical crosslinking (CX). HDX relies on the change in mass of a protein upon its dilution into deuterated buffer, which results in varied deuterium content within its backbone amides. Structural information on surface exposed, flexible or disordered linker regions of proteins can be achieved through limited proteolysis, using a variety of proteases and only small extents of digestion. CX refers to the covalent coupling of distinct chemical species and has been used to analyze the structure, function and interactions of proteins by identifying crosslinking sites that are formed by small multi-functional reagents, termed crosslinkers. Each of these MS applications is capable of revealing structural information for proteins when used either with or without other typical high resolution techniques, including NMR and X-ray crystallography. PMID:27975228
A 'periodic table' for protein structures.
Taylor, William R
2002-04-11
Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
Aggregation of alpha-synuclein by a coarse-grained Monte Carlo simulation
NASA Astrophysics Data System (ADS)
Farmer, Barry; Pandey, Ras
Alpha-synuclein (ASN), an intrinsic protein abundant in neurons, is believed to be a major cause of neurodegenerative diseases (e.g. Alzheimer's and Parkinson's diseases). Abnormal aggregation of ASN leads to Lewy bodies with specific morphologies. We investigate the self-organizing structures in a crowded environment of ASN proteins by a coarse-grained Monte Carlo simulation. ASN is a chain of 140 residues. The structural detail of each residue is neglected, but its specificity is captured via unique knowledge-based residue-residue interactions. Large-scale simulations are performed to analyze a number of local and global physical quantities (e.g. mobility profile, contact map, radius of gyration, structure factor) as a function of temperature and protein concentration. The trend in multi-scale structural variations of the protein in a crowded environment is compared with that of a free protein chain.
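Two of the quantities named above, the radius of gyration and the contact map, are simple to compute for a coarse-grained chain. The sketch below assumes one equal-mass bead per residue and a plain distance cutoff for contacts. Both choices are illustrative assumptions, not details taken from the abstract.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equal-mass beads from their centroid."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    sq_sum = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) for p in coords
    )
    return math.sqrt(sq_sum / n)


def contact_map(coords, cutoff):
    """Binary matrix: 1 where two distinct beads lie within `cutoff`."""
    n = len(coords)
    return [
        [
            1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Three beads on a straight line with unit spacing (toy chain).
    chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(radius_of_gyration(chain))
    print(contact_map(chain, cutoff=1.0))
```

For a 140-residue chain the same functions apply unchanged. A simulation would typically average them over many sampled configurations at each temperature and concentration.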
Gallat, F.-X.; Laganowsky, A.; Wood, K.; Gabel, F.; van Eijck, L.; Wuttke, J.; Moulin, M.; Härtlein, M.; Eisenberg, D.; Colletier, J.-P.; Zaccai, G.; Weik, M.
2012-01-01
Hydration water is vital for various macromolecular biological activities, such as specific ligand recognition, enzyme activity, response to receptor binding, and energy transduction. Without hydration water, proteins would not fold correctly and would lack the conformational flexibility that animates their three-dimensional structures. Motions in globular, soluble proteins are thought to be governed to a certain extent by hydration-water dynamics, yet it is not known whether this relationship holds true for other protein classes in general and whether, in turn, the structural nature of a protein also influences water motions. Here, we provide insight into the coupling between hydration-water dynamics and atomic motions in intrinsically disordered proteins (IDP), a largely unexplored class of proteins that, in contrast to folded proteins, lack a well-defined three-dimensional structure. We investigated the human IDP tau, which is involved in the pathogenic processes accompanying Alzheimer disease. Combining neutron scattering and protein perdeuteration, we found similar atomic mean-square displacements over a large temperature range for the tau protein and its hydration water, indicating intimate coupling between them. This is in contrast to the behavior of folded proteins of similar molecular weight, such as the globular, soluble maltose-binding protein and the membrane protein bacteriorhodopsin, which display moderate to weak coupling, respectively. The extracted mean square displacements also reveal a greater motional flexibility of IDP compared with globular, folded proteins and more restricted water motions on the IDP surface. The results provide evidence that protein and hydration-water motions mutually affect and shape each other, and that there is a gradient of coupling across different protein classes that may play a functional role in macromolecular activity in a cellular context. PMID:22828339
DNAproDB: an interactive tool for structural analysis of DNA–protein complexes
Sagendorf, Jared M.
2017-01-01
Many biological processes are mediated by complex interactions between DNA and proteins. Transcription factors, various polymerases, nucleases and histones recognize and bind DNA with different levels of binding specificity. To understand the physical mechanisms that allow proteins to recognize DNA and achieve their biological functions, it is important to analyze structures of DNA–protein complexes in detail. DNAproDB is a web-based interactive tool designed to help researchers study these complexes. DNAproDB provides an automated structure-processing pipeline that extracts structural features from DNA–protein complexes. The extracted features are organized in structured data files, which are easily parsed with any programming language or viewed in a browser. We processed a large number of DNA–protein complexes retrieved from the Protein Data Bank and created the DNAproDB database to store this data. Users can search the database by combining features of the DNA, protein or DNA–protein interactions at the interface. Additionally, users can upload their own structures for processing privately and securely. DNAproDB provides several interactive and customizable tools for creating visualizations of the DNA–protein interface at different levels of abstraction that can be exported as high quality figures. All functionality is documented and freely accessible at http://dnaprodb.usc.edu. PMID:28431131
Structural features that predict real-value fluctuations of globular proteins.
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-05-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics (MD) trajectories of nonhomologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using the support vector regression (SVR). It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in MD trajectories. Moreover, SVR that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson's correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed in predictions by the Gaussian network model (GNM). An advantage of the developed method over the GNMs is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. Copyright © 2012 Wiley Periodicals, Inc.
Structural features that predict real-value fluctuations of globular proteins
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-01-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real value of residue fluctuations using support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
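The two evaluation metrics quoted above, Pearson's correlation coefficient and root mean square error, can be computed as in the following minimal sketch. The input lists stand for per-residue predicted and reference fluctuations. The numbers in the usage example are made up for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root mean square error between predictions and reference values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0, 4.0]  # hypothetical predicted fluctuations (Å)
    observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical MD-derived fluctuations (Å)
    print(pearson_r(predicted, observed))
    print(rmse(predicted, observed))
```

The toy data show why both metrics are reported: the prediction is perfectly correlated with the reference yet off by a factor of two, so only the error term reveals that the real values are wrong.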
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ficko-Blean, E.; Gregg, K; Adams, J
2009-01-01
Common features of the extracellular carbohydrate-active virulence factors involved in host-pathogen interactions are their large sizes and modular complexities. This has made them recalcitrant to structural analysis, and therefore our understanding of the significance of modularity in these important proteins is lagging. Clostridium perfringens is a prevalent human pathogen that harbors a wide array of large, extracellular carbohydrate-active enzymes and is an excellent and relevant model system to approach this problem. Here we describe the complete structure of C. perfringens GH84C (NagJ), a 1001-amino acid multimodular homolog of the C. perfringens μ-toxin, which was determined using a combination of small angle x-ray scattering and x-ray crystallography. The resulting structure reveals unprecedented insight into how catalysis, carbohydrate-specific adherence, and the formation of molecular complexes with other enzymes via an ultra-tight protein-protein interaction are spatially coordinated in an enzyme involved in a host-pathogen interaction.
An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis
Brender, Jeffrey R.; Czajka, Jeff; Marsh, David; Gray, Felicia; Cierpicki, Tomasz; Zhang, Yang
2013-01-01
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. PMID:24204234
Rudling, Axel; Orro, Adolfo; Carlsson, Jens
2018-02-26
Water plays a major role in ligand binding and is attracting increasing attention in structure-based drug design. Water molecules can make large contributions to binding affinity by bridging protein-ligand interactions or by being displaced upon complex formation, but these phenomena are challenging to model at the molecular level. Herein, networks of ordered water molecules in protein binding sites were analyzed by clustering of molecular dynamics (MD) simulation trajectories. Locations of ordered waters (hydration sites) were first identified from simulations of high resolution crystal structures of 13 protein-ligand complexes. The MD-derived hydration sites reproduced 73% of the binding site water molecules observed in the crystal structures. If the simulations were repeated without the cocrystallized ligands, a majority (58%) of the crystal waters in the binding sites were still predicted. In addition, comparison of the hydration sites obtained from simulations carried out in the absence of ligands to those identified for the complexes revealed that the networks of ordered water molecules were preserved to a large extent, suggesting that the locations of waters in a protein-ligand interface are mainly dictated by the protein. Analysis of >1000 crystal structures showed that hydration sites bridged protein-ligand interactions in complexes with different ligands, and those with high MD-derived occupancies were more likely to correspond to experimentally observed ordered water molecules. The results demonstrate that ordered water molecules relevant for modeling of protein-ligand complexes can be identified from MD simulations. Our findings could contribute to development of improved methods for structure-based virtual screening and lead optimization.
The correlation of fragmentation and structure of a protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Qinyuan; Cheng, Xueheng; Van Orden, S.
1995-12-31
Characterization of proteins of similar structures is important to understanding the biological function of the proteins and the processes with which they are involved. Cytochrome c variants typically have similar sequences, and have similar conformations in solution with almost identical absorption spectra and redox potentials. The authors chose cytochrome c's from bovine, tuna, rabbit and horse as a model system in studying large biomolecules using MSⁿ of multiply charged ions generated from electrospray ionization (ESI).
A Generative Angular Model of Protein Structure Evolution
Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun
2017-01-01
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
Protein-protein structure prediction by scoring molecular dynamics trajectories of putative poses.
Sarti, Edoardo; Gladich, Ivan; Zamuner, Stefano; Correia, Bruno E; Laio, Alessandro
2016-09-01
The prediction of protein-protein interactions and their structural configuration remains a largely unsolved problem. Most of the algorithms aimed at finding the native conformation of a protein complex starting from the structure of its monomers are based on searching the structure corresponding to the global minimum of a suitable scoring function. However, protein complexes are often highly flexible, with mobile side chains and transient contacts due to thermal fluctuations. Flexibility can be neglected if one aims at finding quickly the approximate structure of the native complex, but may play a role in structure refinement, and in discriminating solutions characterized by similar scores. We here benchmark the capability of some state-of-the-art scoring functions (BACH-SixthSense, PIE/PISA and Rosetta) in discriminating finite-temperature ensembles of structures corresponding to the native state and to non-native configurations. We produce the ensembles by running thousands of molecular dynamics simulations in explicit solvent starting from poses generated by rigid docking and optimized in vacuum. We find that while Rosetta outperformed the other two scoring functions in scoring the structures in vacuum, BACH-SixthSense and PIE/PISA perform better in distinguishing near-native ensembles of structures generated by molecular dynamics in explicit solvent. Proteins 2016; 84:1312-1320. © 2016 Wiley Periodicals, Inc.
McDougall, Carmel; Woodcroft, Ben J.
2016-01-01
In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms, however the exact nature and extent of this similarity, and its functional and evolutionary relevance, is unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. 
The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider. PMID:27415783
Photoswitchable red fluorescent protein with a large Stokes shift
Piatkevich, Kiryl D.; English, Brian P.; Malashkevich, Vladimir N.; Xiao, Hui; Almo, Steven C.; Singer, Robert H.; Verkhusha, Vladislav V.
2014-01-01
A subclass of fluorescent proteins, the large Stokes shift fluorescent proteins, is characterized by an increased spread between the excitation and emission maxima. Here we report a photoswitchable variant of a red fluorescent protein with a large Stokes shift, PSLSSmKate, which initially exhibits excitation/emission at 445/622 nm, but irradiation with violet light photoswitches PSLSSmKate into a common red form with excitation/emission at 573/621 nm. We characterize spectral, photophysical and biochemical properties of PSLSSmKate in vitro and in mammalian cells, and determine its crystal structure in the large Stokes shift form. Mass-spectrometry, mutagenesis and spectroscopic analysis of PSLSSmKate allow us to propose molecular mechanisms for the large Stokes shift, pH dependence and light-induced chromophore transformation. We demonstrate applicability of PSLSSmKate to superresolution PALM microscopy and protein dynamics in live cells. Given its promising properties, we expect that the PSLSSmKate-like phenotype will be further used for photoactivatable imaging and tracking multiple populations of intracellular objects. PMID:25242289
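The Stokes shift is the gap between the excitation and emission maxima, so the two forms reported above can be compared with a line of arithmetic. The sketch below uses the wavelengths quoted in the abstract. The function names are hypothetical, and the conversion to wavenumbers (cm⁻¹ = 10⁷ / wavelength in nm) is added only as a common alternative unit.

```python
def stokes_shift_nm(excitation_nm, emission_nm):
    """Stokes shift expressed as a wavelength difference in nanometres."""
    return emission_nm - excitation_nm


def stokes_shift_wavenumber(excitation_nm, emission_nm):
    """Stokes shift expressed as an energy difference in cm^-1."""
    return 1e7 / excitation_nm - 1e7 / emission_nm


if __name__ == "__main__":
    # Excitation/emission maxima quoted in the abstract.
    forms = {"large Stokes shift form": (445, 622), "red form": (573, 621)}
    for name, (ex, em) in forms.items():
        shift_nm = stokes_shift_nm(ex, em)
        shift_cm = stokes_shift_wavenumber(ex, em)
        print(f"{name}: {shift_nm} nm, {shift_cm:.0f} cm^-1")
```

With these values the initial form has a shift of 177 nm and the photoswitched red form a shift of 48 nm.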
Structural Basis for Antifreeze Activity of Ice-binding Protein from Arctic Yeast*
Lee, Jun Hyuck; Park, Ae Kyung; Do, Hackwon; Park, Kyoung Sun; Moh, Sang Hyun; Chi, Young Min; Kim, Hak Jun
2012-01-01
Arctic yeast Leucosporidium sp. produces a glycosylated ice-binding protein (LeIBP) with a molecular mass of ∼25 kDa, which can lower the freezing point below the melting point once it binds to ice. LeIBP is a member of a large class of ice-binding proteins, the structures of which are unknown. Here, we report the crystal structures of non-glycosylated LeIBP and glycosylated LeIBP at 1.57- and 2.43-Å resolution, respectively. Structural analysis of the LeIBPs revealed a dimeric right-handed β-helix fold, which is composed of three parts: a large coiled structural domain, a long helix region (residues 96–115 form a long α-helix that packs along one face of the β-helix), and a C-terminal hydrophobic loop region (243PFVPAPEVV251). Unexpectedly, the C-terminal hydrophobic loop region has an extended conformation pointing away from the body of the coiled structural domain and forms intertwined dimer interactions. In addition, structural analysis of glycosylated LeIBP with sugar moieties attached to Asn185 provides a basis for interpreting previous biochemical analyses as well as the increased stability and secretion of glycosylated LeIBP. We also determined that the aligned Thr/Ser/Ala residues are critical for ice binding within the B face of LeIBP using site-directed mutagenesis. Although LeIBP has a common β-helical fold similar to that of canonical hyperactive antifreeze proteins, the ice-binding site is more complex and does not have a simple ice-binding motif. In conclusion, we could identify the ice-binding site of LeIBP and discuss differences in the ice-binding modes compared with other known antifreeze proteins and ice-binding proteins. PMID:22303017
Niu, Mengna; Ma, Hongyan; Hu, Fei; Wang, Shige; Liu, Lu; Chang, Haizhou; Huang, Mingxian
2017-06-08
Large-pore silica microspheres were synthesized by utilizing weak cation exchange polymer beads as templates, N-trimethoxysilylpropyl-N,N,N-trimethylammonium chloride (TMSPTMA) as a structure-directing agent, tetraethoxysilane (TEOS) as a silica precursor, and triethanolamine as a weak base catalyst. The hydrolysis and condensation of the silica precursors inside the templating polymer beads yielded polymer/silica composite microspheres. After the organic polymer templates were removed in the calcination step, large-pore silica microspheres were produced. The effects of different reaction conditions on the morphology, structure and dispersibility of the formed silica microspheres were investigated. It has been shown that when the volume ratio of TMSPTMA, TEOS and triethanolamine was 1:2:2, silica microspheres with a pore size range of 50-150 nm and a particle size around 2 μm were obtained. The as-prepared silica microspheres were then bonded with chlorodimethyloctadecylsilane (C18), packed into a 50 mm×4.6 mm column, and evaluated for the separations of some common standard proteins and soybean isolation proteins. The results showed that the large-pore silica spheres from this work have potential for protein separation in HPLC.
Introduction to bioinformatics.
Can, Tolga
2014-01-01
Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Nadalin, Francesca; Carbone, Alessandra
2018-02-01
Large-scale computational docking will be increasingly used in future years to discriminate protein-protein interactions at residue resolution. Complete cross-docking experiments make in silico reconstruction of protein-protein interaction networks a feasible goal. They call for efficient and accurate screening of the millions of structural conformations produced by the calculations. We propose CIPS (Combined Interface Propensity for decoy Scoring), a new pair potential combining interface composition with residue-residue contact preference. CIPS outperforms several other methods on screening docking solutions obtained either with all-atom or with coarse-grain rigid docking. Further testing on 28 CAPRI targets corroborates the predictive power of CIPS over existing methods. By combining CIPS with atomic potentials, discrimination of correct conformations in all-atom structures reaches optimal accuracy. The drastic reduction of candidate solutions produced by thousands of proteins docked against each other makes large-scale docking accessible to analysis. CIPS source code is freely available at http://www.lcqb.upmc.fr/CIPS. Contact: alessandra.carbone@lip6.fr. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Construction of ontology augmented networks for protein complex prediction.
Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian
2013-01-01
Protein complexes are of great importance in understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources make it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks with protein-protein interaction data and gene ontology, which effectively unified the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into unified distance measures. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, which was capable of taking into account the topological structure of the protein-protein interaction network, as well as the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine the structure closeness and gene ontology annotation similarity; (ii) our method is valuable in predicting protein complexes and has higher F1 and accuracy compared to other competing methods.
Perspective: next generation isotope-aided methods for protein NMR spectroscopy.
Kainosho, Masatsune; Miyanoiri, Yohei; Terauchi, Tsutomu; Takeda, Mitsuhiro
2018-06-22
In this perspective, we describe our efforts to innovate the current isotope-aided NMR methodology to investigate biologically important large proteins and protein complexes, for which only limited structural information could be obtained by conventional NMR approaches. At the present time, it is widely believed that only backbone amide and methyl signals are amenable for investigating such difficult targets. Therefore, our primary mission is to disseminate our novel knowledge within the biological NMR community; specifically, that any type of NMR signals other than methyl and amide groups can be obtained, even for quite large proteins, by optimizing the transverse relaxation properties by isotope labeling methods. The idea of "TROSY by isotope labeling" has been cultivated through our endeavors aiming to improve the original stereo-array isotope labeling (SAIL) method (Kainosho et al., Nature 440:52-57, 2006). The SAIL TROSY methods subsequently culminated in the successful observations of individual NMR signals for the side-chain aliphatic and aromatic ¹³CH groups in large proteins, as exemplified by the 82 kDa single domain protein, malate synthase G. Meanwhile, the expected role of NMR spectroscopy in the emerging integrative structural biology has been rapidly shifting, from structure determination to the acquisition of biologically relevant structural dynamics, which are poorly accessible by X-ray crystallography or cryo-electron microscopy. Therefore, the newly accessible NMR probes, in addition to the methyl and amide signals, will open up a new horizon for investigating difficult protein targets, such as membrane proteins and supramolecular complexes, by NMR spectroscopy. We briefly introduce our latest results, showing that the protons attached to ¹²C-atoms give profoundly narrow ¹H-NMR signals even for large proteins, by isolating them from the other protons using the selective deuteration. 
The direct ¹H observation methods exhibit the highest sensitivities, as compared to heteronuclear multidimensional spectroscopy, in which the ¹H-signals are acquired via the spin-coupled ¹³C- and/or ¹⁵N-nuclei. Although the selective deuteration method was launched a half century ago, as the first milestone in the following prosperous history of isotope-aided NMR methods, our results strongly imply that the low-dimensional ¹H-direct observation NMR methods should be revitalized in the coming era, featuring ultrahigh-field spectrometers beyond 1 GHz.
CCProf: exploring conformational change profile of proteins
Chang, Che-Wei; Chou, Chai-Wei; Chang, Darby Tien-Hao
2016-01-01
In many biological processes, proteins have important interactions with various molecules such as proteins, ions or ligands. Many proteins undergo conformational changes upon these interactions, and the regions with large conformational changes are critical to the interactions. This work presents the CCProf platform, which provides the conformational changes of entire proteins, termed the conformational change profile (CCP). CCProf aims to be a platform where users can study potential causes of novel conformational changes. It provides 10 biological features: conformational change, potential binding target site, secondary structure, conservation, disorder propensity, hydropathy propensity, sequence domain, structural domain, phosphorylation site and catalytic site. All of this information is integrated into a well-aligned view, so that researchers can visually identify relationships between different biological features. CCProf contains 986,187 protein structure pairs for 3123 proteins. In addition, CCProf provides a 3D view in which users can see the protein structures before and after conformational changes, as well as the binding targets that induce conformational changes. All information shown in CCProf (e.g. CCP, binding targets and protein structures), including intermediate data, is available for download to expedite further analyses. Database URL: http://zoro.ee.ncku.edu.tw/ccprof/ PMID:27016699
Fingerprint-Based Structure Retrieval Using Electron Density
Yin, Shuangye; Dokholyan, Nikolay V.
2010-01-01
We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy. PMID:21287628
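The screening step described above, reducing structure-to-density fitting to a comparison of fingerprints, can be sketched as follows. This is only an illustration under stated assumptions: the fingerprints are taken to be precomputed, rotation-invariant numeric vectors (for example, norms of 3D Zernike moments), the Euclidean distance is an assumed similarity measure, and the function name and library entries are hypothetical rather than taken from the authors' software.

```python
import math


def rank_by_fingerprint(query, library):
    """Rank library entries by Euclidean distance to the query fingerprint.

    query   -- sequence of floats (assumed rotation-invariant descriptor)
    library -- dict mapping an entry name to a fingerprint of the same length
    Returns a list of (name, distance) pairs, closest first.
    """
    scored = [(name, math.dist(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical three-entry library with made-up fingerprints.
library = {
    "entry_a": [1.0, 0.0, 0.0],
    "entry_b": [0.0, 1.0, 0.0],
    "entry_c": [0.9, 0.1, 0.0],
}
hits = rank_by_fingerprint([1.0, 0.0, 0.0], library)
```

Because each comparison is a single vector distance, a whole database can be screened without any explicit rotational or translational fitting; candidate hits would then be refined by a conventional fitting method.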
Fingerprint-based structure retrieval using electron density.
Yin, Shuangye; Dokholyan, Nikolay V
2011-03-01
We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy. Copyright © 2010 Wiley-Liss, Inc.
Structure and dynamics of Ebola virus matrix protein VP40 by a coarse-grained Monte Carlo simulation
NASA Astrophysics Data System (ADS)
Pandey, Ras; Farmer, Barry
Ebola virus matrix protein VP40 (consisting of 326 residues) plays a critical role in viral assembly and in functions such as regulation of viral transcription, packaging, and budding of mature virions into the plasma membrane of infected cells. How VP40 evolves structurally during the viral life cycle remains an open question. Using a coarse-grained Monte Carlo simulation, we investigate the structural evolution of VP40 as a function of temperature with the input of a knowledge-based residue-residue interaction. A number of local and global physical quantities (e.g. mobility profile, contact map, radius of gyration, structure factor) are analyzed with our large-scale simulations. Our preliminary data show that the structure of the protein evolves through different states with well-defined morphologies, which can be identified and quantified via a detailed analysis of the structure factor.
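Two of the global quantities named in this abstract, the radius of gyration and the contact map, have simple textbook definitions that can be sketched for a coarse-grained chain with one bead per residue. The sketch below is an illustration only, not the authors' code: it assumes equal bead masses, and the function names, the toy coordinates and the cutoff value are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid (equal masses)."""
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(math.dist(c, centre) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)


def contact_map(coords, cutoff):
    """Boolean matrix: True where two distinct beads are closer than cutoff."""
    n = len(coords)
    return [
        [i != j and math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


# Toy three-bead straight chain with unit spacing (arbitrary length units).
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg = radius_of_gyration(chain)
contacts = contact_map(chain, cutoff=1.5)
```

Tracking the radius of gyration as a function of temperature is a common way to distinguish compact from expanded states in simulations of this kind.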
Singh, Juswinder; Deng, Zhan; Narale, Gaurav; Chuaqui, Claudio
2006-01-01
The combination of advances in structure-based drug design efforts in the pharmaceutical industry, in parallel with structural genomics initiatives in the public domain, has led to an explosion in the number of structures of protein-small molecule complexes. This information is critically important both for understanding the structural basis of molecular recognition in biological systems and for designing better drugs. A significant challenge exists in managing this vast amount of data and fully leveraging it. Here, we review our work to develop a simple, fast way to store, organize, mine, and analyze large numbers of protein-small molecule complexes. We illustrate the utility of the approach for the management of inhibitor complexes from the protein kinase family. Finally, we describe our recent efforts in applying this method to the design of target-focused chemical libraries.
Dias, Raquel; Manny, Austin; Kolaczkowski, Oralia; Kolaczkowski, Bryan
2017-06-01
Reconstruction of ancestral protein sequences using phylogenetic methods is a powerful technique for directly examining the evolution of molecular function. Although ancestral sequence reconstruction (ASR) is itself very efficient, the downstream functional and structural studies necessary to characterize when and how changes in molecular function occurred are often costly and time-consuming, currently limiting ASR studies to examining a relatively small number of discrete functional shifts. As a result, we have very little direct information about how molecular function evolves across large protein families. Here we develop an approach combining ASR with structure and function prediction to efficiently examine the evolution of ligand affinity across a large family of double-stranded RNA binding proteins (DRBs) spanning animals and plants. We find that the characteristic domain architecture of DRBs, consisting of 2-3 tandem double-stranded RNA binding motifs (dsrms), arose independently in early animal and plant lineages. The affinity with which individual dsrms bind double-stranded RNA appears to have increased and decreased often across both animal and plant phylogenies, primarily through convergent structural mechanisms involving RNA-contact residues within the β1-β2 loop and a small region of α2. These studies provide some of the first direct information about how protein function evolves across large gene families and suggest that changes in molecular function may occur often and unassociated with major phylogenetic events, such as gene or domain duplications. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Competing Pathways and Multiple Folding Nuclei in a Large Multidomain Protein, Luciferase.
Scholl, Zackary N; Yang, Weitao; Marszalek, Piotr E
2017-05-09
Proteins obtain their final functional configuration through incremental folding with many intermediate steps in the folding pathway. If known, these intermediate steps could be valuable new targets for designing therapeutics and the sequence of events could elucidate the mechanism of refolding. However, determining these intermediate steps is hardly an easy feat, and has been elusive for most proteins, especially large, multidomain proteins. Here, we effectively map part of the folding pathway for the model large multidomain protein, Luciferase, by combining single-molecule force-spectroscopy experiments and coarse-grained simulation. Single-molecule refolding experiments reveal the initial nucleation of folding while simulations corroborate these stable core structures of Luciferase, and indicate the relative propensities for each to propagate to the final folded native state. Both experimental refolding and Monte Carlo simulations of Markov state models generated from simulation reveal that Luciferase most often folds along a pathway originating from the nucleation of the N-terminal domain, and that this pathway is the least likely to form nonnative structures. We then engineer truncated variants of Luciferase whose sequences corresponded to the putative structure from simulation and we use atomic force spectroscopy to determine their unfolding and stability. These experimental results corroborate the structures predicted from the folding simulation and strongly suggest that they are intermediates along the folding pathway. Taken together, our results suggest that initial Luciferase refolding occurs along a vectorial pathway and also suggest a mechanism that chaperones may exploit to prevent misfolding. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Lyumkis, Dmitry; Oliveira dos Passos, Dario; Tahara, Erich B.; Webb, Kristofor; Bennett, Eric J.; Vinterbo, Staal; Potter, Clinton S.; Carragher, Bridget; Joazeiro, Claudio A. P.
2014-01-01
All organisms have evolved mechanisms to manage the stalling of ribosomes upon translation of aberrant mRNA. In eukaryotes, the large ribosomal subunit-associated quality control complex (RQC), composed of the listerin/Ltn1 E3 ubiquitin ligase and cofactors, mediates the ubiquitylation and extraction of ribosome-stalled nascent polypeptide chains for proteasomal degradation. How RQC recognizes stalled ribosomes and performs its functions has not been understood. Using single-particle cryoelectron microscopy, we have determined the structure of the RQC complex bound to stalled 60S ribosomal subunits. The structure establishes how Ltn1 associates with the large ribosomal subunit and properly positions its E3-catalytic RING domain to mediate nascent chain ubiquitylation. The structure also reveals that a distinguishing feature of stalled 60S particles is an exposed, nascent chain-conjugated tRNA, and that the Tae2 subunit of RQC, which facilitates Ltn1 binding, is responsible for selective recognition of stalled 60S subunits. RQC components are engaged in interactions across a large span of the 60S subunit surface, connecting the tRNA in the peptidyl transferase center to the distally located nascent chain tunnel exit. This work provides insights into a mechanism linking translation and protein degradation that targets defective proteins immediately after synthesis, while ignoring nascent chains in normally translating ribosomes. PMID:25349383
Stepwise evolution of protein native structure with electrospray into the gas phase, 10⁻¹² to 10² s
Breuker, Kathrin; McLafferty, Fred W.
2008-01-01
Mass spectrometry (MS) has been revolutionized by electrospray ionization (ESI), which is sufficiently “gentle” to introduce nonvolatile biomolecules such as proteins and nucleic acids (RNA or DNA) into the gas phase without breaking covalent bonds. Although in some cases noncovalent bonding can be maintained sufficiently for ESI/MS characterization of the solution structure of large protein complexes and native enzyme/substrate binding, the new gaseous environment can ultimately cause dramatic structural alterations. The temporal (picoseconds to minutes) evolution of native protein structure during and after transfer into the gas phase, as proposed here based on a variety of studies, can involve side-chain collapse, unfolding, and refolding into new, non-native structures. Control of individual experimental factors allows optimization for specific research objectives. PMID:19033474
Rigidity of transmembrane proteins determines their cluster shape
NASA Astrophysics Data System (ADS)
Jafarinia, Hamidreza; Khoshnood, Atefeh; Jalali, Mir Abbas
2016-01-01
Protein aggregation in the cell membrane is vital for the majority of biological functions. Recent experimental results suggest that transmembrane domains of proteins such as α-helices and β-sheets have different structural rigidities. We use molecular dynamics simulation of a coarse-grained model of protein-embedded lipid membranes to investigate the mechanisms of protein clustering. For a variety of protein concentrations, our simulations under thermal equilibrium conditions reveal that the structural rigidity of transmembrane domains dramatically affects interactions and changes the shape of the cluster. We have observed stable large aggregates even in the absence of hydrophobic mismatch, which has been previously proposed as the mechanism of protein aggregation. According to our results, semiflexible proteins aggregate to form two-dimensional clusters, while rigid proteins, by contrast, form one-dimensional string-like structures. By assuming two probable scenarios for the formation of a two-dimensional triangular structure, we calculate the lipid density around protein clusters and find that the difference in lipid distribution around rigid and semiflexible proteins determines the one- or two-dimensional nature of aggregates. It is found that lipids move faster around semiflexible proteins than around rigid ones. The aggregation mechanism suggested in this paper can be tested by current state-of-the-art experimental facilities.
Novel Computational Approaches to Drug Discovery
NASA Astrophysics Data System (ADS)
Skolnick, Jeffrey; Brylinski, Michal
2010-01-01
New approaches to protein functional inference based on protein structure and evolution are described. First, FINDSITE, a threading based approach to protein function prediction, is summarized. Then, the results of large scale benchmarking of ligand binding site prediction, ligand screening, including applications to HIV protease, and GO molecular functional inference are presented. A key advantage of FINDSITE is its ability to use low resolution, predicted structures as well as high resolution experimental structures. Then, an extension of FINDSITE to ligand screening in GPCRs using predicted GPCR structures, FINDSITE/QDOCKX, is presented. This is a particularly difficult case as there are few experimentally solved GPCR structures. Thus, we first train on a subset of known binding ligands for a set of GPCRs; this is then followed by benchmarking against a large ligand library. For the virtual ligand screening of a number of Dopamine receptors, encouraging results are seen, with significant enrichment in identified ligands over those found in the training set. Thus, FINDSITE and its extensions represent a powerful approach to the successful prediction of a variety of molecular functions.
Sgourakis, Nikolaos G; Natarajan, Kannan; Ying, Jinfa; Vogeli, Beat; Boyd, Lisa F; Margulies, David H; Bax, Ad
2014-09-02
Immunoevasins are key proteins used by viruses to subvert host immune responses. Determining their high-resolution structures is key to understanding virus-host interactions toward the design of vaccines and other antiviral therapies. Mouse cytomegalovirus encodes a unique set of immunoevasins, the m02-m06 family, that modulates major histocompatibility complex class I (MHC-I) antigen presentation to CD8+ T cells and natural killer cells. Notwithstanding the large number of genetic and functional studies, the structural biology of immunoevasins remains incompletely understood, largely because of crystallization bottlenecks. Here we implement a technology using sparse nuclear magnetic resonance data and integrative Rosetta modeling to determine the structure of the m04/gp34 immunoevasin extracellular domain. The structure reveals a β fold that is representative of the m02-m06 family of viral proteins, several of which are known to bind MHC-I molecules and interfere with antigen presentation, suggesting its role as a diversified immune regulation module. Copyright © 2014 Elsevier Ltd. All rights reserved.
Survey of large protein complexes in D. vulgaris reveals great structural diversity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, B.-G.; Dong, M.; Liu, H.
2009-08-15
An unbiased survey has been made of the stable, most abundant multi-protein complexes in Desulfovibrio vulgaris Hildenborough (DvH) that are larger than Mr ≈ 400 k. The quaternary structures for 8 of the 16 complexes purified during this work were determined by single-particle reconstruction of negatively stained specimens, a success rate ≈10 times greater than that of previous 'proteomic' screens. In addition, the subunit compositions and stoichiometries of the remaining complexes were determined by biochemical methods. Our data show that the structures of only two of these large complexes, out of the 13 in this set that have recognizable functions, can be modeled with confidence based on the structures of known homologs. These results indicate that there is significantly greater variability in the way that homologous prokaryotic macromolecular complexes are assembled than has generally been appreciated. As a consequence, we suggest that relying solely on previously determined quaternary structures for homologous proteins may not be sufficient to properly understand their role in another cell of interest.
Buried and accessible surface area control intrinsic protein flexibility.
Marsh, Joseph A
2013-09-09
Proteins experience a wide variety of conformational dynamics that can be crucial for facilitating their diverse functions. How is the intrinsic flexibility required for these motions encoded in their three-dimensional structures? Here, the overall flexibility of a protein is demonstrated to be tightly coupled to the total amount of surface area buried within its fold. A simple proxy for this, the relative solvent-accessible surface area (Arel), therefore shows excellent agreement with independent measures of global protein flexibility derived from various experimental and computational methods. Application of Arel on a large scale demonstrates its utility by revealing unique sequence and structural properties associated with intrinsic flexibility. In particular, flexibility as measured by Arel shows little correspondence with intrinsic disorder, but instead tends to be associated with multiple domains and increased α-helical structure. Furthermore, the apparent flexibility of monomeric proteins is found to be useful for identifying quaternary-structure errors in published crystal structures. There is also a strong tendency for the crystal structures of more flexible proteins to be solved to lower resolutions. Finally, local solvent accessibility is shown to be a primary determinant of local residue flexibility. Overall, this work provides both fundamental mechanistic insight into the origin of protein flexibility and a simple, practical method for predicting flexibility from protein structures. © 2013 Elsevier Ltd. All rights reserved.
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.
Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin
2017-01-01
Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
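The requirement that a density estimate respect the circular nature of backbone angles can be illustrated with a much simpler stand-in technique: a product von Mises kernel density estimate on the (phi, psi) torus. This is explicitly not the trigonometric-spline estimator proposed in the abstract; it is a minimal sketch in which the function names, the concentration parameter `kappa` and the sample angles are all assumptions chosen for illustration.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, by power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def torus_kde(phi, psi, samples, kappa=10.0):
    """Density at (phi, psi), in radians, from a product von Mises kernel.

    samples -- list of (phi, psi) observations in radians
    kappa   -- kernel concentration (larger means narrower kernels)
    """
    norm = (2.0 * math.pi * bessel_i0(kappa)) ** 2
    total = sum(
        math.exp(kappa * (math.cos(phi - p) + math.cos(psi - s)))
        for p, s in samples
    )
    return total / (norm * len(samples))


# Made-up observations near the helical and extended regions.
samples = [
    (math.radians(-60.0), math.radians(-45.0)),
    (math.radians(-120.0), math.radians(130.0)),
]
density_at_helix = torus_kde(math.radians(-60.0), math.radians(-45.0), samples)
```

Because the kernel depends on the angles only through cosines of differences, the estimate is automatically periodic in both phi and psi, so an observation at 179 degrees contributes to the density at -179 degrees, which a kernel on the plane would miss.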
Chevrier, D. M.; Thanthirige, V. D.; Luo, Z.; Driscoll, S.; Cho, P.; MacDonald, M. A.; Yao, Q.; Guda, R.; Xie, J.; Johnson, E. R.; Chatt, A.; Zheng, N.
2018-01-01
Highly luminescent gold clusters simultaneously synthesized and stabilized by protein molecules represent a remarkable category of nanoscale materials with promising applications in bionanotechnology as sensors. Nevertheless, the atomic structure and luminescence mechanism of these gold clusters are still unknown after several years of development. Herein, we report findings on the structure, luminescence and biomolecular self-assembly of gold clusters stabilized by the large globular protein, bovine serum albumin. We highlight the surprising identification of interlocked gold-thiolate rings as the main gold structural unit. Importantly, such gold clusters are in a rigidified state within the protein scaffold, offering an explanation for their highly luminescent character. Free-standing cluster synthesis (without a protecting protein scaffold), combined with rigidifying and un-rigidifying experiments, was designed to further verify the luminescence mechanism and gold atomic structure within the protein. Finally, the biomolecular self-assembly process of the protein-stabilized gold clusters was elucidated by time-dependent X-ray absorption spectroscopy measurements and density functional theory calculations. PMID:29732064
Gindullis, Frank; Rose, Annkatrin; Patel, Shalaka; Meier, Iris
2002-01-01
Background: Animal and yeast proteins containing long coiled-coil domains are involved in attaching other proteins to the large, solid-state components of the cell. One subgroup of long coiled-coil proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases. In contrast to other eukaryotes, long coiled-coil proteins have been barely investigated in plants. Results: We have searched the completed Arabidopsis genome and have identified a family of structurally related long coiled-coil proteins. Filament-like plant proteins (FPP) were identified by sequence similarity to a tomato cDNA that encodes a coiled-coil protein which interacts with the nuclear envelope-associated protein, MAF1. The FPP family is defined by four novel unique sequence motifs and by two clusters of long coiled-coil domains separated by a non-coiled-coil linker. All family members are expressed in a variety of Arabidopsis tissues. A homolog sharing the structural features was identified in the monocot rice, indicating conservation among angiosperms. Conclusion: Except for myosins, this is the first characterization of a family of long coiled-coil proteins in plants. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein. This might suggest that FPP family members function in nuclear envelope biology. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of interest to investigate other long coiled-coil proteins, which might functionally replace lamins in the plant kingdom. PMID:11972898
Looping and clustering model for the organization of protein-DNA complexes on the bacterial genome
NASA Astrophysics Data System (ADS)
Walter, Jean-Charles; Walliser, Nils-Ole; David, Gabriel; Dorignac, Jérôme; Geniet, Frédéric; Palmeri, John; Parmeggiani, Andrea; Wingreen, Ned S.; Broedersz, Chase P.
2018-03-01
The bacterial genome is organized by a variety of associated proteins inside a structure called the nucleoid. These proteins can form complexes on DNA that play a central role in various biological processes, including chromosome segregation. A prominent example is the large ParB-DNA complex, which forms an essential component of the segregation machinery in many bacteria. ChIP-Seq experiments show that ParB proteins localize around centromere-like parS sites on the DNA to which ParB binds specifically, and spreads from there over large sections of the chromosome. Recent theoretical and experimental studies suggest that DNA-bound ParB proteins can interact with each other to condense into a coherent 3D complex on the DNA. However, the structural organization of this protein-DNA complex remains unclear, and a predictive quantitative theory for the distribution of ParB proteins on DNA is lacking. Here, we propose the looping and clustering model, which employs a statistical physics approach to describe protein-DNA complexes. The looping and clustering model accounts for the extrusion of DNA loops from a cluster of interacting DNA-bound proteins that is organized around a single high-affinity binding site. Conceptually, the structure of the protein-DNA complex is determined by a competition between attractive protein interactions and loop closure entropy of this protein-DNA cluster on the one hand, and the positional entropy for placing loops within the cluster on the other. Indeed, we show that the protein interaction strength determines the ‘tightness’ of the loopy protein-DNA complex. Thus, our model provides a theoretical framework for quantitatively computing the binding profiles of ParB-like proteins around a cognate (parS) binding site.
The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.
Takashima, S; Yamaoka, K
1999-08-30
Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of the HU protein homodimer was carried out by using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases, and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray data, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein, which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with sufficient resolution, and this enables the N-H as well as C=O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moment of the HU protein homodimer is approximately 500-530 D (1 D = 3.33 × 10⁻³⁰ C·m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of 19.5 kDa. Nevertheless, the result of the numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
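The point-charge contribution to a dipole moment of this kind follows the standard definition, the sum of each charge times its displacement from a reference point, and can be sketched as below. This is an illustration rather than the authors' program: the function names and the toy charge pair are hypothetical, and the geometric centre is an assumed reference point (for a molecule with a net charge the result depends on that choice). The conversion factors are the usual ones, 1 e·Å = 4.80320 D and 1 D = 3.33564 × 10⁻³⁰ C·m.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320        # one elementary charge times one angstrom, in debye
DEBYE_TO_COULOMB_METRE = 3.33564e-30  # one debye in coulomb metres


def dipole_moment_debye(charges, coords):
    """Dipole magnitude in debye for point charges (units of e) at coords (angstrom).

    The geometric centre of the coordinates is used as the reference point,
    which is an assumption of this sketch.
    """
    n = len(coords)
    centre = [sum(c[k] for c in coords) / n for k in range(3)]
    mu = [
        sum(q * (c[k] - centre[k]) for q, c in zip(charges, coords))
        for k in range(3)
    ]
    return math.sqrt(sum(m * m for m in mu)) * E_ANGSTROM_TO_DEBYE


def debye_to_coulomb_metre(mu_debye):
    """Convert a dipole moment from debye to SI units (C·m)."""
    return mu_debye * DEBYE_TO_COULOMB_METRE


# Toy example: unit charges of opposite sign separated by 1 angstrom.
pair = dipole_moment_debye([1.0, -1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
si_value = debye_to_coulomb_metre(500.0)
```

For scale, a single pair of opposite unit charges 1 Å apart gives about 4.8 D, and the lower bound of 500 D quoted above corresponds to about 1.67 × 10⁻²⁷ C·m. With an NMR ensemble such as the 25 models mentioned in the abstract, the same calculation would simply be repeated per model.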
Segmental Isotopic Labeling of Proteins for Nuclear Magnetic Resonance
Dongsheng, Liu; Xu, Rong; Cowburn, David
2009-01-01
Nuclear Magnetic Resonance (NMR) spectroscopy has emerged as one of the principal techniques of structural biology. It is not only a powerful method for elucidating 3D structures under near-physiological conditions, but also a convenient method for studying protein-ligand interactions and protein dynamics. A major drawback of macromolecular NMR is its size limitation, caused by slower tumbling rates and greater complexity of the spectra as size increases. Segmental isotopic labeling allows specific segment(s) within a protein to be selectively examined by NMR, thus significantly reducing the spectral complexity for large proteins and allowing a variety of solution-based NMR strategies to be applied. Two related approaches are generally used in the segmental isotopic labeling of proteins: expressed protein ligation and protein trans-splicing. Here we describe the methodology and recent applications of expressed protein ligation and protein trans-splicing for NMR structural studies of proteins and protein complexes. We also describe the protocol used in our lab for the segmental isotopic labeling of a 50 kDa protein, Csk (C-terminal Src Kinase), using expressed protein ligation methods. PMID:19632474
Shao, Qiang
2016-10-26
Large-scale conformational changes in proteins are important for their functions. Tracking the conformational change in real time at the level of a single protein molecule, however, remains a great challenge. In this article, we present a novel in silico approach with the combination of normal mode analysis and integrated-tempering-sampling molecular simulation (NMA-ITS) to give quantitative data for exploring the conformational transition pathway in multi-dimensional energy landscapes starting only from the knowledge of the two endpoint structures of the protein. The open-to-closed transitions of three proteins, including nCaM, AdK, and HIV-1 PR, were investigated using NMA-ITS simulations. The three proteins have varied structural flexibilities and domain communications in their respective conformational changes. The transition state structure in the conformational change of nCaM and the associated free-energy barrier are in agreement with those measured in a standard explicit-solvent REMD simulation. The experimentally measured transition intermediate structures of the intrinsically flexible AdK are captured by the conformational transition pathway measured here. The dominant transition pathways between the closed and fully open states of HIV-1 PR are very similar to those observed in recent REMD simulations. Finally, the evaluated relaxation times of the conformational transitions of three proteins are roughly at the same level as reported experimental data. Therefore, the NMA-ITS method is applicable for a variety of cases, providing both qualitative and quantitative insights into the conformational changes associated with the real functions of proteins.
Integrated structural biology to unravel molecular mechanisms of protein-RNA recognition.
Schlundt, Andreas; Tants, Jan-Niklas; Sattler, Michael
2017-04-15
Recent advances in RNA sequencing technologies have greatly expanded our knowledge of the RNA landscape in cells, often with spatiotemporal resolution. These techniques identified many new (often non-coding) RNA molecules. Large-scale studies have also discovered novel RNA binding proteins (RBPs), which exhibit single or multiple RNA binding domains (RBDs) for recognition of specific sequence or structured motifs in RNA. Starting from these large-scale approaches it is crucial to unravel the molecular principles of protein-RNA recognition in ribonucleoprotein complexes (RNPs) to understand the underlying mechanisms of gene regulation. Structural biology and biophysical studies at highest possible resolution are key to elucidate molecular mechanisms of RNA recognition by RBPs and how conformational dynamics, weak interactions and cooperative binding contribute to the formation of specific, context-dependent RNPs. While large compact RNPs can be well studied by X-ray crystallography and cryo-EM, analysis of dynamics and weak interaction necessitates the use of solution methods to capture these properties. Here, we illustrate methods to study the structure and conformational dynamics of protein-RNA complexes in solution starting from the identification of interaction partners in a given RNP. Biophysical and biochemical techniques support the characterization of a protein-RNA complex and identify regions relevant in structural analysis. Nuclear magnetic resonance (NMR) is a powerful tool to gain information on folding, stability and dynamics of RNAs and characterize RNPs in solution. It provides crucial information that is complementary to the static pictures derived from other techniques. 
NMR can be readily combined with other solution techniques, such as small angle X-ray and/or neutron scattering (SAXS/SANS), electron paramagnetic resonance (EPR), and Förster resonance energy transfer (FRET), which provide information about overall shapes, internal domain arrangements and dynamics. Principles of protein-RNA recognition and current approaches are reviewed and illustrated with recent studies. Copyright © 2017 Elsevier Inc. All rights reserved.
Large protein as a potential target for use in rabies diagnostics.
Santos Katz, I S; Dias, M H; Lima, I F; Chaves, L B; Ribeiro, O G; Scheffer, K C; Iwai, L K
Rabies is a zoonotic viral disease that remains a serious threat to public health worldwide. The rabies lyssavirus (RABV) genome encodes five structural proteins, multifunctional and significant for pathogenicity. The large protein (L) presents well-conserved genomic regions, which may be a good alternative to generate informative datasets for development of new methods for rabies diagnosis. This paper describes the development of a technique for the identification of L protein in several RABV strains from different hosts, demonstrating that MS-based proteomics is a potential method for antigen identification and a good alternative for rabies diagnosis.
Thompson, Jared J; Tabatabaei Ghomi, Hamed; Lill, Markus A
2014-12-01
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure. © 2014 Wiley Periodicals, Inc.
Structure prediction of polyglutamine disease proteins: comparison of methods
2014-01-01
Background: The expansion of polyglutamine (poly-Q) repeats in several unrelated proteins is associated with at least ten neurodegenerative diseases. The length of the poly-Q regions plays an important role in the progression of the diseases. The number of glutamines (Q) is inversely related to the onset age of these polyglutamine diseases, and the expansion of poly-Q repeats has been associated with protein misfolding. However, very little is known about the structural changes induced by the expansion of the repeats. Computational methods can provide an alternative to determine the structure of these poly-Q proteins, but it is important to evaluate their performance before large scale prediction work is done. Results: In this paper, two popular protein structure prediction programs, I-TASSER and Rosetta, have been used to predict the structure of the N-terminal fragment of a protein associated with Huntington's disease with 17 glutamines. Results show that both programs have the ability to find the native structures, but I-TASSER performs better for the overall task. Conclusions: Both I-TASSER and Rosetta can be used for structure prediction of proteins with poly-Q repeats. Knowledge of poly-Q structure may significantly contribute to development of therapeutic strategies for poly-Q diseases. PMID:25080018
Chae, Pil Seok; Rasmussen, Søren G F; Rana, Rohini R; Gotfryd, Kamil; Chandra, Richa; Goren, Michael A; Kruse, Andrew C; Nurva, Shailika; Loland, Claus J; Pierre, Yves; Drew, David; Popot, Jean-Luc; Picot, Daniel; Fox, Brian G; Guan, Lan; Gether, Ulrik; Byrne, Bernadette; Kobilka, Brian; Gellman, Samuel H
2010-12-01
The understanding of integral membrane protein (IMP) structure and function is hampered by the difficulty of handling these proteins. Aqueous solubilization, necessary for many types of biophysical analysis, generally requires a detergent to shield the large lipophilic surfaces of native IMPs. Many proteins remain difficult to study owing to a lack of suitable detergents. We introduce a class of amphiphiles, each built around a central quaternary carbon atom derived from neopentyl glycol, with hydrophilic groups derived from maltose. Representatives of this maltose-neopentyl glycol (MNG) amphiphile family show favorable behavior relative to conventional detergents, as manifested in multiple membrane protein systems, leading to enhanced structural stability and successful crystallization. MNG amphiphiles are promising tools for membrane protein science because of the ease with which they may be prepared and the facility with which their structures may be varied.
Leite, Wellington C; Galvão, Carolina W; Saab, Sérgio C; Iulek, Jorge; Etto, Rafael M; Steffens, Maria B R; Chitteni-Pattu, Sindhu; Stanage, Tyler; Keck, James L; Cox, Michael M
2016-01-01
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminal polymerization motif of archaeal and eukaryotic RecA family proteins is also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. Our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament.
Predicting protein structures with a multiplayer online game.
Cooper, Seth; Khatib, Firas; Treuille, Adrien; Barbero, Janos; Lee, Jeehyung; Beenen, Michael; Leaver-Fay, Andrew; Baker, David; Popović, Zoran; Players, Foldit
2010-08-05
People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
Goblirsch, Brandon; Kurker, Richard C.; Streit, Bennett R.; Wilmot, Carrie M.; DuBois, Jennifer L.
2011-01-01
Heme proteins are extremely diverse, widespread, and versatile biocatalysts, sensors, and molecular transporters. The chlorite dismutase family of hemoproteins received its name due to the ability of the first-isolated members to detoxify anthropogenic ClO2−, a function believed to have evolved only in the last few decades. Family members have since been found in fifteen bacterial and archaeal genera, suggesting ancient roots. A structure- and sequence-based examination of the family is presented, in which key sequence and structural motifs are identified and possible functions for family proteins are proposed. Newly identified structural homologies moreover demonstrate clear connections to two other large, ancient, and functionally mysterious protein families. We propose calling them collectively the CDE superfamily of heme proteins. PMID:21354424
Protein functional features are reflected in the patterns of mRNA translation speed.
López, Daniel; Pazos, Florencio
2015-07-09
The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide-polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
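To give a sense of how many "synonymous mRNAs" the degeneracy of the genetic code allows, the following minimal sketch counts the distinct coding sequences for a peptide under the standard genetic code. It is an illustration only, not part of the cited analysis; the function name `count_synonymous_mrnas` and the example peptides are hypothetical, and the stop codon is ignored.

```python
import math

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the three stop codons are not included).
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2,
    "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def count_synonymous_mrnas(peptide):
    """Return how many distinct coding sequences encode `peptide`.

    Each position contributes independently, so the total is the product
    of the per-residue codon counts.
    """
    return math.prod(CODON_DEGENERACY[aa] for aa in peptide)


if __name__ == "__main__":
    # Hypothetical four-residue peptide: 1 * 6 * 6 * 6 codon choices.
    print(count_synonymous_mrnas("MLSR"))
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why even short proteins have an enormous number of synonymous coding sequences.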
The Biophysics Microgravity Initiative
NASA Technical Reports Server (NTRS)
Gorti, S.
2016-01-01
Biophysical microgravity research on the International Space Station using biological materials has been ongoing for several decades. The well-documented substantive effects of long duration microgravity include the facilitation of the assembly of biological macromolecules into large structures, e.g., formation of large protein crystals under micro-gravity. NASA is invested not only in understanding the possible physical mechanisms of crystal growth, but also promoting two flight investigations to determine the influence of µ-gravity on protein crystal quality. In addition to crystal growth, flight investigations to determine the effects of shear on nucleation and subsequent formation of complex structures (e.g., crystals, fibrils, etc.) are also supported. It is now considered that long duration microgravity research aboard the ISS could also make possible the formation of large complex biological and biomimetic materials. Investigations of various materials undergoing complex structure formation in microgravity will not only strengthen NASA science programs, but may also provide invaluable insight towards the construction of large complex tissues, organs, or biomimetic materials on Earth.
High-throughput crystallization screening.
Skarina, Tatiana; Xu, Xiaohui; Evdokimova, Elena; Savchenko, Alexei
2014-01-01
Protein structure determination by X-ray crystallography is dependent on obtaining a single protein crystal suitable for diffraction data collection. Due to this requirement, protein crystallization represents a key step in protein structure determination. The conditions for protein crystallization have to be determined empirically for each protein, making this step also a bottleneck in the structure determination process. Typical protein crystallization practice involves parallel setup and monitoring of a considerable number of individual protein crystallization experiments (also called crystallization trials). In these trials the aliquots of purified protein are mixed with a range of solutions composed of a precipitating agent, buffer, and sometimes an additive that have been previously successful in prompting protein crystallization. The individual chemical conditions in which a particular protein shows signs of crystallization are used as a starting point for further crystallization experiments. The goal is optimizing the formation of individual protein crystals of sufficient size and quality to make them suitable for diffraction data collection. Thus the composition of the primary crystallization screen is critical for successful crystallization. Systematic analysis of crystallization experiments carried out on several hundred proteins as part of large-scale structural genomics efforts allowed the optimization of the protein crystallization protocol and identification of a minimal set of 96 crystallization solutions (the "TRAP" screen) that, in our experience, led to crystallization of the maximum number of proteins.
Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
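As an illustration of how indel sites can be read off a pairwise sequence alignment, the sketch below groups consecutive gap columns into sites and reports their position, length and inserted fragment. It is a simplified assumption of what such an extraction step might look like, not the Indel PDB pipeline; the function name `indel_sites`, the labels, and the two aligned sequences are hypothetical.

```python
from itertools import groupby


def indel_sites(aligned_a, aligned_b, gap="-"):
    """List indel sites in a pairwise alignment given as two gapped strings.

    Returns tuples (kind, start_column, length, fragment), where `kind`
    names the sequence that carries the extra residues.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    def column_state(pair):
        a, b = pair
        if a == gap and b != gap:
            return "insertion_in_b"
        if b == gap and a != gap:
            return "insertion_in_a"
        return None  # aligned residue pair (or gap-gap column)

    sites = []
    column = 0
    # groupby merges runs of consecutive columns that share the same state.
    for kind, run in groupby(zip(aligned_a, aligned_b), key=column_state):
        run = list(run)
        if kind is not None:
            source = 0 if kind == "insertion_in_a" else 1
            fragment = "".join(pair[source] for pair in run)
            sites.append((kind, column, len(run), fragment))
        column += len(run)
    return sites


if __name__ == "__main__":
    # Hypothetical alignment of two highly similar sequences.
    for site in indel_sites("MKT-AYIAKQR", "MKTLAYI--QR"):
        print(site)
```

Columns are 0-based alignment positions; mapping them back to residue numbers or to secondary structure would need the original structure files and is left out here.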
Efficient protein structure search using indexing methods
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points at a distance less than θ from the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. PMID:23691543
Efficient protein structure search using indexing methods.
Kim, Sungchul; Sael, Lee; Yu, Hwanjo
2013-01-01
Understanding functions of proteins is one of the most important challenges in many studies of biological processes. The function of a protein can be predicted by analyzing the functions of structurally similar proteins, thus finding structurally similar proteins accurately and efficiently from a large set of proteins is crucial. A protein structure can be represented as a vector by 3D-Zernike Descriptor (3DZD) which compactly represents the surface shape of the protein tertiary structure. This simplified representation accelerates the searching process. However, computing the similarity of two protein structures is still computationally expensive, thus it is hard to efficiently process many simultaneous requests of structurally similar protein search. This paper proposes indexing techniques which substantially reduce the search time to find structurally similar proteins. In particular, we first exploit two indexing techniques, i.e., iDistance and iKernel, on the 3DZDs. After that, we extend the techniques to further improve the search speed for protein structures. The extended indexing techniques build and utilize a reduced index constructed from the first few attributes of 3DZDs of protein structures. To retrieve top-k similar structures, top-10 × k similar structures are first found using the reduced index, and top-k structures are selected among them. We also modify the indexing techniques to support θ-based nearest neighbor search, which returns data points at a distance less than θ from the query point. The results show that both iDistance and iKernel significantly enhance the searching speed. In top-k nearest neighbor search, the searching time is reduced by 69.6%, 77%, 77.4% and 87.9% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively. In θ-based nearest neighbor search, the searching time is reduced by 80%, 81%, 95.6% and 95.6% using iDistance, iKernel, the extended iDistance, and the extended iKernel, respectively.
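The two-stage retrieval described in this abstract (find 10 × k candidates from the first few descriptor attributes, then select the top k among them) can be sketched as follows. This is a minimal illustration under stated assumptions: a plain linear scan stands in for the iDistance and iKernel index structures, Euclidean distance is assumed as the dissimilarity measure, and the names `top_k_filter_refine`, `n_reduced` and `factor` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_filter_refine(db, query, k, n_reduced, factor=10):
    """Return indices of approximately the k nearest descriptors in `db`.

    Stage 1 (filter): rank all entries using only the first `n_reduced`
    attributes and keep the best `factor * k` candidates.
    Stage 2 (refine): re-rank those candidates with the full descriptor.
    """
    reduced_query = query[:n_reduced]
    candidates = heapq.nsmallest(
        factor * k,
        range(len(db)),
        key=lambda i: euclidean(db[i][:n_reduced], reduced_query),
    )
    return heapq.nsmallest(
        k, candidates, key=lambda i: euclidean(db[i], query)
    )


if __name__ == "__main__":
    # Toy three-attribute "descriptors"; real 3DZDs are much longer vectors.
    db = [[0, 0, 0], [1, 0, 0], [0, 1, 5], [2, 2, 2], [0.1, 0, 0]]
    print(top_k_filter_refine(db, [0, 0, 0], k=2, n_reduced=1))
```

The result is exact only when the true neighbours survive the filter stage. With a small `factor`, an entry that looks close in the first few attributes but is far in the full descriptor can displace a true neighbour, which is the trade-off between speed and accuracy that the candidate multiplier controls.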
The role of protein structural analysis in the next generation sequencing era.
Yue, Wyatt W; Froese, D Sean; Brennan, Paul E
2014-01-01
Proteins are macromolecules that serve a cell's myriad processes and functions in all living organisms via dynamic interactions with other proteins, small molecules and cellular components. Genetic variations in the protein-encoding regions of the human genome account for >85% of all known Mendelian diseases, and play an influential role in shaping complex polygenic diseases. Proteins also serve as the predominant target class for the design of small molecule drugs to modulate their activity. Knowledge of the shape and form of proteins, by means of their three-dimensional structures, is therefore instrumental to understanding their roles in disease and their potentials for drug development. In this chapter we outline, with the wide readership of non-structural biologists in mind, the various experimental and computational methods available for protein structure determination. We summarize how the wealth of structure information, contributed to a large extent by the technological advances in structure determination to date, serves as a useful tool to decipher the molecular basis of genetic variations for disease characterization and diagnosis, particularly in the emerging era of genomic medicine, and becomes an integral component in the modern day approach towards rational drug development.
Yu, Clinton; Huszagh, Alexander; Viner, Rosa; Novitsky, Eric J; Rychnovsky, Scott D; Huang, Lan
2016-10-18
Cross-linking mass spectrometry (XL-MS) represents a recently popularized hybrid methodology for defining protein-protein interactions (PPIs) and analyzing structures of large protein assemblies. In particular, XL-MS strategies have been demonstrated to be effective in elucidating molecular details of PPIs at the peptide resolution, providing a complementary set of structural data that can be utilized to refine existing complex structures or direct de novo modeling of unknown protein structures. To study structural and interaction dynamics of protein complexes, quantitative cross-linking mass spectrometry (QXL-MS) strategies based on isotope-labeled cross-linkers have been developed. Although successful, these approaches are mostly limited to pairwise comparisons. In order to establish a robust workflow enabling comparative analysis of multiple cross-linked samples simultaneously, we have developed a multiplexed QXL-MS strategy, namely, QMIX (Quantitation of Multiplexed, Isobaric-labeled cross (X)-linked peptides) by integrating MS-cleavable cross-linkers with isobaric labeling reagents. This study has established a new analytical platform for quantitative analysis of cross-linked peptides, which can be directly applied for multiplexed comparisons of the conformational dynamics of protein complexes and PPIs at the proteome scale in future studies.
Discrete Molecular Dynamics Approach to the Study of Disordered and Aggregating Proteins.
Emperador, Agustí; Orozco, Modesto
2017-03-14
We present a refinement of the Coarse Grained PACSAB force field for Discrete Molecular Dynamics (DMD) simulations of proteins in aqueous conditions. Like the original version, the refined method provides a good representation of the structure and dynamics of folded proteins, but it provides much better representations of a variety of unfolded proteins, including some very large ones that are impossible to analyze by atomistic simulation methods. The PACSAB/DMD method also accurately reproduces aggregation properties, providing good pictures of the structural ensembles of proteins showing a folded core and an intrinsically disordered region. The combination of accuracy and speed makes the method presented here a good alternative for the exploration of unstructured protein systems.
Structure and assembly of scalable porous protein cages
NASA Astrophysics Data System (ADS)
Sasaki, Eita; Böhringer, Daniel; van de Waterbeemd, Michiel; Leibundgut, Marc; Zschoche, Reinhard; Heck, Albert J. R.; Ban, Nenad; Hilvert, Donald
2017-03-01
Proteins that self-assemble into regular shell-like polyhedra are useful, both in nature and in the laboratory, as molecular containers. Here we describe cryo-electron microscopy (EM) structures of two versatile encapsulation systems that exploit engineered electrostatic interactions for cargo loading. We show that increasing the number of negative charges on the lumenal surface of lumazine synthase, a protein that naturally assembles into a ~1-MDa dodecahedron composed of 12 pentamers, induces stepwise expansion of the native protein shell, giving rise to thermostable ~3-MDa and ~6-MDa assemblies containing 180 and 360 subunits, respectively. Remarkably, these expanded particles assume unprecedented tetrahedrally and icosahedrally symmetric structures constructed entirely from pentameric units. Large keyhole-shaped pores in the shell, not present in the wild-type capsid, enable diffusion-limited encapsulation of complementarily charged guests. The structures of these supercharged assemblies demonstrate how programmed electrostatic effects can be effectively harnessed to tailor the architecture and properties of protein cages.
GDP Release Preferentially Occurs on the Phosphate Side in Heterotrimeric G-proteins
Louet, Maxime; Martinez, Jean; Floquet, Nicolas
2012-01-01
After extra-cellular stimulation of G-Protein Coupled Receptors (GPCRs), GDP/GTP exchange appears as the key, rate limiting step of the intracellular activation cycle of heterotrimeric G-proteins. Despite the availability of a large number of X-ray structures, the mechanism of GDP release from heterotrimeric G-proteins still remains unknown at the molecular level. Starting from the available X-ray structure, extensive unconstrained/constrained molecular dynamics simulations were performed on the complete membrane-anchored Gi heterotrimer complexed to GDP, for a total simulation time exceeding 500 ns. By combining Targeted Molecular Dynamics (TMD) and free energy profile reconstruction by umbrella sampling, our data suggest that the release of GDP was much more favored on its phosphate side. Interestingly, upon the forced extraction of GDP on this side, the whole protein underwent large, collective motions in perfect agreement with those we described previously, including a domain to domain motion between the two ras-like and helical sub-domains of Gα. PMID:22829757
Zerze, Gül H; Miller, Cayla M; Granata, Daniele; Mittal, Jeetain
2015-06-09
Intrinsically disordered proteins (IDPs), which are expected to be largely unstructured under physiological conditions, make up a large fraction of eukaryotic proteins. Molecular dynamics simulations have been utilized to probe structural characteristics of these proteins, which are not always easily accessible to experiments. However, exploration of the conformational space by brute force molecular dynamics simulations is often limited by short time scales. Present literature provides a number of enhanced sampling methods to explore protein conformational space in molecular simulations more efficiently. In this work, we present a comparison of two enhanced sampling methods: temperature replica exchange molecular dynamics and bias exchange metadynamics. By investigating both the free energy landscape as a function of pertinent order parameters and the per-residue secondary structures of an IDP, namely, human islet amyloid polypeptide, we found that the two methods yield similar results as expected. We also highlight the practical difference between the two methods by describing the path that we followed to obtain both sets of data.
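For readers unfamiliar with temperature replica exchange, the sketch below shows the standard Metropolis criterion used to decide whether two neighbouring replicas swap configurations. It is a generic textbook form, not code from the cited study; the function name `swap_probability`, the energy units (kcal/mol) and the example values are assumptions.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption
# and must match the units of the potential energies passed in.
KB = 0.0019872041


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), beta = 1 / (kB * T).
    """
    beta_i = 1.0 / (KB * temp_i)
    beta_j = 1.0 / (KB * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


if __name__ == "__main__":
    # Hypothetical neighbouring replicas at 300 K and 310 K.
    print(swap_probability(-100.0, 300.0, -105.0, 310.0))  # cold replica higher in energy
    print(swap_probability(-105.0, 300.0, -100.0, 310.0))  # cold replica lower in energy
```

A swap that moves the lower-energy configuration to the colder temperature is always accepted; the reverse is accepted with a probability that shrinks as the energy gap or the temperature spacing grows, which is why replica temperatures are usually spaced closely.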
Evolution of Protein Domain Repeats in Metazoa
Schüler, Andreas; Bornberg-Bauer, Erich
2016-01-01
Repeats are ubiquitous elements of proteins and they play important roles in cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large scale studies have so far had difficulty in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races. Key Words: protein evolution, domain rearrangements, protein repeats, concerted evolution. PMID:27671125
Membrane-spanning α-helical barrels as tractable protein-design targets.
Niitsu, Ai; Heal, Jack W; Fauland, Kerstin; Thomson, Andrew R; Woolfson, Derek N
2017-08-05
The rational (de novo) design of membrane-spanning proteins lags behind that for water-soluble globular proteins. This is due to gaps in our knowledge of membrane-protein structure, and experimental difficulties in studying such proteins compared to water-soluble counterparts. One limiting factor is the small number of experimentally determined three-dimensional structures for transmembrane proteins. By contrast, many tens of thousands of globular protein structures provide a rich source of 'scaffolds' for protein design, and the means to garner sequence-to-structure relationships to guide the design process. The α-helical coiled coil is a protein-structure element found in both globular and membrane proteins, where it cements a variety of helix-helix interactions and helical bundles. Our deep understanding of coiled coils has enabled a large number of successful de novo designs. For one class, the α-helical barrels (that is, symmetric bundles of five or more helices with central accessible channels), there are both water-soluble and membrane-spanning examples. Recent computational designs of water-soluble α-helical barrels with five to seven helices have advanced the design field considerably. Here we identify and classify analogous and more complicated membrane-spanning α-helical barrels from the Protein Data Bank. These provide tantalizing but tractable targets for protein engineering and de novo protein design. This article is part of the themed issue 'Membrane pores: from structure and assembly, to medicine and technology'. © 2017 The Author(s).
Protein structure determination by exhaustive search of Protein Data Bank derived databases.
Stokes-Rees, Ian; Sliz, Piotr
2010-12-14
Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.
Efficient Relaxation of Protein-Protein Interfaces by Discrete Molecular Dynamics Simulations.
Emperador, Agusti; Solernou, Albert; Sfriso, Pedro; Pons, Carles; Gelpi, Josep Lluis; Fernandez-Recio, Juan; Orozco, Modesto
2013-02-12
Protein-protein interactions are responsible for the transfer of information inside the cell and represent one of the most interesting research fields in structural biology. Unfortunately, after decades of intense research, experimental approaches still have difficulties in providing 3D structures for the hundreds of thousands of interactions formed between the different proteins in a living organism. The use of theoretical approaches like docking aims to complement experimental efforts to represent the structure of the protein interactome. However, we cannot ignore that current methods have limitations due to problems of sampling of the protein-protein conformational space and the lack of accuracy of available force fields. Cases that are especially difficult for prediction are those in which complex formation implies a non-negligible change in the conformation of the interacting proteins, i.e., those cases where protein flexibility plays a key role in protein-protein docking. In this work, we present a new approach to treat flexibility in docking by global structural relaxation based on ultrafast discrete molecular dynamics. On a standard benchmark of protein complexes, the method provides a general improvement over the results obtained by rigid docking. The method is especially efficient in cases with large conformational changes upon binding, in which structure relaxation with discrete molecular dynamics leads to a predictive success rate double that obtained with state-of-the-art rigid-body docking.
Bacterial collagen-like proteins that form triple-helical structures
Yu, Zhuoxin; An, Bo; Ramshaw, John A.M.; Brodsky, Barbara
2014-01-01
A large number of collagen-like proteins have been identified in bacteria during the past ten years, principally from analysis of genome databases. These bacterial collagens share the distinctive Gly-Xaa-Yaa repeating amino acid sequence of animal collagens which underlies their unique triple-helical structure. A number of the bacterial collagens have been expressed in E. coli, and they all adopt a triple-helix conformation. Unlike animal collagens, these bacterial proteins do not contain the post-translationally modified amino acid, hydroxyproline, which is known to stabilize the triple-helix structure and may promote self-assembly. Despite the absence of collagen hydroxylation, the triple-helix structures of the bacterial collagens studied exhibit a high thermal stability of 35–39 °C, close to that seen for mammalian collagens. These bacterial collagens are readily produced in large quantities by recombinant methods, either in the original amino acid sequence or in genetically manipulated sequences. This new family of recombinant, easy to modify collagens could provide a novel system for investigating structural and functional motifs in animal collagens and could also form the basis of new biomedical materials with designed structural properties and functions. PMID:24434612
A growing family: the expanding universe of the bacterial cytoskeleton
Ingerson-Mahar, Michael; Gitai, Zemer
2014-01-01
Cytoskeletal proteins are important mediators of cellular organization in both eukaryotes and bacteria. In the past, cytoskeletal studies have largely focused on three major cytoskeletal families, namely the eukaryotic actin, tubulin, and intermediate filament (IF) proteins and their bacterial homologs MreB, FtsZ, and crescentin. However, mounting evidence suggests that these proteins represent only the tip of the iceberg, as the cellular cytoskeletal network is far more complex. In bacteria, each of MreB, FtsZ, and crescentin represents only one member of large families of diverse homologs. There are also newly identified bacterial cytoskeletal proteins with no eukaryotic homologs, such as WACA proteins and bactofilins. Furthermore, there are universally conserved proteins, such as the metabolic enzyme CtpS, that assemble into filamentous structures that can be repurposed for structural cytoskeletal functions. Recent studies have also identified an increasing number of eukaryotic cytoskeletal proteins that are unrelated to actin, tubulin, and IFs, such that expanding our understanding of cytoskeletal proteins is advancing the understanding of the cell biology of all organisms. Here, we summarize the recent explosion in the identification of new members of the bacterial cytoskeleton and describe a hypothesis for the evolution of the cytoskeleton from self-assembling enzymes. PMID:22092065
Rule-based modeling and simulations of the inner kinetochore structure.
Tschernyschkow, Sergej; Herda, Sabine; Gruenert, Gerd; Döring, Volker; Görlich, Dennis; Hofmeister, Antje; Hoischen, Christian; Dittrich, Peter; Diekmann, Stephan; Ibrahim, Bashar
2013-09-01
Combinatorial complexity is a central problem when modeling biochemical reaction networks, since the association of a few components can give rise to a large variation of protein complexes. Available classical modeling approaches are often insufficient for the analysis of very large and complex networks in detail. Recently, we developed a new rule-based modeling approach that facilitates the analysis of spatial and combinatorially complex problems. Here, we explore for the first time how this approach can be applied to a specific biological system, the human kinetochore, which is a multi-protein complex involving over 100 proteins. Applying our freely available SRSim software to a large data set on kinetochore proteins in human cells, we construct a spatial rule-based simulation model of the human inner kinetochore. The model generates an estimation of the probability distribution of the inner kinetochore 3D architecture and we show how to analyze this distribution using information theory. In our model, the formation of a bridge between CenpA and an H3-containing nucleosome occurs efficiently only at the higher protein concentrations realized during S-phase, and possibly not in G1. Above a certain nucleosome distance the protein bridge barely formed, pointing towards the importance of chromatin structure for kinetochore complex formation. We define a metric for the distance between structures that allows us to identify structural clusters. Using this modeling technique, we explore different hypothetical chromatin layouts. Applying a rule-based network analysis to the spatial kinetochore complex geometry allowed us to integrate experimental data on kinetochore proteins, suggesting a 3D model of the human inner kinetochore architecture that is governed by a combinatorial algebraic reaction network. This reaction network can serve as a bridge between multiple scales of modeling. Our approach can be applied to other systems beyond kinetochores. Copyright © 2013 Elsevier Ltd. All rights reserved.
Random close packing in protein cores
Gaines, Jennifer C.; Smith, W. Wendell; Regan, Lynne; O'Hern, Corey S.
2016-03-01
Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions ϕ ≈ 0.75, a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of extended atom models, rather than the more physically accurate explicit hydrogen model. The validity of the explicit hydrogen model was proved in our previous studies by its ability to predict the side chain dihedral angle distributions observed in proteins. In contrast, the extended atom model is not able to recapitulate the side chain dihedral angle distributions, and gives rise to large atomic clashes at side chain dihedral angle combinations that are highly probable in protein crystal structures. Here, we employ the explicit hydrogen model to calculate the packing fraction of the cores of over 200 high-resolution protein structures. We find that these protein cores have ϕ ≈ 0.56, which is similar to results obtained from simulations of random packings of individual amino acids. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations to protein cores and interfaces of known structure.
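For readers unfamiliar with the quantity, a packing fraction ϕ is the fraction of a region's volume occupied by the atoms inside it. The sketch below only illustrates that definition; it is not the procedure used in the paper, whose atom model and region definition are not given in the abstract. It assumes atoms are spheres, the region is a cubic box, and the occupied volume is estimated by sampling a regular grid; the function name `packing_fraction` and its parameters are hypothetical.

```python
import math
from itertools import product


def packing_fraction(spheres, box, n=40):
    """Estimate the fraction of the cube [0, box]^3 covered by a union of spheres.

    spheres: list of (x, y, z, r) tuples (illustrative stand-ins for atoms).
    box:     edge length of the cubic region (an assumption of this sketch).
    n:       number of grid points sampled along each axis.
    """
    step = box / n
    inside = 0
    for i, j, k in product(range(n), repeat=3):
        # Sample the centre of each grid cell.
        px, py, pz = (i + 0.5) * step, (j + 0.5) * step, (k + 0.5) * step
        # A point counts once even if several spheres overlap there.
        if any((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r
               for x, y, z, r in spheres):
            inside += 1
    return inside / n ** 3


# One sphere of diameter 1 inscribed in a unit cube: exact value is pi/6.
phi = packing_fraction([(0.5, 0.5, 0.5, 0.5)], box=1.0)
print(round(phi, 2), round(math.pi / 6, 2))
```

Because overlapping spheres are counted only once, the estimate measures the union of the atomic volumes rather than their sum, which matters when bonded atoms interpenetrate.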
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures
Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad
2015-01-01
We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins, and hence identification of similarity between proteins is simpler and more reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411
Arc is a flexible modular protein capable of reversible self-oligomerization
Myrum, Craig; Baumann, Anne; Bustad, Helene J.; Flydal, Marte Innselset; Mariaule, Vincent; Alvira, Sara; Cuéllar, Jorge; Haavik, Jan; Soulé, Jonathan; Valpuesta, José Maria; Márquez, José Antonio; Martinez, Aurora; Bramham, Clive R.
2015-01-01
The immediate early gene product Arc (activity-regulated cytoskeleton-associated protein) is posited as a master regulator of long-term synaptic plasticity and memory. However, the physicochemical and structural properties of Arc have not been elucidated. In the present study, we expressed and purified recombinant human Arc (hArc) and performed the first biochemical and biophysical analysis of hArc's structure and stability. Limited proteolysis assays and MS analysis indicate that hArc has two major domains on either side of a central more disordered linker region, consistent with in silico structure predictions. hArc's secondary structure was estimated using CD, and stability was analysed by CD-monitored thermal denaturation and differential scanning fluorimetry (DSF). Oligomerization states under different conditions were studied by dynamic light scattering (DLS) and visualized by AFM and EM. Biophysical analyses show that hArc is a modular protein with defined secondary structure and loose tertiary structure. hArc appears to be pyramid-shaped as a monomer and is capable of reversible self-association, forming large soluble oligomers. The N-terminal domain of hArc is highly basic, which may promote interaction with cytoskeletal structures or other polyanionic surfaces, whereas the C-terminal domain is acidic and stabilized by ionic conditions that promote oligomerization. Upon binding of presenilin-1 (PS1) peptide, hArc undergoes a large structural change. A non-synonymous genetic variant of hArc (V231G) showed properties similar to the wild-type (WT) protein. We conclude that hArc is a flexible multi-domain protein that exists in monomeric and oligomeric forms, compatible with a diverse, hub-like role in plasticity-related processes. PMID:25748042
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Hidden Structural Codes in Protein Intrinsic Disorder.
Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo
2017-10-17
Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovered the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could not have been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
Clustering biomolecular complexes by residue contacts similarity.
Rodrigues, João P G L M; Trellet, Mikaël; Schmitz, Christophe; Kastritis, Panagiotis; Karaca, Ezgi; Melquiond, Adrien S J; Bonvin, Alexandre M J J
2012-07-01
Inaccuracies in computational molecular modeling methods are often counterweighed by brute-force generation of a plethora of putative solutions. These are then typically sieved via structural clustering based on similarity measures such as the root mean square deviation (RMSD) of atomic positions. Albeit widely used, these measures suffer from several theoretical and technical limitations (e.g., choice of regions for fitting) that impair their application in multicomponent systems (N > 2), large-scale studies (e.g., interactomes), and other time-critical scenarios. We present here a simple similarity measure for structural clustering based on atomic contacts, the fraction of common contacts, and compare it with the most used similarity measure of the protein docking community, interface backbone RMSD. We show that this method produces very compact clusters in remarkably short time when applied to a collection of binary and multicomponent protein-protein and protein-DNA complexes. Furthermore, it allows easy clustering of similar conformations of multicomponent symmetrical assemblies in which chain permutations can occur. Simple contact-based metrics should be applicable to other structural biology clustering problems, in particular for time-critical or large-scale endeavors. Copyright © 2012 Wiley Periodicals, Inc.
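The idea of a contact-based similarity can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it assumes one representative coordinate per residue, a hypothetical 5 Å distance cutoff, and that the fraction is taken relative to the contacts of the first model. The function names `contacts` and `fraction_common_contacts` are invented for this sketch.

```python
def contacts(chain_a, chain_b, cutoff=5.0):
    """Return the set of (residue_a, residue_b) pairs closer than `cutoff`.

    chain_a, chain_b: dicts mapping a residue id to an (x, y, z) tuple.
    The one-point-per-residue model and the 5.0 cutoff are assumptions.
    """
    c2 = cutoff * cutoff
    return {
        (ra, rb)
        for ra, pa in chain_a.items()
        for rb, pb in chain_b.items()
        if sum((u - v) ** 2 for u, v in zip(pa, pb)) <= c2
    }


def fraction_common_contacts(ref, other):
    """Fraction of the contacts in `ref` that are also present in `other`."""
    if not ref:
        return 0.0
    return len(ref & other) / len(ref)


# Two hypothetical docking models described directly by their contact sets.
model_1 = {(1, 10), (2, 11), (3, 12), (4, 13)}
model_2 = {(1, 10), (2, 11), (5, 14)}
print(fraction_common_contacts(model_1, model_2))  # 0.5
```

Because contacts are compared as sets, no structural superposition is needed, which is one reason such a measure can be cheaper than RMSD-based comparison. Note that with this definition the value depends on which model is taken as the reference.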
Deciphering the shape and deformation of secondary structures through local conformation analysis
Baussand, Julie; Camproux, Anne-Claude
2011-02-01
Background: Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Results: Using a structural alphabet, we translated the 3D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. Conclusion: The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons. PMID:21284872
3D Complex: A Structural Classification of Protein Complexes
Levy, Emmanuel D; Pereira-Leal, Jose B; Chothia, Cyrus; Teichmann, Sarah A
2006-01-01
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes. PMID:17112313
Serrano, Pedro; Dutta, Samit K; Proudfoot, Andrew; Mohanty, Biswaranjan; Susac, Lukas; Martin, Bryan; Geralt, Michael; Jaroszewski, Lukasz; Godzik, Adam; Elsliger, Marc; Wilson, Ian A; Wüthrich, Kurt
2016-11-01
For more than a decade, the Joint Center for Structural Genomics (JCSG; www.jcsg.org) worked toward increased three-dimensional structure coverage of the protein universe. This coordinated quest was one of the main goals of the four high-throughput (HT) structure determination centers of the Protein Structure Initiative (PSI; www.nigms.nih.gov/Research/specificareas/PSI). To achieve the goals of the PSI, the JCSG made use of the complementarity of structure determination by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to increase and diversify the range of targets entering the HT structure determination pipeline. The overall strategy, for both techniques, was to determine atomic resolution structures for representatives of large protein families, as defined by the Pfam database, which had no structural coverage and could make significant contributions to biological and biomedical research. Furthermore, the experimental structures could be leveraged by homology modeling to further expand the structural coverage of the protein universe and increase biological insights. Here, we describe what could be achieved by this structural genomics approach, using as an illustration the contributions from 20 NMR structure determinations out of a total of 98 JCSG NMR structures, which were selected because they are the first three-dimensional structure representations of the respective Pfam protein families. The information from this small sample is representative for the overall results from crystal and NMR structure determination in the JCSG. There are five new folds, which were classified as domains of unknown functions (DUF), three of the proteins could be functionally annotated based on three-dimensional structure similarity with previously characterized proteins, and 12 proteins showed only limited similarity with previous deposits in the Protein Data Bank (PDB) and were classified as DUFs. © 2016 Federation of European Biochemical Societies.
Structure and biochemical functions of four simian virus 40 truncated large-T antigens.
Chaudry, F; Harvey, R; Smith, A E
1982-01-01
The structure of four abnormal T antigens which are present in different simian virus 40 (SV40)-transformed mouse cell lines was studied by tryptic peptide mapping, partial proteolysis fingerprinting, immunoprecipitation with monoclonal antibodies, and in vitro translation. The results obtained allowed us to deduce that these proteins, which have apparent molecular weights of 15,000, 22,000, 33,000 and 45,000, are truncated forms of large-T antigen extending to different amounts into the amino acid sequences unique to large-T. The proteins are all phosphorylated, probably at a site between amino acids 106 and 123. The mRNAs coding for the proteins probably contain the normal large-T splice but are shorter than the normal transcripts of the SV40 early region. The truncated large-Ts were tested for the ability to bind to double-stranded DNA-cellulose. This showed that the 33,000- and 45,000-molecular-weight polypeptides contained sequences sufficient for binding under the conditions used, whereas the 15,000- and 22,000-molecular-weight forms did not. Together with published data, this allows the tentative mapping of a region of SV40 large-T between amino acids 109 and 272 that is necessary and may be sufficient for the binding to double-stranded DNA-cellulose in vitro. None of the truncated large-T species formed a stable complex with the host cell protein referred to as nonviral T-antigen or p53, suggesting that the carboxy-terminal sequences of large-T are necessary for complex formation. PMID:6292504
Ukleja, Marta; Valpuesta, José María; Dziembowski, Andrzej; Cuellar, Jorge
2016-10-01
Large protein assemblies are usually the effectors of major cellular processes. The intricate cell homeostasis network is divided into numerous interconnected pathways, each controlled by a set of protein machines. One of these master regulators is the CCR4-NOT complex, which ultimately controls protein expression levels. This multisubunit complex assembles around a scaffold platform, which enables a wide variety of well-studied functions from mRNA synthesis to transcript decay, as well as other tasks still being identified. Solving the structure of the entire CCR4-NOT complex will help to define the distribution of its functions. The recently published three-dimensional reconstruction of the complex, in combination with the known crystal structures of some of the components, has begun to address this. Methodological improvements in structural biology, especially in cryoelectron microscopy, encourage further structural and protein-protein interaction studies, which will advance our comprehension of the gene expression machinery. © 2016 WILEY Periodicals, Inc.
Merkley, Eric D.; Cort, John R.; Adkins, Joshua N.
2013-09-01
Multiprotein complexes, rather than individual proteins, make up a large part of the biological macromolecular machinery of a cell. Understanding the structure and organization of these complexes is critical to understanding cellular function. Chemical cross-linking coupled with mass spectrometry is emerging as a complementary technique to traditional structural biology methods and can provide low-resolution structural information for a multitude of purposes, such as distance constraints in computational modeling of protein complexes. In this review, we discuss the experimental considerations for successful application of chemical cross-linking-mass spectrometry in biological studies and highlight three examples of such studies from the recent literature. These examples (as well as many others) illustrate the utility of a chemical cross-linking-mass spectrometry approach in facilitating structural analysis of large and challenging complexes.
MacDonald, James T.; Kabasakal, Burak V.; Godding, David; Kraatz, Sebastian; Henderson, Louie; Barber, James; Freemont, Paul S.; Murray, James W.
2016-01-01
The ability to design and construct structures with atomic level precision is one of the key goals of nanotechnology. Proteins offer an attractive target for atomic design because they can be synthesized chemically or biologically and can self-assemble. However, the generalized protein folding and design problem is unsolved. One approach to simplifying the problem is to use a repetitive protein as a scaffold. Repeat proteins are intrinsically modular, and their folding and structures are better understood than large globular domains. Here, we have developed a class of synthetic repeat proteins based on the pentapeptide repeat family of beta-solenoid proteins. We have constructed length variants of the basic scaffold and computationally designed de novo loops projecting from the scaffold core. The experimentally solved 3.56-Å resolution crystal structure of one designed loop matches closely the designed hairpin structure, showing the computational design of a backbone extension onto a synthetic protein core without the use of backbone fragments from known structures. Two other loop designs were not clearly resolved in the crystal structures, and one loop appeared to be in an incorrect conformation. We have also shown that the repeat unit can accommodate whole-domain insertions by inserting a domain into one of the designed loops. PMID:27573845
Xu, Yongqian; Liu, Qin; Li, Xiaopeng; Wesdemiotis, Chrys; Pang, Yi
2012-11-28
A novel squaraine dye (SQ) exhibits improved fluorescence response toward protein detection by incorporation of a zwitterionic structure. With the aid of a dansylamide (DNSA) substituent, the new probe (DNSA-SQ) exhibits remarkable selectivity in binding to site I (a specific substructure in protein).
Free energy decomposition of protein-protein interactions.
Noskov, S Y; Lim, C
2001-08-01
A free energy decomposition scheme has been developed and tested on antibody-antigen and protease-inhibitor binding for which accurate experimental structures were available for both free and bound proteins. Using the x-ray coordinates of the free and bound proteins, the absolute binding free energy was computed assuming additivity of three well-defined, physical processes: desolvation of the x-ray structures, isomerization of the x-ray conformation to a nearby local minimum in the gas phase, and subsequent noncovalent complex formation in the gas phase. This free energy scheme, together with the Generalized Born model for computing the electrostatic solvation free energy, yielded binding free energies in remarkable agreement with experimental data. Two assumptions commonly used in theoretical treatments, viz., the rigid-binding approximation (which assumes no conformational change upon complexation) and the neglect of vdW interactions, were found to yield large errors in the binding free energy. Protein-protein vdW and electrostatic interactions between complementary surfaces over a relatively large area (1400–1700 Å²) were found to drive antibody-antigen and protease-inhibitor binding.
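The additivity assumption stated in this abstract can be written compactly. The symbols below are labels chosen here for the three processes the abstract names; they are not notation taken from the paper, and the precise definition of each term (including which species are desolvated) is not given in the abstract.

```latex
% Additive decomposition of the binding free energy, as described in words above.
% Term labels are illustrative, not the authors' notation.
\Delta G_{\mathrm{bind}} \;\approx\;
    \Delta G_{\mathrm{desolv}}
  + \Delta G_{\mathrm{isom}}
  + \Delta G_{\mathrm{assoc}}^{\mathrm{gas}}
```

Read this way, the rigid-binding approximation criticized in the abstract corresponds to neglecting the conformational-change (isomerization) contribution.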
Evaluation of an Ultrafast Molecular Rotor, Auramine O, as a Fluorescent Amyloid Marker.
Mudliar, Niyati H; Sadhu, Biswajit; Pettiwala, Aafrin M; Singh, Prabhat K
2016-10-13
Recently, Auramine O (AuO) has been projected as a fluorescent fibril sensor, and it has been claimed that AuO has an advantage over the most extensively utilized fibril marker, Thioflavin-T (ThT), owing to the presence of an additional large red-shifted emission band for AuO, which was observed exclusively for AuO in the presence of fibrillar media and not in protein or buffer media. As fibrils are very rich in β-sheet structure, a fibril sensor should be more specific toward the β-sheet structure so as to produce a large contrast between the fibril form and native protein form, for efficient detection and in vitro mechanistic studies of fibrillation. However, in this report, we show that AuO interacts significantly with the native form of bovine serum albumin (BSA), which is an all-α-helical protein and lacks the β-sheet structure that is the hallmark of a fibrillar structure. This strong interaction of AuO with the native form of BSA leads to a large emission enhancement of AuO for the native protein itself, and leads to a low contrast between the BSA protein and its fibrils. More importantly, the large red-shifted emission band of AuO, reported in the presence of human insulin fibrils, and which was projected as its major advantage over ThT, is not observed in the presence of BSA fibrils or fibrils from other proteins, such as lysozyme, human serum albumin, and β-lactoglobulin. Thus, our results provide information on the universal applicability of the distinctive and claimed-to-be-advantageous photophysical features reported for AuO in human insulin fibrils towards fibrils from other proteins. Time-resolved fluorescence measurements also support the proposition of a strong interaction of AuO with native BSA. Additionally, tryptophan emission of the protein has been explored to further elucidate the binding mechanism of AuO with native BSA.
Evaluation of thermodynamic parameters revealed that the binding of AuO with native BSA involved positive enthalpy and entropy changes, suggesting dominant contributions from hydrophobic and electrostatic interactions toward the association of AuO with native BSA. Molecular docking calculations have been performed to identify the principal binding location of AuO in native BSA.
Structural changes of malt proteins during boiling.
Jin, Bei; Li, Lin; Liu, Guo-Qin; Li, Bing; Zhu, Yu-Kui; Liao, Liao-Ning
2009-03-09
Changes in the physicochemical properties and structure of proteins derived from two malt varieties (Baudin and Guangmai) during wort boiling were investigated by differential scanning calorimetry, SDS-PAGE, two-dimensional electrophoresis, gel filtration chromatography and circular dichroism spectroscopy. The results showed that both protein content and amino acid composition changed only slightly during boiling, and that boiling might cause a gradual unfolding of protein structures, as indicated by the decrease in surface hydrophobicity, free sulfhydryl content and enthalpy value, as well as reduced alpha-helix contents and markedly increased random coil contents. It was also found that the major component of both worts was a boiling-resistant protein with a molecular mass of 40 kDa, and that according to the two-dimensional electrophoresis and SE-HPLC analyses, a small amount of soluble aggregates might be formed via hydrophobic interactions. It was thus concluded that changes of protein structure caused by boiling that might influence beer quality are largely independent of malt variety.
Jiang, Jiang; Chen, Jie; Xiong, Youling L
2009-08-26
Structural unfolding of soy protein isolate (SPI) as induced by holding (0, 0.5, 1, 2, and 4 h) in acidic (pH 1.5-3.5) and alkaline (pH 10.0-12.0) pH solutions, followed by refolding (1 h) at pH 7.0, was analyzed. Changes in emulsifying properties of treated SPI were then examined. The pH-shifting treatments resulted in a substantial increase in protein surface hydrophobicity, intrinsic tryptophan fluorescence intensity, and disulfide-mediated aggregation, along with the exposure of tyrosine. After the pH-shifting processes, soy protein adopted a molten globule-like conformation that largely maintained the original secondary structure and overall compactness but lost some tertiary structure. These structural modifications, consequently, led to markedly improved emulsifying activity of SPI as well as the emulsion stability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Rajesh Kumar; Palm, Gottfried J.; Panjikar, Santosh
2007-04-01
Crystal structure analysis of the apo form of catabolite control protein A reveals the three-helix bundle of the DNA-binding domain. In the crystal packing, this domain interacts with the binding site for the corepressor protein. Crystal structure determination of catabolite control protein A (CcpA) at 2.6 Å resolution reveals for the first time the structure of a full-length apo-form LacI-GalR family repressor protein. In the crystal structures of these transcription regulators, the three-helix bundle of the DNA-binding domain has only been observed in cognate DNA complexes; it has not been observed in other crystal structures owing to its mobility. In the crystal packing of apo-CcpA, the protein–protein contacts between the N-terminal three-helix bundle and the core domain consisted of interactions between the homodimers that were similar to those between the corepressor protein HPr and the CcpA N-subdomain in the ternary DNA complex. In contrast to the DNA complex, the apo-CcpA structure reveals large subdomain movements in the core, resulting in a complete loss of contacts between the N-subdomains of the homodimer.
A Circular Dichroism Reference Database for Membrane Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace, B.; Wien, F.; Stone, T.
2006-01-01
Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).
Secure web book to store structural genomics research data.
Manjasetty, Babu A; Höppner, Klaus; Mueller, Uwe; Heinemann, Udo
2003-01-01
Recently established collaborative structural genomics programs aim at significantly accelerating the crystal structure analysis of proteins. These large-scale projects require efficient data management systems to ensure seamless collaboration between different groups of scientists working towards the same goal. Within the Berlin-based Protein Structure Factory, the synchrotron X-ray data collection and the subsequent crystal structure analysis tasks are located at BESSY, a third-generation synchrotron source. To organize file-based communication and data transfer at the BESSY site of the Protein Structure Factory, we have developed the web-based BCLIMS, the BESSY Crystallography Laboratory Information Management System. BCLIMS is a relational data management system which is powered by MySQL as the database engine and Apache HTTP as the web server. The database interface routines are written in the Python programming language. The software is freely available to academic users. Here we describe the storage, retrieval and manipulation of laboratory information, mainly pertaining to the synchrotron X-ray diffraction experiments and the subsequent protein structure analysis, using BCLIMS.
A tool for calculating binding-site residues on proteins from PDB structures.
Hu, Jing; Yan, Changhui
2009-08-03
In research on protein functional sites, researchers often need to identify binding-site residues on a protein. A commonly used strategy is to find a complex structure from the Protein Data Bank (PDB) that consists of the protein of interest and its interacting partner(s) and calculate binding-site residues based on the complex structure. However, since a protein may participate in multiple interactions, the binding-site residues calculated based on one complex structure usually do not reveal all binding sites on a protein. Researchers therefore have to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming. In particular, combining binding-site information obtained from different PDB structures requires tedious work to align protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP can quickly find all binding-site residues on the protein by automatically combining the information obtained from all PDB structures that contain the protein of interest. Additionally, TCBRP presents the binding-site residues in different categories according to the interaction type. TCBRP also allows researchers to set the definition of binding-site residues. The developed tool is very useful for research on protein binding site analysis and prediction.
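The combining step described in this abstract can be illustrated with a minimal sketch. It assumes a simple distance-cutoff definition of a binding-site residue (the abstract does not say which definitions TCBRP supports), and it assumes residue numbers are already consistent across complexes, which in practice requires the sequence alignment the abstract mentions. The function names and the toy coordinates are hypothetical and are not part of TCBRP.

```python
from math import dist

# Each atom is (residue_number, (x, y, z)). A real tool would parse these
# from PDB files; the toy coordinates below are made up for illustration.

def binding_site_residues(protein_atoms, partner_atoms, cutoff=5.0):
    """Residue numbers with any atom within `cutoff` angstroms of a partner atom."""
    site = set()
    for resnum, xyz in protein_atoms:
        if resnum in site:
            continue  # residue already flagged by another of its atoms
        if any(dist(xyz, pxyz) <= cutoff for _, pxyz in partner_atoms):
            site.add(resnum)
    return site

def combine_sites(complexes, cutoff=5.0):
    """Union of binding-site residues over several complexes of one protein."""
    combined = set()
    for protein_atoms, partner_atoms in complexes:
        combined |= binding_site_residues(protein_atoms, partner_atoms, cutoff)
    return combined

# The same protein observed in two complexes with different partners.
protein = [(10, (0.0, 0.0, 0.0)), (11, (10.0, 0.0, 0.0)), (12, (20.0, 0.0, 0.0))]
partner_a = [(1, (0.0, 3.0, 0.0))]   # sits near residue 10
partner_b = [(1, (20.0, 0.0, 4.0))]  # sits near residue 12
sites = combine_sites([(protein, partner_a), (protein, partner_b)])
print(sorted(sites))
```

Neither complex alone reveals both residues, which is the motivation the abstract gives for pooling every available structure. Exposing `cutoff` as a parameter mirrors the stated ability to change the definition of a binding-site residue.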
Kister, Alexander
2015-01-01
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify distribution of “favorable” residues, that are mainly responsible for a given fold formation, and “unfavorable” residues, that are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to a protein fold under investigation. The initial assumptions are tested one-by-one for a set of all known proteins with a given structure. The assumption is accepted as a “rule of amino acid distribution” for the protein fold if it holds true for all, or near all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design. PMID:25625198
Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.
2015-01-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665
A systematic analysis of atomic protein-ligand interactions in the PDB.
Ferreira de Freitas, Renato; Schapira, Matthieu
2017-10-01
As the protein databank (PDB) recently passed the cap of 123 456 structures, it stands more than ever as an important resource not only to analyze structural features of specific biological systems, but also to study the prevalence of structural patterns observed in a large body of unrelated structures, that may reflect rules governing protein folding or molecular recognition. Here, we compiled a list of 11 016 unique structures of small-molecule ligands bound to proteins - 6444 of which have experimental binding affinity - representing 750 873 protein-ligand atomic interactions, and analyzed the frequency, geometry and impact of each interaction type. We find that hydrophobic interactions are generally enriched in high-efficiency ligands, but polar interactions are over-represented in fragment inhibitors. While most observations extracted from the PDB will be familiar to seasoned medicinal chemists, less expected findings, such as the high number of C-H···O hydrogen bonds or the relatively frequent amide-π stacking between the backbone amide of proteins and aromatic rings of ligands, uncover underused ligand design strategies.
Necci, Marco; Piovesan, Damiano; Tosatto, Silvio C E
2016-12-01
Intrinsic disorder (ID) in proteins has been extensively described for the last decade; a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues which are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID, with short ID evenly distributed along the sequence and long ID overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures. © 2016 The Protein Society.
Schmidt, Nathan W.; Grigoryan, Gevorg
2017-01-01
Coiled‐coils are essential components of many protein complexes. First discovered in structural proteins such as keratins, they have since been found to figure largely in the assembly and dynamics required for diverse functions, including membrane fusion, signal transduction and motors. Coiled‐coils have a characteristic repeating seven‐residue geometric and sequence motif, which is sometimes interrupted by the insertion of one or more residues. Such insertions are often highly conserved and critical to interdomain communication in signaling proteins such as bacterial histidine kinases. Here we develop the “accommodation index” as a parameter that allows automatic detection and classification of insertions based on the three dimensional structure of a protein. This method allows precise identification of the type of insertion and the “accommodation length” over which the insertion is structurally accommodated. A simple theory is presented that predicts the structural perturbations of 1, 3, 4 residue insertions as a function of the length over which the insertion is accommodated. Analysis of experimental structures is in good agreement with theory, and shows that short accommodation lengths give rise to greater perturbation of helix packing angles, changes in local helical phase, and increased structural asymmetry relative to long accommodation lengths. Cytoplasmic domains of histidine kinases in different signaling states display large changes in their accommodation lengths, which can now be seen to underlie diverse structural transitions including symmetry/asymmetry and local variations in helical phase that accompany signal transduction. PMID:27977891
Systematic methods for defining coarse-grained maps in large biomolecules.
Zhang, Zhiyong
2015-01-01
Large biomolecules are involved in many important biological processes. It would be difficult to use large-scale atomistic molecular dynamics (MD) simulations to study the functional motions of these systems because of the computational expense. Therefore various coarse-grained (CG) approaches have attracted rapidly growing interest, which enable simulations of large biomolecules over longer effective timescales than all-atom MD simulations. The first issue in CG modeling is to construct CG maps from atomic structures. In this chapter, we review the recent development of a novel and systematic method for constructing CG representations of arbitrarily complex biomolecules, in order to preserve large-scale and functionally relevant essential dynamics (ED) at the CG level. In this ED-CG scheme, the essential dynamics can be characterized by principal component analysis (PCA) on a structural ensemble, or elastic network model (ENM) of a single atomic structure. Validation and applications of the method cover various biological systems, such as multi-domain proteins, protein complexes, and even biomolecular machines. The results demonstrate that the ED-CG method may serve as a very useful tool for identifying functional dynamics of large biomolecules at the CG level.
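The PCA step that this abstract uses to characterize essential dynamics can be sketched as follows. This covers only the extraction of the dominant collective mode from a structural ensemble, not the assignment of CG sites in the ED-CG method. Power iteration is swapped in for a full eigensolver purely to keep the example dependency-free; the function names and the toy ensemble are assumptions for illustration.

```python
def covariance(ensemble):
    """Covariance matrix of flattened coordinate vectors (one row per conformation)."""
    m, n = len(ensemble), len(ensemble[0])
    mean = [sum(row[j] for row in ensemble) / m for j in range(n)]
    centered = [[row[j] - mean[j] for j in range(n)] for row in ensemble]
    return [[sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(n)]
            for i in range(n)]

def dominant_mode(cov, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) of the largest-variance mode."""
    n = len(cov)
    v = [1.0] * n  # arbitrary start; must not be orthogonal to the dominant mode
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v

# Toy ensemble: four conformations, three coordinates each; nearly all of the
# motion is along the first coordinate.
ensemble = [
    [0.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [2.0, -0.1, 0.0],
    [3.0, 0.0, 0.0],
]
variance, mode = dominant_mode(covariance(ensemble))
print(round(variance, 3), [round(x, 3) for x in mode])
```

For a real ensemble the conformations would first be superimposed to remove overall rotation and translation, and a library eigensolver would return all modes at once.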
Hrle, Ajla; Maier, Lisa-Katharina; Sharma, Kundan; Ebert, Judith; Basquin, Claire; Urlaub, Henning; Marchfelder, Anita; Conti, Elena
2014-01-01
Upon pathogen invasion, bacteria and archaea activate an RNA-interference-like mechanism termed CRISPR (clustered regularly interspaced short palindromic repeats). A large family of Cas (CRISPR-associated) proteins mediates the different stages of this sophisticated immune response. Bioinformatic studies have classified the Cas proteins into families, according to their sequences and respective functions. These range from the insertion of the foreign genetic elements into the host genome to the activation of the interference machinery as well as target degradation upon attack. Cas7 family proteins are central to the type I and type III interference machineries as they constitute the backbone of the large interference complexes. Here we report the crystal structure of Thermofilum pendens Csc2, a Cas7 family protein of type I-D. We found that Csc2 forms a core RRM-like domain, flanked by three peripheral insertion domains: a lid domain, a Zinc-binding domain and a helical domain. Comparison with other Cas7 family proteins reveals a set of similar structural features both in the core and in the peripheral domains, despite the absence of significant sequence similarity. T. pendens Csc2 binds single-stranded RNA in vitro in a sequence-independent manner. Using a crosslinking - mass-spectrometry approach, we mapped the RNA-binding surface to a positively charged surface patch on T. pendens Csc2. Thus our analysis of the key structural and functional features of T. pendens Csc2 highlights recurring themes and evolutionary relationships in type I and type III Cas proteins.
NASA Technical Reports Server (NTRS)
Mahtani, H. K.; Richmond, R. C.; Chang, T. Y.; Chang, C. C. Y.; Rose, M. Franklin (Technical Monitor)
2001-01-01
The enzyme acyl-coenzyme A:cholesterol acyltransferase (ACAT) is an important contributor to the pathological expression of plaque leading to atherosclerosis, a major health problem. Adequate knowledge of the structure of this protein will enable pharmaceutical companies to design drugs specific to the enzyme. ACAT is a membrane protein located in the endoplasmic reticulum. The protein has never been purified to homogeneity. T.Y. Chang's laboratory at Dartmouth College provided a 4-kb cDNA clone (K1) coding for a structural gene of the protein. We have modified the gene sequence and inserted the cDNA into the BioGreen His Baculovirus transfer vector. This was successfully expressed in Sf21 insect cells as a GFP-labeled ACAT protein. The advantage to this ACAT-GFP fusion protein (abbreviated GCAT) is that one can easily monitor its expression as a function of GFP excitation at 395 nm and emission at 509 nm. Moreover, the fusion protein GCAT can be detected on Western blots with the use of commercially available GFP antibodies. Antibodies against ACAT are not readily available. The presence of the 6xHis tag in the transfer vector facilitates purification of the recombinant protein since 6xHis fusion proteins bind with high affinity to Ni-NTA agarose. Obtaining highly pure protein in large quantities is essential for subsequent crystallization. The purified GCAT fusion protein can readily be cleaved into distinct GFP and ACAT proteins in the presence of thrombin. Thrombin digests the 6xHis tag linking the two protein sequences. Preliminary experiments have indicated that both GCAT and ACAT are expressed as functional proteins. The ultimate aim is to obtain large quantities of the ACAT protein in pure and functional form appropriate for protein crystal growth. Determining protein structure is the key to the design and development of effective drugs. X-ray analysis requires large homogeneous crystals that are difficult to obtain in the gravity environment of earth.
Protein crystals grown in microgravity are often larger and have fewer defects than those grown on earth. The analysis of higher quality space-grown crystals will assist in structure-based drug design. We have successfully grown GCAT-infected Sf21 cells in both adhesion and suspension cultures. Expression levels of GCAT in cell lines such as Sf9 and High Five appear to be reduced. We intend to replicate GCAT expression in all three cell lines using the NASA rotating wall bioreactor which effectively duplicates a microgravity environment. The bioreactor itself could be launched to study the expression of the GFP and GCAT proteins in the actual microgravity environment achieved in orbit.
Unexpected features of the dark proteome.
Perdigão, Nelson; Heinrich, Julian; Stolte, Christian; Sabir, Kenneth S; Buckley, Michael J; Tabor, Bruce; Signal, Beth; Gloss, Brian S; Hammang, Christopher J; Rost, Burkhard; Schafferhans, Andrea; O'Donoghue, Seán I
2015-12-29
We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.
Chae, Pil Seok; Rasmussen, Søren G. F.; Rana, Rohini; Gotfryd, Kamil; Chandra, Richa; Goren, Michael A.; Kruse, Andrew C.; Nurva, Shailika; Loland, Claus J.; Pierre, Yves; Drew, David; Popot, Jean-Luc; Picot, Daniel; Fox, Brian G.; Guan, Lan; Gether, Ulrik; Byrne, Bernadette; Kobilka, Brian; Gellman, Samuel H.
2011-01-01
The understanding of integral membrane protein (IMP) structure and function is hampered by the difficulty of handling these proteins. Aqueous solubilization, necessary for many types of biophysical analysis, generally requires a detergent to shield the large lipophilic surfaces displayed by native IMPs. Many proteins remain difficult to study owing to a lack of suitable detergents. We introduce a class of amphiphiles, each of which is built around a central quaternary carbon atom derived from neopentyl glycol, with hydrophilic groups derived from maltose. Representatives of this maltose-neopentyl glycol (MNG) amphiphile family display favorable behavior relative to conventional detergents, as tested on multiple membrane protein systems, leading to enhanced structural stability and successful crystallization. MNG amphiphiles are promising tools for membrane protein science because of the ease with which they may be prepared and the facility with which their structures may be varied. PMID:21037590
Unexpected features of the dark proteome
Perdigão, Nelson; Heinrich, Julian; Stolte, Christian; Sabir, Kenneth S.; Buckley, Michael J.; Tabor, Bruce; Signal, Beth; Gloss, Brian S.; Hammang, Christopher J.; Rost, Burkhard; Schafferhans, Andrea
2015-01-01
We surveyed the “dark” proteome–that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44–54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology. PMID:26578815
Direct Calculation of Protein Fitness Landscapes through Computational Protein Design
Au, Loretta; Green, David F.
2016-01-01
Naturally selected amino-acid sequences or experimentally derived ones are often the basis for understanding how protein three-dimensional conformation and function are determined by primary structure. Such sequences for a protein family comprise only a small fraction of all possible variants, however, representing the fitness landscape with limited scope. Explicitly sampling and characterizing alternative, unexplored protein sequences would directly identify fundamental reasons for sequence robustness (or variability), and we demonstrate that computational methods offer an efficient mechanism toward this end, on a large scale. The dead-end elimination and A∗ search algorithms were used here to find all low-energy single mutant variants, and corresponding structures of a G-protein heterotrimer, to measure changes in structural stability and binding interactions to define a protein fitness landscape. We established consistency between these algorithms with known biophysical and evolutionary trends for amino-acid substitutions, and could thus recapitulate known protein side-chain interactions and predict novel ones. PMID:26745411
Shield, Alison J; Murray, Tracy P; Board, Philip G
2006-09-08
Mutations in the ganglioside-induced differentiation-associated protein 1 (GDAP1) gene have been linked with Charcot-Marie-Tooth (CMT) disease. This protein, and its paralogue GDAP1L1, appear to be structurally related to the cytosolic glutathione S-transferases (GST), including an N-terminal thioredoxin fold domain with conserved active site residues. The specific function of GDAP1 remains unknown. To further characterise their structure and function we purified recombinant human GDAP1 and GDAP1L1 proteins using bacterial expression and immobilised metal affinity chromatography. Like other cytosolic GSTs, GDAP1 protein has a dimeric structure. Although the full-length proteins were largely insoluble, the deletion of a proposed C-terminal transmembrane domain allowed the preparation of soluble protein. The purified proteins were assayed for glutathione-dependent activity against a library of 'prototypic' GST substrates. No evidence of glutathione-dependent activity or an ability to bind glutathione immobilised on agarose was found.
Portolano, Nicola; Watson, Peter J; Fairall, Louise; Millard, Christopher J; Milano, Charles P; Song, Yun; Cowley, Shaun M; Schwabe, John W R
2014-10-16
The expression and purification of large amounts of recombinant protein complexes is an essential requirement for structural biology studies. For over two decades, prokaryotic expression systems such as E. coli have dominated the scientific literature over costly and less efficient eukaryotic cell lines. Despite the clear advantage in terms of yields and costs of expressing recombinant proteins in bacteria, the absence of specific co-factors, chaperones and post-translational modifications may cause loss of function, mis-folding and can disrupt protein-protein interactions of certain eukaryotic multi-subunit complexes, surface receptors and secreted proteins. The use of mammalian cell expression systems can address these drawbacks since they provide a eukaryotic expression environment. However, low protein yields and high costs of such methods have until recently limited their use for structural biology. Here we describe a simple and accessible method for expressing and purifying milligram quantities of protein by performing transient transfections of suspension grown HEK (Human Embryonic Kidney) 293 F cells.
Advances in recombinant protein expression for use in pharmaceutical research.
Assenberg, Rene; Wan, Paul T; Geisse, Sabine; Mayr, Lorenz M
2013-06-01
Protein production for structural and biophysical studies, functional assays, biomarkers, mechanistic studies in vitro and in vivo, but also for therapeutic applications in pharma, biotech and academia has evolved into a mature discipline in recent years. Due to the increased emphasis on biopharmaceuticals, the growing demand for proteins used for structural and biophysical studies, the impact of genomics technologies on the analysis of large sets of structurally diverse proteins, and the increasing complexity of disease targets, the interest in innovative approaches for the expression, purification and characterisation of recombinant proteins has steadily increased over the years. In this review, we summarise recent developments in the field of recombinant protein expression for research use in pharma, biotech and academia. We focus mostly on the latest developments for protein expression in the most widely used expression systems: Escherichia coli (E. coli), insect cell expression using the Baculovirus Expression Vector System (BEVS) and, finally, transient and stable expression of recombinant proteins in mammalian cells. Copyright © 2013. Published by Elsevier Ltd.
Domain organizations of modular extracellular matrix proteins and their evolution.
Engel, J
1996-11-01
Multidomain proteins which are composed of modular units are a rather recent invention of evolution. Domains are defined as autonomously folding regions of a protein, and many of them are similar in sequence and structure, indicating common ancestry. Their modular nature is emphasized by frequent repetitions in identical or in different proteins and by a large number of different combinations with other domains. The extracellular matrix is perhaps the largest biological system composed of modular mosaic proteins, and its astonishing complexity and diversity are based on them. A cluster of minireviews on modular proteins is being published in Matrix Biology. These deal with the evolution of modular proteins, the three-dimensional structure of domains and the ways in which these interact in a multidomain protein. They discuss structure-function relationships in calcium binding domains, collagen helices, alpha-helical coiled-coil domains and C-lectins. The present minireview is focused on some general aspects and serves as an introduction to the cluster.
X-ray scattering data and structural genomics
NASA Astrophysics Data System (ADS)
Doniach, Sebastian
2003-03-01
High throughput structural genomics has the ambitious goal of determining the structure of all, or a very large number of protein folds using the high-resolution techniques of protein crystallography and NMR. However, the program is facing significant bottlenecks in reaching this goal, which include problems of protein expression and crystallization. In this talk, some preliminary results on how the low-resolution technique of small-angle X-ray solution scattering (SAXS) can help ameliorate some of these bottlenecks will be presented. One of the most significant bottlenecks arises from the difficulty of crystallizing integral membrane proteins, where only a handful of structures are available compared to thousands of structures for soluble proteins. By 3-dimensional reconstruction from SAXS data, the size and shape of detergent-solubilized integral membrane proteins can be characterized. This information can then be used to classify membrane proteins which constitute some 25% of all genomes. SAXS may also be used to study the dependence of interparticle interference scattering on solvent conditions so that regions of the protein solution phase diagram which favor crystallization can be elucidated. As a further application, SAXS may be used to provide physical constraints on computational methods for protein structure prediction based on primary sequence information. This in turn can help in identifying structural homologs of a given protein, which can then give clues to its function. D. Walther, F. Cohen and S. Doniach. "Reconstruction of low resolution three-dimensional density maps from one-dimensional small angle x-ray scattering data for biomolecules." J. Appl. Cryst. 33(2):350-363 (2000). W. J. Zheng and S. Doniach. "Protein structure prediction constrained by solution X-ray scattering data and structural homology identification." J. Mol. Biol. 316(1):173-187 (2002).
Kathuria, Sagar V; Chan, Yvonne H; Nobrega, R Paul; Özen, Ayşegül; Matthews, C Robert
2016-03-01
Measurements of protection against exchange of main chain amide hydrogens (NH) with solvent hydrogens in globular proteins have provided remarkable insights into the structures of rare high-energy states that populate their folding free-energy surfaces. Lacking, however, has been a unifying theory that rationalizes these high-energy states in terms of the structures and sequences of their resident proteins. The Branched Aliphatic Side Chain (BASiC) hypothesis has been developed to explain the observed patterns of protection in a pair of TIM barrel proteins. This hypothesis supposes that the side chains of isoleucine, leucine, and valine (ILV) residues often form large hydrophobic clusters that very effectively impede the penetration of water to their underlying hydrogen bond networks and, thereby, enhance the protection against solvent exchange. The linkage between the secondary and tertiary structures enables these ILV clusters to serve as cores of stability in high-energy partially folded states. Statistically significant correlations between the locations of large ILV clusters in native conformations and strong protection against exchange for a variety of motifs reported in the literature support the generality of the BASiC hypothesis. The results also illustrate the necessity to elaborate this simple hypothesis to account for the roles of adjacent hydrocarbon moieties in defining stability cores of partially folded states along folding reaction coordinates. © 2015 The Protein Society.
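The idea of locating ILV clusters in a native structure can be sketched as a connected-components search. This is a simplified illustration only: it assumes one representative coordinate per residue and a hypothetical 6.5 Å distance cutoff, whereas the published analysis may define side-chain contacts differently. The function name `ilv_clusters` is invented for this sketch.

```python
import math

ILV = {"ILE", "LEU", "VAL"}

def ilv_clusters(residues, cutoff=6.5):
    """Group ILV residues into clusters of mutually reachable contacts.

    residues: list of (residue_id, residue_name, (x, y, z)) tuples, where the
    coordinate is one representative point per residue (an assumption).
    cutoff:   contact distance in angstroms (hypothetical value).
    Returns clusters as sorted lists of residue ids, largest cluster first.
    """
    ilv = [(rid, xyz) for rid, name, xyz in residues if name in ILV]
    parent = {rid: rid for rid, _ in ilv}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Join every pair of ILV residues that lies within the cutoff.
    for idx, (a, pa) in enumerate(ilv):
        for b, pb in ilv[idx + 1:]:
            if math.dist(pa, pb) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for rid, _ in ilv:
        groups.setdefault(find(rid), []).append(rid)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)
```

Cluster sizes from such a search could then be compared against measured protection factors, which is the kind of correlation the abstract reports.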
Mlynek, Georg; Lehner, Anita; Neuhold, Jana; Leeb, Sarah; Kostan, Julius; Charnagalov, Alexej; Stolt-Bergner, Peggy; Djinović-Carugo, Kristina; Pinotsis, Nikos
2014-06-01
Expression in Escherichia coli represents the simplest and most cost effective means for the production of recombinant proteins. This is a routine task in structural biology and biochemistry where milligrams of the target protein are required in high purity and monodispersity. To achieve these criteria, the user often needs to screen several constructs in different expression and purification conditions in parallel. We describe a pipeline, implemented in the Center for Optimized Structural Studies, that enables the systematic screening of expression and purification conditions for recombinant proteins and relies on a series of logical decisions. We first use bioinformatics tools to design a series of protein fragments, which we clone in parallel, and subsequently screen in small scale for optimal expression and purification conditions. Based on a scoring system that assesses soluble expression, we then select the top ranking targets for large-scale purification. In the establishment of our pipeline, emphasis was put on streamlining the processes such that it can be easily but not necessarily automatized. In a typical run of about 2 weeks, we are able to prepare and perform small-scale expression screens for 20-100 different constructs followed by large-scale purification of at least 4-6 proteins. The major advantage of our approach is its flexibility, which allows for easy adoption, either partially or entirely, by any average hypothesis driven laboratory in a manual or robot-assisted manner.
Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro
2008-01-03
The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13-nucleotide-long open reading frame and a second exon comprising the remainder of the putative coding region. The expression of these genes is induced at 10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multi-cellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
Tighter Ligand Binding Can Compensate for Impaired Stability of an RNA-Binding Protein.
Wallis, Christopher P; Richman, Tara R; Filipovska, Aleksandra; Rackham, Oliver
2018-06-15
It has been widely shown that ligand-binding residues, by virtue of their orientation, charge, and solvent exposure, often have a net destabilizing effect on proteins that is offset by stability conferring residues elsewhere in the protein. This structure-function trade-off can constrain possible adaptive evolutionary changes of function and may hamper protein engineering efforts to design proteins with new functions. Here, we present evidence from a large randomized mutant library screen that, in the case of PUF RNA-binding proteins, this structural relationship may be inverted and that active-site mutations that increase protein activity are also able to compensate for impaired stability. We show that certain mutations in RNA-protein binding residues are not necessarily destabilizing and that increased ligand-binding can rescue an insoluble, unstable PUF protein. We hypothesize that these mutations restabilize the protein via thermodynamic coupling of protein folding and RNA binding.
Xiao, Rong; Anderson, Stephen; Aramini, James; Belote, Rachel; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John K.; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Jiang, Mei; Kornhaber, Gregory J.; Lee, Dong Yup; Locke, Jessica Y.; Ma, Li-Chung; Maglaqui, Melissa; Mao, Lei; Mitra, Saheli; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Sharma, Seema; Shastry, Ritu; Swapna, G.V.T.; Tong, Saichu N.; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.; Acton, Thomas B.
2014-01-01
We describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples. The platform is centered on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems. The 6X-His tag allows for similar purification procedures for most targets and implementation of high-throughput (HTP) parallel methods. In most cases, the 6X-His-tagged proteins are sufficiently purified (> 97% homogeneity) using a HTP two-step purification protocol for most structural studies. Using this platform, the open reading frames of over 16,000 different targeted proteins (or domains) have been cloned as > 26,000 constructs. Over the past nine years, more than 16,000 of these expressed protein, and more than 4,400 proteins (or domains) have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html). Using these samples, the NESG has deposited more than 900 new protein structures to the Protein Data Bank (PDB). The methods described here are effective in producing eukaryotic and prokaryotic protein samples in E. coli. This paper summarizes some of the updates made to the protein production pipeline in the last five years, corresponding to phase 2 of the NIGMS Protein Structure Initiative (PSI-2) project. The NESG Protein Production Platform is suitable for implementation in a large individual laboratory or by a small group of collaborating investigators. These advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are of broad value to the structural biology, functional proteomics, and structural genomics communities. PMID:20688167
Mavridis, Lazaros; Janes, Robert W
2017-01-01
Circular dichroism (CD) spectroscopy is extensively utilized for determining the percentages of secondary structure content present in proteins. However, although a large contributor, secondary structure is not the only factor that influences the shape and magnitude of the CD spectrum produced. Other structural features can make contributions so an entire protein structural conformation can give rise to a CD spectrum. There is a need for an application capable of generating protein CD spectra from atomic coordinates. However, no empirically derived method to do this currently exists. PDB2CD has been created as an empirical-based approach to the generation of protein CD spectra from atomic coordinates. The method utilizes a combination of structural features within the conformation of a protein; not only its percentage secondary structure content, but also the juxtaposition of these structural components relative to one another, and the overall structure similarity of the query protein to proteins in our dataset, the SP175 dataset, the 'gold standard' set obtained from the Protein Circular Dichroism Data Bank (PCDDB). A significant number of the CD spectra associated with the 71 proteins in this dataset have been produced with excellent accuracy using a leave-one-out cross-validation process. The method also creates spectra in good agreement with those of a test set of 14 proteins from the PCDDB. The PDB2CD package provides a web-based, user friendly approach to enable researchers to produce CD spectra from protein atomic coordinates. http://pdb2cd.cryst.bbk.ac.uk CONTACT: r.w.janes@qmul.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
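The leave-one-out cross-validation mentioned in the abstract above can be sketched generically. The predictor below is a deliberately simple stand-in (it copies the spectrum of the nearest remaining entry in feature space) and is not the PDB2CD method; all names and the feature representation are assumptions made for illustration.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equally sampled spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def predict_spectrum(features, train):
    """Stand-in predictor: return the spectrum of the nearest training entry."""
    nearest = min(train, key=lambda entry: math.dist(features, entry[0]))
    return nearest[1]

def leave_one_out(dataset):
    """dataset: list of (feature_vector, spectrum) pairs.

    Each entry is held out in turn, predicted from all the others,
    and scored by RMSD against its measured spectrum.
    """
    errors = []
    for k, (features, spectrum) in enumerate(dataset):
        train = dataset[:k] + dataset[k + 1:]  # everything except entry k
        errors.append(rmsd(predict_spectrum(features, train), spectrum))
    return errors
```

The key property is that the held-out protein never contributes to its own prediction, so the per-entry errors estimate performance on unseen structures.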
Kaur, Parminder; Kiselar, Janna; Yang, Sichun; Chance, Mark R.
2015-01-01
Hydroxyl radical footprinting based MS for protein structure assessment has the goal of understanding ligand induced conformational changes and macromolecular interactions, for example, protein tertiary and quaternary structure, but the structural resolution provided by typical peptide-level quantification is limiting. In this work, we present experimental strategies using tandem-MS fragmentation to increase the spatial resolution of the technique to the single residue level to provide a high precision tool for molecular biophysics research. Overall, in this study we demonstrated an eightfold increase in structural resolution compared with peptide level assessments. In addition, to provide a quantitative analysis of residue based solvent accessibility and protein topography as a basis for high-resolution structure prediction, we illustrate strategies of data transformation using the relative reactivity of side chains as a normalization strategy and predict side-chain surface area from the footprinting data. We tested the methods by examination of Ca2+-calmodulin showing highly significant correlations between surface area and side-chain contact predictions for individual side chains and the crystal structure. Tandem ion based hydroxyl radical footprinting-MS provides quantitative high-resolution protein topology information in solution that can fill existing gaps in structure determination for large proteins and macromolecular complexes. PMID:25687570
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leite, Wellington C.; Galvão, Carolina W.; Saab, Sérgio C.
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminal polymerization motif of archaeal and eukaryotic RecA family proteins is also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. In conclusion, our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament.
Galvão, Carolina W.; Saab, Sérgio C.; Iulek, Jorge; Etto, Rafael M.; Steffens, Maria B. R.; Chitteni-Pattu, Sindhu; Stanage, Tyler; Keck, James L.; Cox, Michael M.
2016-01-01
The bacterial RecA protein plays a role in the complex system of DNA damage repair. Here, we report the functional and structural characterization of the Herbaspirillum seropedicae RecA protein (HsRecA). HsRecA protein is more efficient at displacing SSB protein from ssDNA than Escherichia coli RecA protein. HsRecA also promotes DNA strand exchange more efficiently. The three dimensional structure of HsRecA-ADP/ATP complex has been solved to 1.7 Å resolution. HsRecA protein contains a small N-terminal domain, a central core ATPase domain and a large C-terminal domain, that are similar to homologous bacterial RecA proteins. Comparative structural analysis showed that the N-terminal polymerization motif of archaeal and eukaryotic RecA family proteins is also present in bacterial RecAs. Reconstruction of electrostatic potential from the hexameric structure of HsRecA-ADP/ATP revealed a high positive charge along the inner side, where ssDNA is bound inside the filament. The properties of this surface may explain the greater capacity of HsRecA protein to bind ssDNA, forming a contiguous nucleoprotein filament, displace SSB and promote DNA exchange relative to EcRecA. Our functional and structural analyses provide insight into the molecular mechanisms of polymerization of bacterial RecA as a helical nucleoprotein filament. PMID:27447485
Dias, José; Renault, Louis; Pérez, Javier; Mirande, Marc
2013-08-16
In animal cells, nine aminoacyl-tRNA synthetases are associated with the three auxiliary proteins p18, p38, and p43 to form a stable and conserved large multi-aminoacyl-tRNA synthetase complex (MARS), whose molecular mass has been proposed to be between 1.0 and 1.5 MDa. The complex acts as a molecular hub for coordinating protein synthesis and diverse regulatory signal pathways. Electron microscopy studies defined its low resolution molecular envelope as an overall rather compact, asymmetric triangular shape. Here, we have analyzed the composition and homogeneity of the native mammalian MARS isolated from rabbit liver and characterized its overall internal structure, size, and shape at low resolution by hydrodynamic methods and small-angle x-ray scattering in solution. Our data reveal that the MARS exhibits a much more elongated and multi-armed shape than expected from previous reports. The hydrodynamic and structural features of the MARS are large compared with other supramolecular assemblies involved in translation, including the ribosome. The large dimensions and non-compact structural organization of MARS favor a large protein surface accessibility for all its components. This may be essential to allow structural rearrangements between the catalytic and cis-acting tRNA binding domains of the synthetases required for binding the bulky tRNA substrates. This non-compact architecture may also contribute to the spatiotemporal controlled release of some of its components, which participate in non-canonical functions after dissociation from the complex.
High-throughput Cloning and Expression of Integral Membrane Proteins in Escherichia coli
Bruni, Renato
2014-01-01
Recently, several structural genomics centers have been established and a remarkable number of three-dimensional structures of soluble proteins have been solved. For membrane proteins, the number of structures solved has been significantly trailing those for their soluble counterparts, not least because over-expression and purification of membrane proteins is a much more arduous process. By using high throughput technologies, a large number of membrane protein targets can be screened simultaneously and a greater number of expression and purification conditions can be employed, leading to a higher probability of successfully determining the structure of membrane proteins. This unit describes the cloning, expression and screening of membrane proteins using high throughput methodologies developed in our laboratory. Basic Protocol 1 deals with the cloning of inserts into expression vectors by ligation-independent cloning. Basic Protocol 2 describes the expression and purification of the target proteins on a miniscale. Lastly, for the targets that express at the miniscale, basic protocols 3 and 4 outline the methods employed for the expression and purification of targets at the midi-scale, as well as a procedure for detergent screening and identification of detergent(s) in which the target protein is stable. PMID:24510647
Crystallization of the Large Membrane Protein Complex Photosystem I in a Microfluidic Channel
Abdallah, Bahige G.; Kupitz, Christopher; Fromme, Petra; Ros, Alexandra
2014-01-01
Traditional macroscale protein crystallization is accomplished non-trivially by exploring a range of protein concentrations and buffers in solution until a suitable combination is attained. This methodology is time consuming and resource intensive, hindering protein structure determination. Even more difficulties arise when crystallizing large membrane protein complexes such as photosystem I (PSI) due to their large unit cells dominated by solvent and complex characteristics that call for even stricter buffer requirements. Structure determination techniques tailored for these ‘difficult to crystallize’ proteins such as femtosecond nanocrystallography are being developed, yet still need specific crystal characteristics. Here, we demonstrate a simple and robust method to screen protein crystallization conditions at low ionic strength in a microfluidic device. This is realized in one microfluidic experiment using low sample amounts, unlike traditional methods where each solution condition is set up separately. Second harmonic generation microscopy via Second Order Nonlinear Imaging of Chiral Crystals (SONICC) was applied for the detection of nanometer and micrometer sized PSI crystals within microchannels. To develop a crystallization phase diagram, crystals imaged with SONICC at specific channel locations were correlated to protein and salt concentrations determined by numerical simulations of the time-dependent diffusion process along the channel. Our method demonstrated that a portion of the PSI crystallization phase diagram could be reconstructed in excellent agreement with crystallization conditions determined by traditional methods. We postulate that this approach could be utilized to efficiently study and optimize crystallization conditions for a wide range of proteins that are poorly understood to date. PMID:24191698
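The numerical simulation of time-dependent diffusion along the channel, which the abstract above uses to assign concentrations to channel positions, can be illustrated with a minimal one-dimensional sketch. It assumes an explicit finite-difference scheme with closed (zero-flux) channel ends; the function name `diffuse_1d`, the scheme, and the boundary conditions are illustrative choices and are not taken from the paper.

```python
def diffuse_1d(conc, diff_coeff, dx, dt, steps):
    """Advance a 1D concentration profile under dc/dt = D * d2c/dx2.

    conc:       initial concentrations, one value per grid cell along the channel
    diff_coeff: diffusion coefficient D (same length and time units as dx and dt)
    dx, dt:     grid spacing and time step
    steps:      number of explicit time steps to take
    """
    alpha = diff_coeff * dt / dx ** 2
    if alpha > 0.5:
        # The explicit scheme is only stable for D*dt/dx^2 <= 1/2.
        raise ValueError("unstable step: need D*dt/dx**2 <= 0.5")
    c = list(conc)
    for _ in range(steps):
        # Mirrored ghost cells enforce zero flux at both ends of the channel.
        padded = [c[0]] + c + [c[-1]]
        c = [
            padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ]
    return c
```

Evaluating such a profile at the time and position where a crystal is imaged gives the local concentration, which is how points on a phase diagram could be assembled in this kind of approach.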
Albumin-stabilized fluorescent silver nanodots
NASA Astrophysics Data System (ADS)
Sych, Tomash; Polyanichko, Alexander; Kononov, Alexei
2017-07-01
Ligand-stabilized Ag nanoclusters (NCs) possess many attractive features including high fluorescence quantum yield, large absorption cross-section, good photostability, large Stokes shift and two-photon absorption cross sections. While plenty of fluorescent clusters have been synthesized on various polymer templates, only a few studies have been reported on the fluorescent Ag clusters on peptides and proteins. We study silver NCs synthesized on different protein matrices, including bovine serum albumin, human serum albumin, egg albumin, equine serum albumin, and lysozyme. Our results show that red-emitting Ag NCs can effectively be stabilized by the disulfide bonds in proteins and that the looser structure of the denatured protein favors formation of the clusters.
Nonlinear scoring functions for similarity-based ligand docking and binding affinity prediction.
Brylinski, Michal
2013-11-25
A common strategy for virtual screening considers a systematic docking of a large library of organic compounds into the target sites in protein receptors with promising leads selected based on favorable intermolecular interactions. Despite continuous progress in the modeling of protein-ligand interactions for pharmaceutical design, important challenges still remain, thus the development of novel techniques is required. In this communication, we describe eSimDock, a new approach to ligand docking and binding affinity prediction. eSimDock employs nonlinear machine learning-based scoring functions to improve the accuracy of ligand ranking and similarity-based binding pose prediction, and to increase the tolerance to structural imperfections in the target structures. In large-scale benchmarking using the Astex/CCDC data set, we show that 53.9% (67.9%) of the predicted ligand poses have RMSD of <2 Å (<3 Å). Moreover, using binding sites predicted by recently developed eFindSite, eSimDock models ligand binding poses with an RMSD of 4 Å for 50.0-39.7% of the complexes at the protein homology level limited to 80-40%. Simulations against non-native receptor structures, whose mean backbone rearrangements vary from 0.5 to 5.0 Å Cα-RMSD, show that the ratio of docking accuracy and the estimated upper bound is at a constant level of ∼0.65. The Pearson correlation coefficient between experimental Ki values and those predicted by eSimDock for a large data set of crystal structures of protein-ligand complexes from BindingDB is 0.58, which decreases only to 0.46 when target structures distorted to 3.0 Å Cα-RMSD are used. Finally, two case studies demonstrate that eSimDock can be customized to specific applications as well. These encouraging results show that the performance of eSimDock is largely unaffected by the deformations of ligand binding regions, thus it represents a practical strategy for across-proteome virtual screening using protein models.
eSimDock is freely available to the academic community as a Web server at http://www.brylinski.org/esimdock.
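The benchmark figures quoted in the abstract above rest on three simple quantities: the RMSD of a predicted pose, the fraction of poses under an RMSD threshold, and a Pearson correlation between predicted and experimental affinities. The sketch below shows how such quantities could be computed; the function names are hypothetical, the RMSD assumes identical atom ordering with no superposition or symmetry correction, and nothing here reproduces eSimDock itself.

```python
import math

def pose_rmsd(pred, ref):
    """RMSD between two ligand poses given as lists of (x, y, z) in the same atom order."""
    if len(pred) != len(ref):
        raise ValueError("poses must contain the same number of atoms")
    return math.sqrt(sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def fraction_within(rmsds, threshold):
    """Fraction of poses whose RMSD is strictly below the threshold."""
    return sum(r < threshold for r in rmsds) / len(rmsds)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For affinity data, the correlation would typically be computed on log-transformed Ki values; whether the paper does so is not stated in the abstract.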
DOE Office of Scientific and Technical Information (OSTI.GOV)
An, Bo; Jenkins, Janelle E; Sampath, Sujatha
Dragline silk from orb-weaving spiders is a copolymer of two large proteins, major ampullate spidroin 1 (MaSp1) and 2 (MaSp2). The ratio of these proteins is known to have a large variation across different species of orb-weaving spiders. NMR results from gland material of two different species of spiders, N. clavipes and A. aurantia, indicate that MaSp1 proteins are more easily formed into β-sheet nanostructures, while MaSp2 proteins form random coil and helical structures. To test if this behavior of natural silk proteins could be reproduced by recombinantly produced spider silk mimic protein, recombinant MaSp1/MaSp2 mixed fibers as well as chimeric silk fibers from MaSp1 and MaSp2 sequences in a single protein were produced based on the variable ratio and conserved motifs of MaSp1 and MaSp2 in native silk fiber. Mechanical properties, solid-state NMR, and XRD results of tested synthetic fibers indicate the differing roles of MaSp1 and MaSp2 in the fiber and verify the importance of postspin stretching treatment in helping the fiber to form the proper spatial structure.
Surflex-Dock: Docking benchmarks and real-world application
NASA Astrophysics Data System (ADS)
Spitzer, Russell; Jain, Ajay N.
2012-06-01
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different than previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
Interactive comparison and remediation of collections of macromolecular structures.
Moriarty, Nigel W; Liebschner, Dorothee; Klei, Herbert E; Echols, Nathaniel; Afonine, Pavel V; Headd, Jeffrey J; Poon, Billy K; Adams, Paul D
2018-01-01
Often similar structures need to be compared to reveal local differences throughout the entire model or between related copies within the model. Therefore, a program to compare multiple structures and enable correction of any differences not supported by the density map was written within the Phenix framework (Adams et al., Acta Cryst 2010; D66:213-221). This program, called Structure Comparison, can also be used for structures with multiple copies of the same protein chain in the asymmetric unit, that is, as a result of non-crystallographic symmetry (NCS). Structure Comparison was designed to interface with Coot (Emsley et al., Acta Cryst 2010; D66:486-501) and PyMOL (DeLano, PyMOL 0.99; 2002) to facilitate comparison of large numbers of related structures. Structure Comparison analyzes collections of protein structures using several metrics, such as the rotamer conformation of equivalent residues, displays the results in tabular form and allows superimposed protein chains and density maps to be quickly inspected and edited (via the tools in Coot) for consistency, completeness and correctness. © 2017 The Protein Society.
Poppe, Leszek; Jordan, John B; Rogers, Gary; Schnier, Paul D
2015-06-02
An important aspect in the analytical characterization of protein therapeutics is the comprehensive characterization of higher order structure (HOS). Nuclear magnetic resonance (NMR) is arguably the most sensitive method for fingerprinting HOS of a protein in solution. Traditionally, (1)H-(15)N or (1)H-(13)C correlation spectra are used as a "structural fingerprint" of HOS. Here, we demonstrate that protein fingerprint by line shape enhancement (PROFILE), a 1D (1)H NMR spectroscopy fingerprinting approach, is superior to traditional two-dimensional methods using monoclonal antibody samples and a heavily glycosylated protein therapeutic (Epoetin Alfa). PROFILE generates a high resolution structural fingerprint of a therapeutic protein in a fraction of the time required for a 2D NMR experiment. The cross-correlation analysis of PROFILE spectra allows one to distinguish contributions from HOS vs protein heterogeneity, which is difficult to accomplish by 2D NMR. We demonstrate that the major analytical limitation of two-dimensional methods is poor selectivity, which renders these approaches problematic for the purpose of fingerprinting large biological macromolecules.
regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution.
Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong
2017-09-01
While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability to select functional genetic variants identified from various genome-sequencing projects, and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens of structural features that characterize the functions of alternatively spliced exons on protein function. Our systematic evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our result suggests that the protein structure features offer an added dimension of information while distinguishing disease-causing and neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.
Mapping protein-RNA interactions by RCAP, RNA-cross-linking and peptide fingerprinting.
Vaughan, Robert C; Kao, C Cheng
2015-01-01
RNA nanotechnology often features protein-RNA complexes. The interactions between proteins and large RNAs are difficult to study using traditional structure-based methods like NMR or X-ray crystallography. RCAP, an approach that couples reversible cross-linking and affinity purification with mass spectrometry, has been developed to map regions within proteins that contact RNA. This chapter details how RCAP is applied to map protein-RNA contacts within virions.
Wallace, A. C.; Borkakoti, N.; Thornton, J. M.
1997-01-01
It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions. PMID:9385633
Structural Variation of Type I-F CRISPR RNA Guided DNA Surveillance.
Pausch, Patrick; Müller-Esparza, Hanna; Gleditzsch, Daniel; Altegoer, Florian; Randau, Lennart; Bange, Gert
2017-08-17
CRISPR-Cas systems are prokaryotic immune systems against invading nucleic acids. Type I CRISPR-Cas systems employ highly diverse, multi-subunit surveillance Cascade complexes that facilitate duplex formation between crRNA and complementary target DNA for R-loop formation, retention, and DNA degradation by the subsequently recruited nuclease Cas3. Typically, the large subunit recognizes bona fide targets through the PAM (protospacer adjacent motif), and the small subunit guides the non-target DNA strand. Here, we present the Apo- and target-DNA-bound structures of the I-Fv (type I-F variant) Cascade lacking the small and large subunits. Large and small subunits are functionally replaced by the 5' terminal crRNA cap Cas5fv and the backbone protein Cas7fv, respectively. Cas5fv facilitates PAM recognition from the DNA major groove site, in contrast to all other described type I systems. Comparison of the type I-Fv Cascade with an anti-CRISPR protein-bound I-F Cascade reveals that the type I-Fv structure differs substantially at known anti-CRISPR protein target sites and might therefore be resistant to viral Cascade interception. Copyright © 2017 Elsevier Inc. All rights reserved.
Conformational Transitions upon Ligand Binding: Holo-Structure Prediction from Apo Conformations
Seeliger, Daniel; de Groot, Bert L.
2010-01-01
Biological function of proteins is frequently associated with the formation of complexes with small-molecule ligands. Experimental structure determination of such complexes at atomic resolution, however, can be time-consuming and costly. Computational methods for structure prediction of protein/ligand complexes, particularly docking, are as yet restricted by their limited consideration of receptor flexibility, rendering them not applicable for predicting protein/ligand complexes if large conformational changes of the receptor upon ligand binding are involved. Accurate receptor models in the ligand-bound state (holo structures), however, are a prerequisite for successful structure-based drug design. Hence, if only an unbound (apo) structure is available distinct from the ligand-bound conformation, structure-based drug design is severely limited. We present a method to predict the structure of protein/ligand complexes based solely on the apo structure, the ligand and the radius of gyration of the holo structure. The method is applied to ten cases in which proteins undergo structural rearrangements of up to 7.1 Å backbone RMSD upon ligand binding. In all cases, receptor models within 1.6 Å backbone RMSD to the target were predicted and close-to-native ligand binding poses were obtained for 8 of 10 cases in the top-ranked complex models. A protocol is presented that is expected to enable structure modeling of protein/ligand complexes and structure-based drug design for cases where crystal structures of ligand-bound conformations are not available. PMID:20066034
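The two geometric quantities named in the abstract above, the radius of gyration and the backbone RMSD, can be illustrated with a minimal Python sketch. It assumes unweighted coordinates (no atomic masses) and structures that are already superimposed, so it is not the authors' protocol; the function names are hypothetical.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    # Geometric center of the point set
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the center
    msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
              for x, y, z in coords) / n
    return math.sqrt(msd)

def rmsd(coords_a, coords_b):
    """RMSD between two equally long, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

In practice an optimal superposition (for example with the Kabsch algorithm) would be applied before computing the RMSD; that step is omitted here to keep the sketch short.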
Structural study of the membrane protein MscL using cell-free expression and solid-state NMR
NASA Astrophysics Data System (ADS)
Abdine, Alaa; Verhoeven, Michiel A.; Park, Kyu-Ho; Ghazi, Alexandre; Guittet, Eric; Berrier, Catherine; Van Heijenoort, Carine; Warschawski, Dror E.
2010-05-01
High-resolution structures of membrane proteins have so far been obtained mostly by X-ray crystallography, on samples where the protein is surrounded by detergent. Recent developments of solid-state NMR have opened the way to a new approach for the study of integral membrane proteins inside a membrane. At the same time, the extension of cell-free expression to the production of membrane proteins allows for the production of proteins tailor made for NMR. We present here an in situ solid-state NMR study of a membrane protein selectively labeled through the use of cell-free expression. The sample consists of MscL (mechano-sensitive channel of large conductance), a 75 kDa pentameric α-helical ion channel from Escherichia coli, reconstituted in a hydrated lipid bilayer. Compared to a uniformly labeled protein sample, the spectral crowding is greatly reduced in the cell-free expressed protein sample. This approach may be a decisive step required for spectral assignment and structure determination of membrane proteins by solid-state NMR.
The Ramachandran Number: An Order Parameter for Protein Geometry
Mannige, Ranjan V.; Kundu, Joyjit; Whitelam, Stephen; ...
2016-08-04
Three-dimensional protein structures usually contain regions of local order, called secondary structure, such as α-helices and β-sheets. Secondary structure is characterized by the local rotational state of the protein backbone, quantified by two dihedral angles called φ and ψ. Particular types of secondary structure can generally be described by a single (diffuse) location on a two-dimensional plot drawn in the space of the angles φ and ψ, called a Ramachandran plot. By contrast, a recently-discovered nanomaterial made from peptoids, structural isomers of peptides, displays a secondary-structure motif corresponding to two regions on the Ramachandran plot [Mannige et al., Nature 526, 415 (2015)]. In order to describe such 'higher-order' secondary structure in a compact way we introduce here a means of describing regions on the Ramachandran plot in terms of a single Ramachandran number, R, which is a structurally meaningful combination of φ and ψ. We show that the potential applications of R are numerous: it can be used to describe the geometric content of protein structures, and can be used to draw diagrams that reveal, at a glance, the frequency of occurrence of regular secondary structures and disordered regions in large protein datasets. We propose that R might be used as an order parameter for protein geometry for a wide range of applications.
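The abstract above does not state how the two dihedral angles are combined. One simple way to collapse them into a single number between 0 and 1, assumed here purely for illustration, is a normalized sum of the angles in degrees. The function name and the example angle pairs are hypothetical choices, and the definition in the paper itself may differ in detail.

```python
def ramachandran_number_approx(phi_deg, psi_deg):
    """Map dihedral angles in [-180, 180) degrees to one number in [0, 1).

    Assumed illustrative form: R = (phi + psi + 360) / 720.
    """
    for angle in (phi_deg, psi_deg):
        if not -180.0 <= angle < 180.0:
            raise ValueError("angles must lie in [-180, 180)")
    return (phi_deg + psi_deg + 360.0) / 720.0

# Rough, textbook-style angle pairs (illustrative values only)
helix_like = ramachandran_number_approx(-60.0, -45.0)
sheet_like = ramachandran_number_approx(-120.0, 130.0)
```

Under this assumed form, helix-like and sheet-like backbone states land at clearly separated values, which is the property that makes a single order parameter useful for one-dimensional diagrams.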
Walther, Cornelia
2015-01-01
The majority of hormones stimulate and mediate their signal transduction via G protein-coupled receptors (GPCRs). The signal is transmitted into the cell due to the association of the GPCRs with heterotrimeric G proteins, which in turn activates an extensive array of signaling pathways to regulate cell physiology. However, GPCRs also function as scaffolds for the recruitment of a variety of cytoplasmic protein-interacting proteins that bind to both the intracellular face and protein interaction motifs encoded by GPCRs. The structural scaffolding of these proteins allows GPCRs to recruit large functional complexes that serve to modulate both G protein-dependent and -independent cellular signaling pathways and modulate GPCR intracellular trafficking. This review focuses on GPCR interacting PSD95-disc large-zona occludens domain containing scaffolds in the regulation of endocrine receptor signaling as well as their potential role as therapeutic targets for the treatment of endocrinopathies. PMID:25942107
State of the APC/C: Organization, function, and structure
McLean, Janel R.; Chaix, Denis; Ohi, Melanie D.; Gould, Kathleen L.
2016-01-01
The ubiquitin-proteasome protein degradation system is involved in many essential cellular processes including cell cycle regulation, cell differentiation, and the unfolded protein response. The anaphase-promoting complex/cyclosome (APC/C), an evolutionarily conserved E3 ubiquitin ligase, was discovered 15 years ago because of its pivotal role in cyclin degradation and mitotic progression. Since then, we have learned that the APC/C is a very large, complex E3 ligase composed of 13 subunits, yielding a molecular machine of approximately 1 MDa. The intricate regulation of the APC/C is mediated by the Cdc20 family of activators, pseudosubstrate inhibitors, protein kinases and phosphatases and the spindle assembly checkpoint. The large size, complexity, and dynamic nature of the APC/C represent significant obstacles to high-resolution structural techniques; however, over the last decade, there have been a number of lower resolution APC/C structures determined using single particle electron microscopy. These structures, when combined with data generated from numerous genetic and biochemical studies, have begun to shed light on how APC/C activity is regulated. Here, we discuss the most recent developments in the APC/C field concerning structure, substrate recognition, and catalysis. PMID:21261459
Structural changes of homodimers in the PDB.
Koike, Ryotaro; Amemiya, Takayuki; Horii, Tatsuya; Ota, Motonori
2018-04-01
Protein complexes are involved in various biological phenomena. These complexes are intrinsically flexible, and structural changes are essential to their functions. To perform a large-scale automated analysis of the structural changes of complexes, we combined two original methods. An application, SCPC, compares two structures of protein complexes and decides whether their binding modes match. Another application, Motion Tree, identifies rigid-body motions of various sizes and magnitudes from the two structural complexes with the same binding mode. This approach was applied to all available homodimers in the Protein Data Bank (PDB). We defined two complex-specific motions: interface motion and subunit-spanning motion. In the former, each subunit of a complex constitutes a rigid body, and the relative movement between subunits occurs at the interface. In the latter, structural parts from distinct subunits constitute a rigid body, providing the relative movement spanning subunits. All structural changes were classified and examined. It was revealed that the complex-specific motions were common in the homodimers, detected in around 40% of families. The dimeric interfaces were likely to be small and flat for interface motion, while large and rugged for subunit-spanning motion. Interface motion was accompanied by a drastic change in contacts at the interface, while the change in the subunit-spanning motion was moderate. These results indicate that the interface properties of homodimers correlated with the type of complex-specific motion. The study demonstrates that the pipeline of SCPC and Motion Tree is useful for the massive analysis of structural change of protein complexes. Copyright © 2017 Elsevier Inc. All rights reserved.
Ronin, Céline; Costa, David Mendes; Tavares, Joana; Faria, Joana; Ciesielski, Fabrice; Ciapetti, Paola; Smith, Terry K; MacDougall, Jane; Cordeiro-da-Silva, Anabela; Pemberton, Iain K
2018-01-01
The de novo crystal structure of the Leishmania infantum Silent Information Regulator 2 related protein 1 (LiSir2rp1) has been solved at 1.99Å in complex with an acetyl-lysine peptide substrate. The structure is broadly commensurate with Hst2/SIRT2 proteins of yeast and human origin, reproducing many of the structural features common to these sirtuin deacetylases, including the characteristic small zinc-binding domain, and the larger Rossmann-fold domain involved in NAD+-binding interactions. The two domains are linked via a cofactor binding loop ordered in open conformation. The peptide substrate binds to the LiSir2rp1 protein via a cleft formed between the small and large domains, with the acetyl-lysine side chain inserting further into the resultant hydrophobic tunnel. Crystals were obtained only with recombinant LiSir2rp1 possessing an extensive internal deletion of a proteolytically-sensitive region unique to the sirtuins of kinetoplastid origin. Deletion of 51 internal amino acids (P253-E303) from LiSir2rp1 did not appear to alter peptide substrate interactions in deacetylation assays, but was indispensable to obtain crystals. Removal of this potentially flexible region, that otherwise extends from the classical structural elements of the Rossmann-fold, specifically the β8-β9 connector, appears to result in lower accumulation of the protein when expressed from episomal vectors in L. infantum SIR2rp1 single knockout promastigotes. The biological function of the large serine-rich insertion in kinetoplastid/trypanosomatid sirtuins, highlighted as a disordered region with strong potential for post-translational modification, remains unknown but may confer additional cellular functions that are distinct from their human counterparts. 
These unique molecular features, along with the resolution of the first kinetoplastid sirtuin deacetylase structure, present novel opportunities for drug design against a protein target previously established as essential to parasite survival and proliferation.
Introducing the Levinthal's Protein Folding Paradox and Its Solution
ERIC Educational Resources Information Center
Martínez, Leandro
2014-01-01
The protein folding (Levinthal's) paradox states that it would not be possible in a physically meaningful time for a protein to reach the native (functional) conformation by a random search of the enormously large number of possible structures. This paradox has been solved: it was shown that small biases toward the native conformation result…
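The scale of the random search described above can be made concrete with a short back-of-the-envelope calculation in Python. All numbers here are illustrative assumptions rather than values from the article: a 100-residue chain, 3 accessible conformations per residue, and 10^13 conformations sampled per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def n_conformations(n_residues, states_per_residue):
    """Total chain conformations if every residue varies independently."""
    return states_per_residue ** n_residues

def exhaustive_search_years(n_residues, states_per_residue, samples_per_second):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    seconds = n_conformations(n_residues, states_per_residue) / samples_per_second
    return seconds / SECONDS_PER_YEAR

# Assumed illustrative parameters (not taken from the article)
years = exhaustive_search_years(n_residues=100,
                                states_per_residue=3,
                                samples_per_second=1e13)
print(f"{years:.2e} years for an exhaustive random search")
```

Even with these modest per-residue assumptions the search time exceeds any physically meaningful timescale by many orders of magnitude, which is the core of the paradox.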
Protein Assembly and Building Blocks: Beyond the Limits of the LEGO Brick Metaphor.
Levy, Yaakov
2017-09-26
Proteins, like other biomolecules, have a modular and hierarchical structure. Various building blocks are used to construct proteins of high structural complexity and diverse functionality. In multidomain proteins, for example, domains are fused to each other in different combinations to achieve different functions. Although the LEGO brick metaphor is justified as a means of simplifying the complexity of three-dimensional protein structures, several fundamental properties (such as allostery or the induced-fit mechanism) make deviation from it necessary to respect the plasticity, softness, and cross-talk that are essential to protein function. In this work, we illustrate recently reported protein behavior in multidomain proteins that deviates from the LEGO brick analogy. While earlier studies showed that a protein domain is often unaffected by being fused to another domain or becomes more stable following the formation of a new interface between the tethered domains, destabilization due to tethering has been reported for several systems. We illustrate that tethering may sometimes result in a multidomain protein behaving as "less than the sum of its parts". We survey these cases for which structure additivity does not guarantee thermodynamic additivity. Protein destabilization due to fusion to other domains may be linked in some cases to biological function and should be taken into account when designing large assemblies.
Crystal structure of Bacillus subtilis YdaF protein: a putative ribosomal N-acetyltransferase.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brunzelle, J. S.; Wu, R.; Korolev, S. V.
2004-12-01
Comparative sequence analysis suggests that the ydaF gene encodes a protein (YdaF) that functions as an N-acetyltransferase, more specifically, a ribosomal N-acetyltransferase. Sequence analysis using basic local alignment search tool (BLAST) suggests that YdaF belongs to a large family of proteins (199 proteins found in 88 unique species of bacteria, archaea, and eukaryotes). YdaF also belongs to COG1670, which includes the Escherichia coli RimL protein that is known to acetylate ribosomal protein L12. N-acetylation (NAT) has been found in all kingdoms. NAT enzymes catalyze the transfer of an acetyl group from acetyl-CoA (AcCoA) to a primary amino group. For example, NATs can acetylate the N-terminal α-amino group, the ε-amino group of lysine residues, aminoglycoside antibiotics, spermine/spermidine, or arylalkylamines such as serotonin. The crystal structure of the alleged ribosomal NAT protein, YdaF, from Bacillus subtilis presented here was determined as a part of the Midwest Center for Structural Genomics. The structure maintains the conserved tertiary structure of other known NATs and a high sequence similarity in the presumed AcCoA binding pocket in spite of a very low overall level of sequence identity to other NATs of known structure.
Isothermal chemical denaturation of large proteins: Path-dependence and irreversibility.
Wafer, Lucas; Kloczewiak, Marek; Polleck, Sharon M; Luo, Yin
2017-12-15
State functions (e.g., ΔG) are path independent and quantitatively describe the equilibrium states of a thermodynamic system. Isothermal chemical denaturation (ICD) is often used to extrapolate state function parameters for protein unfolding in native buffer conditions. The approach is prudent when the unfolding/refolding processes are path independent and reversible, but may lead to erroneous results if the processes are not reversible. The reversibility was demonstrated in several early studies for smaller proteins, but was assumed in some reports for large proteins with complex structures. In this work, the unfolding/refolding of several proteins was systematically studied using an automated ICD instrument. It is shown that: (i) the apparent unfolding mechanism and conformational stability of large proteins can be denaturant-dependent, (ii) equilibration times for large proteins are non-trivial and may introduce significant error into calculations of ΔG, (iii) fluorescence emission spectroscopy may not correspond to other methods, such as circular dichroism, when used to measure protein unfolding, and (iv) irreversible unfolding and hysteresis can occur in the absence of aggregation. These results suggest that thorough confirmation of the state functions by, for example, performing refolding experiments or using additional denaturants, is needed when quantitatively studying the thermodynamics of protein unfolding using ICD. Copyright © 2017 Elsevier Inc. All rights reserved.
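The extrapolation step mentioned above is commonly done with a linear, two-state model, which is the kind of assumption the abstract cautions about. The Python sketch below shows that textbook model only, not the authors' analysis: ΔG is assumed to fall linearly with denaturant concentration, and the parameter values are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g_unfolding(denaturant_molar, dg_water, m_value):
    """Linear extrapolation: dG([D]) = dG(water) - m * [D], in kcal/mol."""
    return dg_water - m_value * denaturant_molar

def fraction_unfolded(denaturant_molar, dg_water, m_value, temp_k=298.15):
    """Two-state, fully reversible equilibrium: f_U = 1 / (1 + exp(dG / RT))."""
    dg = delta_g_unfolding(denaturant_molar, dg_water, m_value)
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))

# Hypothetical parameters: dG(water) = 5 kcal/mol, m = 2 kcal/(mol*M)
DG_WATER, M_VALUE = 5.0, 2.0
midpoint = DG_WATER / M_VALUE  # denaturant concentration where dG = 0
```

If unfolding is irreversible or path dependent, as the study reports for some large proteins, the fitted `dg_water` from such a model no longer corresponds to a true state function.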
Ferreiro, Diego U.; Komives, Elizabeth A.; Wolynes, Peter G.
2014-01-01
Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their own structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with finite interaction codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of frustration in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and how biomolecular structure connects to function. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We then discuss how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding mechanisms. We review here how a large part of the biological functions of proteins are related to subtle local physical frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. We hope to illustrate how frustration is a fundamental concept in relating function to structural biology. PMID:25225856
Transmembrane proteins in the Protein Data Bank: identification and classification.
Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István
2004-11-22
Integral membrane proteins play important roles in living cells. Although these proteins are estimated to constitute 25% of proteins at a genomic scale, the Protein Data Bank (PDB) contains only a few hundred membrane proteins due to the difficulties with experimental techniques. The presence of transmembrane proteins in the structure data bank, however, is quite invisible, as the annotation of these entries is rather poor. Even if a protein is identified as a transmembrane one, the possible location of the lipid bilayer is not indicated in the PDB because these proteins are crystallized without their natural lipid bilayer, and currently no method is publicly available to detect the possible membrane plane using the atomic coordinates of membrane proteins. Here, we present a new geometrical approach to distinguish between transmembrane and globular proteins using structural information only and to locate the most likely position of the lipid bilayer. An automated algorithm (TMDET) is given to determine the membrane planes relative to the position of atomic coordinates, together with a discrimination function which is able to separate transmembrane and globular proteins even in cases of low resolution or incomplete structures such as fragments or parts of large multi chain complexes. This method can be used for the proper annotation of protein structures containing transmembrane segments and paves the way to an up-to-date database containing the structure of all known transmembrane proteins and fragments (PDB_TM) which can be automatically updated. The algorithm is equally important for the purpose of constructing databases purely of globular proteins.
Molloy, Kevin; Shehu, Amarda
2013-01-01
Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories in great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other. Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. 
A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers.
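The idea of growing a tree from a start state and steering it toward a goal region can be sketched with a toy, goal-biased random tree in a 2D plane. This is a generic robotics-style simplification under stated assumptions and not the authors' method: there is no energy function, fragment replacement, projection layer, or temperature scheme, and the sampling box, `goal_bias`, `step`, and `tol` parameters are all hypothetical.

```python
import math
import random

def grow_tree(start, goal, goal_bias=0.2, step=0.5, tol=0.5,
              max_iters=5000, seed=0):
    """Grow a tree from start; return a path into the goal region or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # With probability goal_bias steer toward the goal, else explore
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0))
        # Extend the tree node nearest to the sampled target
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        d = math.dist(nodes[i], target)
        if d == 0.0:
            continue
        t = min(step, d) / d  # move at most one step length
        new = tuple(a + t * (b - a) for a, b in zip(nodes[i], target))
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) <= tol:
            # Walk parent links back to the root to recover the path
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None

path = grow_tree(start=(0.0, 0.0), goal=(5.0, 5.0))
```

Raising `goal_bias` trades coverage of the space for faster progress toward the goal, which is the same balance the abstract discusses for its bias schemes over a progress coordinate.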
Crystal Structure of Menin Reveals Binding Site for Mixed Lineage Leukemia (MLL) Protein
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murai, Marcelo J.; Chruszcz, Maksymilian; Reddy, Gireesh
2014-10-02
Menin is a tumor suppressor protein that is encoded by the MEN1 (multiple endocrine neoplasia 1) gene and controls cell growth in endocrine tissues. Importantly, menin also serves as a critical oncogenic cofactor of MLL (mixed lineage leukemia) fusion proteins in acute leukemias. Direct association of menin with MLL fusion proteins is required for MLL fusion protein-mediated leukemogenesis in vivo, and this interaction has been validated as a new potential therapeutic target for development of novel anti-leukemia agents. Here, we report the first crystal structure of a menin homolog from Nematostella vectensis. Due to a very high sequence similarity, the Nematostella menin is a close homolog of human menin, and these two proteins likely have very similar structures. Menin is predominantly an α-helical protein with the protein core comprising three tetratricopeptide motifs that are flanked by two α-helical bundles and covered by a β-sheet motif. A very interesting feature of menin structure is the presence of a large central cavity that is highly conserved between Nematostella and human menin. By employing site-directed mutagenesis, we have demonstrated that this cavity constitutes the binding site for MLL. Our data provide a structural basis for understanding the role of menin as a tumor suppressor protein and as an oncogenic co-factor of MLL fusion proteins. It also provides essential structural information for development of inhibitors targeting the menin-MLL interaction as a novel therapeutic strategy in MLL-related leukemias.
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V
2007-01-01
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321
Functional Insights from Structural Genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Forouhar,F.; Kuzin, A.; Seetharaman, J.
2007-01-01
Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded β-barrel with a novel fold (V. parahaemolyticus VPA1032).
Vibrational resonance, allostery, and activation in rhodopsin-like G protein-coupled receptors
Woods, Kristina N.; Pfeffer, Jürgen; Dutta, Arpana; Klein-Seetharaman, Judith
2016-01-01
G protein-coupled receptors are a large family of membrane proteins activated by a variety of structurally diverse ligands, making them highly adaptable signaling molecules. Despite recent advances in the structural biology of this protein family, the mechanism by which ligands induce allosteric changes in protein structure and dynamics for its signaling function remains a mystery. Here, we propose the use of terahertz spectroscopy combined with molecular dynamics simulation and protein evolutionary network modeling to address the mechanism of activation by directly probing the concerted fluctuations of retinal ligand and transmembrane helices in rhodopsin. This approach allows us to examine the role of conformational heterogeneity in the selection and stabilization of specific signaling pathways in the photo-activation of the receptor. We demonstrate that ligand-induced shifts in the conformational equilibrium prompt vibrational resonances in the protein structure that link the dynamics of conserved interactions with fluctuations of the active-state ligand. The connection of vibrational modes creates an allosteric association of coupled fluctuations that forms a coherent signaling pathway from the receptor ligand-binding pocket to the G-protein activation region. Our evolutionary analysis of rhodopsin-like GPCRs suggests that specific allosteric sites play a pivotal role in activating structural fluctuations that allosterically modulate functional signals. PMID:27849063
[NMR structure and dynamics of the chimeric protein SH3-F2].
Kutyshenko, V P; Gushchina, L V; Khristoforov, V S; Prokhorov, D A; Timchenko, M A; Kudrevatykh, Iu A; Fediukina, D V; Filimonov, V V
2010-01-01
To further elucidate the structural and dynamic principles of protein self-organization and protein-ligand interactions, a new chimeric protein, SH3-F2, was designed and a genetically engineered construct was created. The SH3-F2 amino acid sequence consists of the polyproline ligand mgAPPLPPYSA, a GG linker and the sequence of the spectrin SH3 domain circular permutant S19-P20s. The structural and dynamic properties of the protein were studied by high-resolution NMR. According to the NMR data, the tertiary structure of the chimeric protein SH3-F2 has the topology typical of SH3 domains in complex with a ligand that forms a polyproline type II helix, located in the conserved binding region in orientation II. The polyproline ligand packs closely against the protein globule and is stabilized by hydrophobic interactions. However, the interaction between the ligand and the SH3-domain part of the globule is not very strong, because analysis of the protein's dynamic characteristics points to low-amplitude, high-frequency tumbling of the ligand relative to the slow intramolecular motions of the main globule. The constructed chimera permits further structural and thermodynamic investigation of the properties of the polyproline helix and its interaction with regulatory domains.
Theory and Applications of Solid-State NMR Spectroscopy to Biomembrane Structure and Dynamics
NASA Astrophysics Data System (ADS)
Xu, Xiaolin
Solid-state Nuclear Magnetic Resonance (NMR) is one of the premier biophysical methods that can be applied for addressing the structure and dynamics of biomolecules, including proteins, lipids, and nucleic acids. It illustrates the general problem of determining the average biomolecular structure, including the motional mean-square amplitudes and rates of the fluctuations. Lineshape and relaxation studies give us a view into the molecular properties under different environments. To help the understanding of NMR theory, both lineshape and relaxation experiments are conducted with hexamethylbenzene (HMB). This chemical compound with a simple structure serves as a perfect test molecule. Because of its highly symmetric structure, its motions are not very difficult to understand. The results for HMB set benchmarks for other more complicated systems like membrane proteins. After accumulating a large data set on HMB, we also proceed to develop a completely new method of data analysis, which yields the spectral densities in a body-fixed frame revealing internal motions of the system. Among the possible applications of solid-state NMR spectroscopy, we study the light activation mechanism of visual rhodopsin in lipid membranes. As a prototype of G-protein-coupled receptors, which are a large class of membrane proteins, the cofactor isomerization is triggered by photon absorption, and the local structural change is then propagated to a large-scale conformational change of the protein. Facilitation of the binding of transducin then passes along the visual signal to downstream effector proteins like transducin. To study this process, we introduce 2H labels into the rhodopsin chromophore retinal and the C-terminal peptide of transducin to probe the local structure and dynamics of these two hotspots of the rhodopsin activation process.
In addition to the examination of local sites with solid-state 2H NMR spectroscopy, wide angle X-ray scattering (WAXS) provides us the chance of looking at the overall conformational changes through difference scattering profiles. Although the resolution of this method is not as high as NMR spectroscopy, which gives information on atomic scale, the early activation probing is possible because of the short duration of the optical pump and X-ray probe lasers. We can thus visualize the energy dissipation process by observing and comparing the difference scattering profiles at different times after the light activation moments.
Design and applications of a clamp for Green Fluorescent Protein with picomolar affinity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansen, Simon; Stüber, Jakob C.; Ernst, Patrick
Green fluorescent protein (GFP) fusions are pervasively used to study structures and processes. Specific GFP-binders are thus of great utility for detection, immobilization or manipulation of GFP-fused molecules. We determined structures of two designed ankyrin repeat proteins (DARPins), complexed with GFP, which revealed different but overlapping epitopes. Here we show a structure-guided design strategy that, by truncation and computational reengineering, led to a stable construct where both can bind simultaneously: by linkage of the two binders, fusion constructs were obtained that “wrap around” GFP, have very high affinities of about 10–30 pM, and extremely slow off-rates. They can be natively produced in E. coli in very large amounts, and show excellent biophysical properties. Their very high stability and affinity, facile site-directed functionalization at introduced unique lysines or cysteines facilitate many applications. As examples, we present them as tight yet reversible immobilization reagents for surface plasmon resonance, as fluorescently labelled monomeric detection reagents in flow cytometry, as pull-down ligands to selectively enrich GFP fusion proteins from cell extracts, and as affinity column ligands for inexpensive large-scale protein purification. We have thus described a general design strategy to create a “clamp” from two different high-affinity repeat proteins, even if their epitopes overlap.
Design and applications of a clamp for Green Fluorescent Protein with picomolar affinity
Hansen, Simon; Stüber, Jakob C.; Ernst, Patrick; ...
2017-11-24
Green fluorescent protein (GFP) fusions are pervasively used to study structures and processes. Specific GFP-binders are thus of great utility for detection, immobilization or manipulation of GFP-fused molecules. We determined structures of two designed ankyrin repeat proteins (DARPins), complexed with GFP, which revealed different but overlapping epitopes. Here we show a structure-guided design strategy that, by truncation and computational reengineering, led to a stable construct where both can bind simultaneously: by linkage of the two binders, fusion constructs were obtained that “wrap around” GFP, have very high affinities of about 10–30 pM, and extremely slow off-rates. They can be natively produced in E. coli in very large amounts, and show excellent biophysical properties. Their very high stability and affinity, facile site-directed functionalization at introduced unique lysines or cysteines facilitate many applications. As examples, we present them as tight yet reversible immobilization reagents for surface plasmon resonance, as fluorescently labelled monomeric detection reagents in flow cytometry, as pull-down ligands to selectively enrich GFP fusion proteins from cell extracts, and as affinity column ligands for inexpensive large-scale protein purification. We have thus described a general design strategy to create a “clamp” from two different high-affinity repeat proteins, even if their epitopes overlap.
Unbiased, scalable sampling of protein loop conformations from probabilistic priors.
Zhang, Yajia; Hauser, Kris
2013-01-01
Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion.
Unbiased, scalable sampling of protein loop conformations from probabilistic priors
2013-01-01
Background Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in an ad hoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Results Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Conclusion Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrates its potential as a tool for loop ensemble generation and missing structure completion. PMID:24565175
Nie, Zhongzhen; Hirsch, Dianne S; Luo, Ruibai; Jian, Xiaoying; Stauffer, Stacey; Cremesti, Aida; Andrade, Josefa; Lebowitz, Jacob; Marino, Michael; Ahvazi, Bijan; Hinshaw, Jenny E; Randazzo, Paul A
2006-01-24
Arf GAPs are multidomain proteins that function in membrane traffic by inactivating the GTP binding protein Arf1. Numerous Arf GAPs contain a BAR domain, a protein structural element that contributes to membrane traffic by either inducing or sensing membrane curvature. We have examined the role of a putative BAR domain in the function of the Arf GAP ASAP1. ASAP1's N terminus, containing the putative BAR domain together with a PH domain, dimerized to form an extended structure that bound to large unilamellar vesicles containing acidic phospholipids, properties that define a BAR domain. A recombinant protein containing the BAR domain of ASAP1, together with the PH and Arf GAP domains, efficiently bent the surface of large unilamellar vesicles, resulting in the formation of tubular structures. This activity was regulated by Arf1*GTP binding to the Arf GAP domain. In vivo, the tubular structures induced by ASAP1 mutants contained epidermal growth factor receptor (EGFR) and Rab11, and ASAP1 colocalized in tubular structures with EGFR during recycling of receptor. Expression of ASAP1 accelerated EGFR trafficking and slowed cell spreading. An ASAP1 mutant lacking the BAR domain had no effect. The N-terminal BAR domain of ASAP1 mediates membrane bending and is necessary for ASAP1 function. The Arf dependence of the bending activity is consistent with ASAP1 functioning as an Arf effector.
A probabilistic model for detecting rigid domains in protein structures.
Nguyen, Thach; Habeck, Michael
2016-09-01
Large-scale conformational changes in proteins are implicated in many important biological functions. These structural transitions can often be rationalized in terms of relative movements of rigid domains. There is a need for objective and automated methods that identify rigid domains in sets of protein structures showing alternative conformational states. We present a probabilistic model for detecting rigid-body movements in protein structures. Our model aims to approximate alternative conformational states by a few structural parts that are rigidly transformed under the action of a rotation and a translation. By using Bayesian inference and Markov chain Monte Carlo sampling, we estimate all parameters of the model, including a segmentation of the protein into rigid domains, the structures of the domains themselves, and the rigid transformations that generate the observed structures. We find that our Gibbs sampling algorithm can also estimate the optimal number of rigid domains with high efficiency and accuracy. We assess the power of our method on several thousand entries of the DynDom database and discuss applications to various complex biomolecular systems. The Python source code for protein ensemble analysis is available at: https://github.com/thachnguyen/motion_detection. Contact: mhabeck@gwdg.de. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
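The idea of segmenting a protein into rigid parts can be illustrated with a much simpler heuristic than the Bayesian model described above. The sketch below is not the authors' method: it assumes two conformations given as lists of per-residue coordinates and greedily groups residues whose mutual distances are preserved between the two states. The function names and the tolerance `tol` are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rigid_groups(conf_a, conf_b, tol=0.5):
    """Greedily group residues whose mutual distances match in both conformations.

    This is a distance-preservation heuristic (an assumption for illustration),
    not the probabilistic model with Gibbs sampling described in the abstract.
    """
    groups = []
    for i in range(len(conf_a)):
        for group in groups:
            # Residue i joins a group only if its distance to every member is preserved.
            if all(
                abs(dist(conf_a[i], conf_a[j]) - dist(conf_b[i], conf_b[j])) <= tol
                for j in group
            ):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Toy data: residues 0-1 stay put; residues 2-3 shift together by 5 along y.
a = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (9, 0, 0)]
b = [(0, 0, 0), (3, 0, 0), (6, 5, 0), (9, 5, 0)]
print(rigid_groups(a, b))  # [[0, 1], [2, 3]]
```

A greedy pass like this depends on residue order and on the tolerance, which is one reason a probabilistic treatment that infers the segmentation and the number of domains jointly may be preferable.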
Applying graph theory to protein structures: an atlas of coiled coils.
Heal, Jack W; Bartlett, Gail J; Wood, Christopher W; Thomson, Andrew R; Woolfson, Derek N
2018-05-02
To understand protein structure, folding and function fully and to design proteins de novo reliably, we must learn from natural protein structures that have been characterised experimentally. The number of protein structures available is large and growing exponentially, which makes this task challenging. Indeed, computational resources are becoming increasingly important for classifying and analysing this resource. Here, we use tools from graph theory to define an atlas classification scheme for automatically categorising certain protein substructures. Focusing on the α-helical coiled coils, which are ubiquitous protein-structure and protein-protein interaction motifs, we present a suite of computational resources designed for analysing these assemblies. iSOCKET enables interactive analysis of side-chain packing within proteins to identify coiled coils automatically and with considerable user control. Applying a graph theory-based atlas classification scheme to structures identified by iSOCKET gives the Atlas of Coiled Coils, a fully automated, updated overview of extant coiled coils. The utility of this approach is illustrated with the first formal classification of an emerging subclass of coiled coils called α-helical barrels. Furthermore, in the Atlas, the known coiled-coil universe is presented alongside a partial enumeration of the 'dark matter' of coiled-coil structures; i.e., those coiled-coil architectures that are theoretically possible but have not been observed to date, and thus present defined targets for protein design. iSOCKET is available as part of the open-source GitHub repository associated with this work (https://github.com/woolfson-group/isocket). This repository also contains all the data generated when classifying the protein graphs. The Atlas of Coiled Coils is available at: http://coiledcoils.chm.bris.ac.uk/atlas/app.
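As a rough illustration of how graph theory can categorise helical assemblies, the sketch below treats helices as nodes and helix-helix packing contacts as edges, then groups helices into assemblies by connected components. It does not use iSOCKET or the Atlas code; the function names and the toy contact list are assumptions made for this example.

```python
from collections import defaultdict

def components(edges):
    """Group helices into assemblies: connected components of the contact graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

def signature(group, edges):
    """(number of helices, number of helix-helix contacts) for one assembly."""
    members = set(group)
    n_edges = sum(1 for a, b in edges if a in members and b in members)
    return (len(group), n_edges)

# Hypothetical contacts: helices A-B form a dimer; C, D, E form a closed ring.
contacts = [("A", "B"), ("C", "D"), ("D", "E"), ("E", "C")]
for g in components(contacts):
    print(g, signature(g, contacts))
# ['A', 'B'] (2, 1)
# ['C', 'D', 'E'] (3, 3)
```

A (nodes, edges) signature is only a coarse key; a full atlas-style classification would compare graphs up to isomorphism rather than by counts alone.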
A growing family: the expanding universe of the bacterial cytoskeleton.
Ingerson-Mahar, Michael; Gitai, Zemer
2012-01-01
Cytoskeletal proteins are important mediators of cellular organization in both eukaryotes and bacteria. In the past, cytoskeletal studies have largely focused on three major cytoskeletal families, namely the eukaryotic actin, tubulin, and intermediate filament (IF) proteins and their bacterial homologs MreB, FtsZ, and crescentin. However, mounting evidence suggests that these proteins represent only the tip of the iceberg, as the cellular cytoskeletal network is far more complex. In bacteria, each of MreB, FtsZ, and crescentin represents only one member of large families of diverse homologs. There are also newly identified bacterial cytoskeletal proteins with no eukaryotic homologs, such as WACA proteins and bactofilins. Furthermore, there are universally conserved proteins, such as the metabolic enzyme CtpS, that assemble into filamentous structures that can be repurposed for structural cytoskeletal functions. Recent studies have also identified an increasing number of eukaryotic cytoskeletal proteins that are unrelated to actin, tubulin, and IFs, such that expanding our understanding of cytoskeletal proteins is advancing the understanding of the cell biology of all organisms. Here, we summarize the recent explosion in the identification of new members of the bacterial cytoskeleton and describe a hypothesis for the evolution of the cytoskeleton from self-assembling enzymes. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.
Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large, typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly amenable to small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
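Task-level parallelism over workloads of uneven size can be sketched with a plain worker pool. The example below does not use SAGA-Pilot, eThread or EC2; it substitutes Python's standard thread pool, uses sequence length as an assumed proxy for job cost, and submits the longest jobs first so that large tasks do not straggle at the end. `model_structure` is a hypothetical placeholder for one threading job.

```python
from concurrent.futures import ThreadPoolExecutor

def model_structure(name, sequence):
    """Placeholder for one compute-intensive threading job (hypothetical)."""
    return name, len(sequence)

def run_pipeline(sequences, workers=4):
    """Dispatch jobs longest-first to a pool of workers and collect the results."""
    # Sequence length stands in for estimated cost (an assumption for this sketch).
    jobs = sorted(sequences.items(), key=lambda kv: len(kv[1]), reverse=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(model_structure, n, s) for n, s in jobs]
        return dict(f.result() for f in futures)

toy = {"p1": "MKT", "p2": "MKTAYIAKQR", "p3": "MKTAY"}
print(run_pipeline(toy))  # {'p2': 10, 'p3': 5, 'p1': 3}
```

For genuinely CPU-bound jobs a process pool or a distributed scheduler would be the more realistic choice; the thread pool here only keeps the sketch self-contained.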
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large, typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly amenable to small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
Feng, Xianchao; Chen, Lin; Lei, Na; Wang, Shuangxi; Xu, Xinglian; Zhou, Guanghong; Li, Zhixi
2017-04-05
The dose-dependent effects of (-)-epigallocatechin-3-gallate (EGCG; 0, 100, or 1000 ppm) on the textural properties and stability of a myofibrillar protein (MP) emulsion gel were investigated. Addition of EGCG significantly inhibited formation of carbonyl but promoted the loss of both thiol and free amine groups. Addition of EGCG, particularly at 1000 ppm, initiated irreversible protein modifications, as evidenced by surface hydrophobicity changes, patterns in sodium dodecyl sulfate-polyacrylamide gel electrophoresis, and differential scanning calorimetry. These results indicated that MP was modified by additive reactions between the quinone of EGCG and thiols and free amines of proteins. These adducts increased cooking loss and destabilized the texture, especially with a large EGCG dose. Confocal laser scanning microscopy and scanning electron microscopy images clearly indicated the damage to the emulsifying properties and the collapse of the internal structure when the MP emulsion gel was treated with a large EGCG dose. A high concentration of NaCl (0.6 M) improved modification of MP and increased the rate of deterioration of the internal structure, especially with the large EGCG dose (1000 ppm), resulting in an MP emulsion gel with extremely unstable emulsifying properties.
Hot-spot analysis for drug discovery targeting protein-protein interactions.
Rosell, Mireia; Fernández-Recio, Juan
2018-04-01
Protein-protein interactions are important for biological processes and pathological situations, and are attractive targets for drug discovery. However, rational drug design targeting protein-protein interactions is still highly challenging. Hot-spot residues are seen as the best option to target such interactions, but their identification requires detailed structural and energetic characterization, which is only available for a tiny fraction of protein interactions. Areas covered: In this review, the authors cover a variety of computational methods that have been reported for the energetic analysis of protein-protein interfaces in search of hot-spots, and the structural modeling of protein-protein complexes by docking. This can help to rationalize the discovery of small-molecule inhibitors of protein-protein interfaces of therapeutic interest. Computational analysis and docking can help to locate the interface, molecular dynamics can be used to find suitable cavities, and hot-spot predictions can focus the search for inhibitors of protein-protein interactions. Expert opinion: A major difficulty for applying rational drug design methods to protein-protein interactions is that in the majority of cases the complex structure is not available. Fortunately, computational docking can complement experimental data. An interesting aspect to explore in the future is the integration of these strategies for targeting PPIs with large-scale mutational analysis.
Elberson, Benjamin W.; Whisenant, Ty E.; Cortes, D. Marien; Cuello, Luis G.
2017-01-01
The Erwinia chrysanthemi ligand-gated ion channel, ELIC, is considered an excellent structural and functional surrogate for the whole pentameric ligand-gated ion channel family. Despite its simplicity, ELIC is structurally capable of undergoing ligand-dependent activation and a concomitant desensitization process. To determine at the molecular level the structural changes underlying ELIC’s function, it is desirable to produce large quantities of protein. This protein should be properly folded, fully functional and amenable to structural determinations. In the current paper, we report a completely new protocol for the expression and purification of milligram quantities of fully functional, more stable and crystallizable ELIC. The use of an autoinduction medium and inexpensive detergents during ELIC extraction, in addition to the high quality and large quantity of the purified channel, are the highlights of this improved biochemical protocol. PMID:28279818
CABS-flex: server for fast simulation of protein structure fluctuations
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-01-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements a CABS-model-based protocol for fast simulation of the near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics, a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions. PMID:23658222
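A residue mean-square-fluctuation profile of the kind mentioned above can be computed from any ensemble of models. The sketch below is not CABS-flex code: it assumes the models are already superimposed and are given as lists of one (x, y, z) tuple per residue, and the function name is hypothetical.

```python
def msf_profile(ensemble):
    """Per-residue mean-square fluctuation across an ensemble of models.

    ensemble: list of models; each model is a list of (x, y, z) tuples,
    one per residue (e.g. C-alpha atoms), assumed already superimposed.
    """
    n_models = len(ensemble)
    n_res = len(ensemble[0])
    profile = []
    for i in range(n_res):
        # Mean position of residue i over all models.
        mean = [sum(model[i][k] for model in ensemble) / n_models for k in range(3)]
        # Mean squared deviation of residue i from its mean position.
        msf = sum(
            sum((model[i][k] - mean[k]) ** 2 for k in range(3))
            for model in ensemble
        ) / n_models
        profile.append(msf)
    return profile

# Toy ensemble: residue 0 is fixed, residue 1 moves by +/-1 along x.
toy = [
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
]
print(msf_profile(toy))  # [0.0, 1.0]
```

Taking the square root of each value gives the root-mean-square fluctuation (RMSF), which is often the quantity plotted against residue number.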
CABS-flex: Server for fast simulation of protein structure fluctuations.
Jamroz, Michal; Kolinski, Andrzej; Kmiecik, Sebastian
2013-07-01
The CABS-flex server (http://biocomp.chem.uw.edu.pl/CABSflex) implements a CABS-model-based protocol for fast simulation of the near-native dynamics of globular proteins. In this application, the CABS model was shown to be a computationally efficient alternative to all-atom molecular dynamics, a classical simulation approach. The simulation method has been validated on a large set of molecular dynamics simulation data. Using a single input (user-provided file in PDB format), the CABS-flex server outputs an ensemble of protein models (in all-atom PDB format) reflecting the flexibility of the input structure, together with the accompanying analysis (residue mean-square-fluctuation profile and others). The ensemble of predicted models can be used in structure-based studies of protein functions and interactions.
PLI: a web-based tool for the comparison of protein-ligand interactions observed on PDB structures.
Gallina, Anna Maria; Bisignano, Paola; Bergamino, Maurizio; Bordo, Domenico
2013-02-01
A large fraction of the entries contained in the Protein Data Bank describe proteins in complex with low molecular weight molecules such as physiological compounds or synthetic drugs. In many cases, the same molecule is found in distinct protein-ligand complexes. There is an increasing interest in Medicinal Chemistry in comparing protein binding sites to gain insight into interactions that modulate the binding specificity, as this structural information can be correlated with other experimental data of biochemical or physiological nature and may help in rational drug design. The protein-ligand interaction (PLI) web service presented here provides a tool to analyse and compare the binding pockets of homologous proteins in complex with a selected ligand. The information is deduced from protein-ligand complexes present in the Protein Data Bank and stored in the underlying database. Freely accessible at http://bioinformatics.istge.it/pli/.
Looking at the Disordered Proteins through the Computational Microscope.
Das, Payel; Matysiak, Silvina; Mittal, Jeetain
2018-05-23
Intrinsically disordered proteins (IDPs) have attracted wide interest over the past decade due to their surprising prevalence in the proteome and versatile roles in cell physiology and pathology. A large selection of IDPs has been identified as potential targets for therapeutic intervention. Characterizing the structure-function relationship of disordered proteins is therefore an essential but daunting task, as these proteins can adopt transient structures, necessitating a new paradigm for connecting structural disorder to function. Molecular simulation has emerged as a natural complement to experiments for atomic-level characterizations and mechanistic investigations of this intriguing class of proteins. The diverse range of length and time scales involved in IDP function requires performing simulations at multiple levels of resolution. In this Outlook, we focus on summarizing available simulation methods, along with a few interesting example applications. We also provide an outlook on how these simulation methods can be further improved in order to provide a more accurate description of IDP structure, binding, and assembly.
Integrated Structural Biology for α-Helical Membrane Protein Structure Determination.
Xia, Yan; Fischer, Axel W; Teixeira, Pedro; Weiner, Brian; Meiler, Jens
2018-04-03
While great progress has been made, only 10% of the nearly 1,000 integral, α-helical, multi-span membrane protein families are represented by at least one experimentally determined structure in the PDB. Previously, we developed the algorithm BCL::MP-Fold, which samples the large conformational space of membrane proteins de novo by assembling predicted secondary structure elements guided by knowledge-based potentials. Here, we present a case study of rhodopsin fold determination by integrating sparse and/or low-resolution restraints from multiple experimental techniques including electron microscopy, electron paramagnetic resonance spectroscopy, and nuclear magnetic resonance spectroscopy. Simultaneous incorporation of orthogonal experimental restraints not only significantly improved the sampling accuracy but also allowed identification of the correct fold, which is demonstrated by a protein size-normalized transmembrane root-mean-square deviation as low as 1.2 Å. The protocol developed in this case study can be used for the determination of unknown membrane protein folds when limited experimental restraints are available. Copyright © 2018 Elsevier Ltd. All rights reserved.
Polymeric assembly of gluten proteins in an aqueous ethanol solvent.
Dahesh, Mohsen; Banc, Amélie; Duri, Agnès; Morel, Marie-Hélène; Ramos, Laurence
2014-09-25
The supramolecular organization of wheat gluten proteins is largely unknown due to the intrinsic complexity of this family of proteins and their insolubility in water. We fractionate gluten in a water/ethanol mixture (50/50 v/v) and obtain a protein extract which is depleted in gliadin, the monomeric part of wheat gluten proteins, and enriched in glutenin, the polymeric part of wheat gluten proteins. We investigate the structure of the proteins in the solvent used for extraction over a wide range of concentration, by combining X-ray scattering and multiangle static and dynamic light scattering. Our data show that, in the ethanol/water mixture, the proteins display features characteristic of flexible polymer chains in a good solvent. In the dilute regime, the proteins form very loose structures of characteristic size 150 nm, with an internal dynamics which is quantitatively similar to that of branched polymer coils. In more concentrated regimes, data highlight a hierarchical structure with one characteristic length scale of the order of a few nm, which displays the scaling with concentration expected for a semidilute polymer in good solvent, and a fractal arrangement at a much larger length scale. This structure is strikingly similar to that of polymeric gels, thus providing some factual knowledge to rationalize the viscoelastic properties of wheat gluten proteins and their assemblies.
Computational design of an endo-1,4-β-xylanase ligand binding site
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morin, Andrew; Kaufmann, Kristian W.; Fortenberry, Carie
2012-09-05
The field of computational protein design has experienced important recent success. However, the de novo computational design of high-affinity protein-ligand interfaces is still largely an open challenge. Using the Rosetta program, we attempted the in silico design of a high-affinity protein interface to a small peptide ligand. We chose the thermophilic endo-1,4-β-xylanase from Nonomuraea flexuosa as the protein scaffold on which to perform our designs. Over the course of the study, 12 proteins derived from this scaffold were produced and assayed for binding to the target ligand. Unfortunately, none of the designed proteins displayed evidence of high-affinity binding. Structural characterization of four designed proteins revealed that although the predicted structure of the protein model was highly accurate, this structural accuracy did not translate into accurate prediction of binding affinity. Crystallographic analyses indicate that the lack of binding affinity is possibly due to unaccounted-for protein dynamics in the 'thumb' region of our design scaffold intrinsic to the family 11 β-xylanase fold. Further computational analysis revealed two specific, single amino acid substitutions responsible for an observed change in backbone conformation, and decreased dynamic stability of the catalytic cleft. These findings offer new insight into the dynamic and structural determinants of the β-xylanase proteins.
Structural basis for activity of highly efficient RNA mimics of green fluorescent protein
Warner, Katherine Deigan; Chen, Michael C.; Song, Wenjiao; Strack, Rita L.; Thorn, Andrea; Jaffrey, Samie R.; Ferré-D’Amaré, Adrian R.
2014-01-01
Green fluorescent protein (GFP) and its derivatives revolutionized the study of proteins. Spinach is a recently reported in vitro evolved RNA mimic of GFP, which as genetically encoded fusions, makes possible live-cell, real-time imaging of biological RNAs, without resorting to large RNA-binding protein-GFP fusions. To elucidate the molecular basis of Spinach fluorescence, we have solved its co-crystal structure bound to its cognate exogenous chromophore, revealing that Spinach activates the small molecule by immobilizing it between a base triple, a G-quadruplex, and an unpaired guanine. Mutational and NMR analyses indicate that the G-quadruplex is essential for Spinach fluorescence, is also present in other fluorogenic RNAs, and may represent a general strategy for RNAs to induce fluorescence of chromophores. The structure has guided the design of a miniaturized 'Baby Spinach', and provides the foundation for structure-driven design and tuning of fluorescent RNAs. PMID:25026079
Sequence-similar, structure-dissimilar protein pairs in the PDB.
Kosloff, Mickey; Kolodny, Rachel
2008-05-01
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).
Nanoliter-Scale Protein Crystallization and Screening with a Microfluidic Droplet Robot
Zhu, Ying; Zhu, Li-Na; Guo, Rui; Cui, Heng-Jun; Ye, Sheng; Fang, Qun
2014-01-01
Large-scale screening of hundreds or even thousands of crystallization conditions with low sample consumption is urgently needed in current structural biology research. Here we describe a fully automated droplet robot for nanoliter-scale crystallization screening that combines the advantages of automated robotics for protein crystallization screening with those of droplet-based microfluidics. A semi-contact dispensing method was developed to achieve flexible, programmable and reliable liquid-handling operations for nanoliter-scale protein crystallization experiments. We applied the droplet robot to large-scale screening of crystallization conditions for five soluble proteins and one membrane protein with 35–96 different crystallization conditions, to the study of volume effects on protein crystallization, and to the determination of phase diagrams of two proteins. The volume of each droplet reactor is only ca. 4–8 nL. Protein consumption is reduced 50–500 fold compared with current crystallization stations. PMID:24854085
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.
Tian, Pengfei; Best, Robert B
2017-10-17
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
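The contrast between absolute and relative sequence capacity in this abstract can be made concrete with a short piece of arithmetic. The sketch below is illustrative only: the chain length of 35 residues is an assumed, hypothetical value for a small domain, and the capacity is set to the Avogadro constant, which the abstract gives only as a value that the true capacity exceeds, so the resulting fraction is a lower bound rather than a result from the paper.

```python
import math

AVOGADRO = 6.02214076e23  # SI-defined value, per mole


def log10_total_sequences(length, alphabet=20):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


def log10_relative_capacity(log10_capacity, length, alphabet=20):
    """log10 of (foldable sequences / all possible sequences)."""
    return log10_capacity - log10_total_sequences(length, alphabet)


length = 35  # assumed length of a small domain (hypothetical)
log_cap = math.log10(AVOGADRO)  # capacity bounded below by Avogadro's number

print(f"log10(total sequences)   = {log10_total_sequences(length):.2f}")
print(f"log10(relative capacity) >= {log10_relative_capacity(log_cap, length):.2f}")
```

Working in log10 avoids overflow for longer chains, where the number of possible sequences exceeds the range of floating-point numbers.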
Sudha, Govindarajan; Srinivasan, Narayanaswamy
2016-09-01
A comprehensive analysis of the quaternary features of distantly related homo-oligomeric proteins is the focus of the current study. This study has been performed at the levels of quaternary state, symmetry, and quaternary structure. Quaternary state and quaternary structure refer to the number of subunits and the spatial arrangement of subunits, respectively. Using a large dataset of available 3D structures of biologically relevant assemblies, we show that only 53% of the distantly related homo-oligomeric proteins have the same quaternary state. Considering these homologous homo-oligomers with the same quaternary state, conservation of quaternary structures is observed only in 38% of the pairs. In 36% of the pairs of distantly related homo-oligomers with different quaternary states, the larger assembly in a pair shows high structural similarity with the entire quaternary structure of the related protein with lower quaternary state; this is referred to as the "Russian doll effect." The differences in quaternary state and structure have been suggested to contribute to the functional diversity. Detailed investigations show that even though the gross functions of many distantly related homo-oligomers are the same, finer level differences in molecular functions are manifested by differences in quaternary states and structures. Comparison of structures of biological assemblies in distantly and closely related homo-oligomeric proteins throughout the study differentiates the effects of sequence divergence on the quaternary structures and function. Knowledge inferred from this study can provide insights for improved protein structure classification and function prediction of homo-oligomers. Proteins 2016; 84:1190-1202. © 2016 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Eschweiler, Joseph D.; Frank, Aaron T.; Ruotolo, Brandon T.
2017-10-01
Multiprotein complexes are central to our understanding of cellular biology, as they play critical roles in nearly every biological process. Despite many impressive advances associated with structural characterization techniques, large and highly-dynamic protein complexes are too often refractory to analysis by conventional, high-resolution approaches. To fill this gap, ion mobility-mass spectrometry (IM-MS) methods have emerged as a promising approach for characterizing the structures of challenging assemblies due in large part to the ability of these methods to characterize the composition, connectivity, and topology of large, labile complexes. In this Critical Insight, we present a series of bioinformatics studies aimed at assessing the information content of IM-MS datasets for building models of multiprotein structure. Our computational data highlight the limits of current coarse-graining approaches, and compelled us to develop an improved workflow for multiprotein topology modeling, which we benchmark against a subset of the multiprotein complexes within the PDB. This improved workflow has allowed us to ascertain both the minimal experimental restraint sets required for generation of high-confidence multiprotein topologies, and quantify the ambiguity in models where insufficient IM-MS information is available. We conclude by projecting the future of IM-MS in the context of protein quaternary structure assignment, where we predict that a more complete knowledge of the ultimate information content and ambiguity within such models will undoubtedly lead to applications for a broader array of challenging biomolecular assemblies.
Protein crystal growth in microgravity: Temperature induced large scale crystallization of insulin
NASA Technical Reports Server (NTRS)
Long, Marianna M.; Delucas, Larry J.; Smith, C.; Carson, M.; Moore, K.; Harrington, Michael D.; Pillion, D. J.; Bishop, S. P.; Rosenblum, W. M.; Naumann, R. J.
1994-01-01
One of the major stumbling blocks that prevents rapid structure determination using x-ray crystallography is macromolecular crystal growth. There are many examples where crystallization takes longer than structure determination. In some cases, it is impossible to grow useful crystals on earth. Recent experiments conducted in conjunction with NASA on various Space Shuttle missions have demonstrated that protein crystals often grow larger and display better internal molecular order than their earth-grown counterparts. This paper reports results from three Shuttle flights using the Protein Crystallization Facility (PCF). The PCF hardware produced large, high-quality insulin crystals by using a temperature change as the sole means to affect protein solubility and, thus, crystallization. The facility consists of cylinders/containers with volumes of 500, 200, 100, and 50 ml. Data from the three Shuttle flights demonstrated that larger, higher resolution crystals (as evidenced by x-ray diffraction data) were obtained from the microgravity experiments when compared to earth-grown crystals.
Protein model discrimination using mutational sensitivity derived from deep sequencing.
Adkar, Bharat V; Tripathi, Arti; Sahoo, Anusmita; Bajaj, Kanika; Goswami, Devrishi; Chakrabarti, Purbani; Swarnkar, Mohit K; Gokhale, Rajesh S; Varadarajan, Raghavan
2012-02-08
A major bottleneck in protein structure prediction is the selection of correct models from a pool of decoys. Relative activities of ∼1,200 individual single-site mutants in a saturation library of the bacterial toxin CcdB were estimated by determining their relative populations using deep sequencing. This phenotypic information was used to define an empirical score for each residue (RankScore), which correlated with the residue depth, and identify active-site residues. Using these correlations, ∼98% of correct models of CcdB (RMSD ≤ 4Å) were identified from a large set of decoys. The model-discrimination methodology was further validated on eleven different monomeric proteins using simulated RankScore values. The methodology is also a rapid, accurate way to obtain relative activities of each mutant in a large pool and derive sequence-structure-function relationships without protein isolation or characterization. It can be applied to any system in which mutational effects can be monitored by a phenotypic readout. Copyright © 2012 Elsevier Ltd. All rights reserved.
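One way to picture the model-discrimination idea in this abstract is to rank decoys by how well a per-residue depth profile tracks the experimental per-residue score. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes per-residue depths have already been computed for each decoy, assumes that higher scores go with more buried positions, and uses a Spearman rank correlation as the agreement measure. All names and numbers are invented for the example.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_model(scores, depth_by_model):
    """Name of the decoy whose depth profile best tracks the scores."""
    return max(depth_by_model,
               key=lambda name: spearman(scores, depth_by_model[name]))


# Invented per-residue scores and decoy depth profiles (hypothetical data).
scores = [1, 2, 3, 4, 5]
decoys = {
    "decoy_a": [2.0, 3.1, 4.0, 6.5, 8.0],
    "decoy_b": [8.0, 2.0, 6.0, 3.0, 4.0],
}
print(best_model(scores, decoys))
```

A rank-based correlation is used here because it does not assume a linear relationship between score and depth; whether that matches the published scoring is not stated in the abstract.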
Enabling Large-Scale Design, Synthesis and Validation of Small Molecule Protein-Protein Antagonists
Koes, David; Khoury, Kareem; Huang, Yijun; Wang, Wei; Bista, Michal; Popowicz, Grzegorz M.; Wolf, Siglinde; Holak, Tad A.; Dömling, Alexander; Camacho, Carlos J.
2012-01-01
Although there is no shortage of potential drug targets, there are only a handful of known low-molecular-weight inhibitors of protein-protein interactions (PPIs). One problem is that current efforts are dominated by low-yield high-throughput screening, whose rigid framework is not suitable for the diverse chemotypes present in PPIs. Here, we developed a novel pharmacophore-based interactive screening technology that builds on the role anchor residues, or deeply buried hot spots, have in PPIs, and redesigns these entry points with anchor-biased virtual multicomponent reactions, delivering tens of millions of readily synthesizable novel compounds. Application of this approach to the MDM2/p53 cancer target led to high hit rates, resulting in a large and diverse set of confirmed inhibitors, and co-crystal structures validate the designed compounds. Our unique open-access technology promises to expand chemical space and the exploration of the human interactome by leveraging in-house small-scale assays and user-friendly chemistry to rationally design ligands for PPIs with known structure. PMID:22427896
Covering complete proteomes with X-ray structures: A current snapshot
Mizianty, Marcin J.; Fan, Xiao; Yan, Jing; ...
2014-10-23
Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic knowhow combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtained through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with those of viruses following the coverage values of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were determined.
Protein structural similarity search by Ramachandran codes
Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang
2007-01-01
Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
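The linear encoding idea in this abstract, turning backbone torsion angles into a text string, can be sketched as a nearest-centre assignment on the Ramachandran map. This is a simplified illustration under stated assumptions: the three cluster centres and their letters are hypothetical textbook regions, not the clusters or alphabet used by SARST, and no substitution matrix is built.

```python
import math

# Hypothetical cluster centres (phi, psi in degrees) and their letters.
CENTRES = {
    "H": (-60.0, -45.0),   # right-handed helical region
    "E": (-120.0, 130.0),  # extended / beta region
    "L": (60.0, 45.0),     # left-handed helical region
}


def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(torsions):
    """Map each (phi, psi) pair to the letter of its nearest centre."""
    letters = []
    for phi, psi in torsions:
        nearest = min(
            CENTRES,
            key=lambda c: math.hypot(angular_diff(phi, CENTRES[c][0]),
                                     angular_diff(psi, CENTRES[c][1])),
        )
        letters.append(nearest)
    return "".join(letters)


# Four invented residues: helical, extended, left-handed, extended.
print(encode([(-57.0, -47.0), (-119.0, 113.0), (57.0, 47.0), (-170.0, 175.0)]))
# → HELE
```

Once structures are strings, any ordinary sequence search tool can compare them, which is the source of the speed advantage the abstract describes. The periodic distance matters because torsion angles wrap around at ±180 degrees.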
Objective identification of residue ranges for the superposition of protein structures
2011-01-01
Background The automation of objectively selecting amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely-used standard for choosing these residue ranges for experimentally determined protein structures, where the manual selection of residue ranges or the use of suboptimal criteria remain commonplace. Results We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain that increasing their number would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining for each domain the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank. Conclusions The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. 
Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures. PMID:21592348
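The refinement step described in this abstract, excluding residues one by one until the relative decrease of the RMSD becomes insignificant, can be sketched in a heavily simplified form. The sketch below is not the CYRANGE algorithm: it assumes a fixed per-residue mean squared deviation (no re-superposition after each exclusion), omits the domain clustering and the gap penalty, and uses a hypothetical threshold parameter `min_relative_drop`.

```python
import math


def rmsd(msd, selected):
    """RMSD over the selected residues from per-residue squared deviations."""
    return math.sqrt(sum(msd[i] for i in selected) / len(selected))


def refine_range(msd, min_relative_drop=0.1):
    """Greedily drop the worst residue while the RMSD still falls noticeably.

    msd: per-residue mean squared deviations (assumed fixed).
    min_relative_drop: stop once removing the worst residue lowers the
    RMSD by less than this fraction (hypothetical threshold).
    """
    selected = set(range(len(msd)))
    while len(selected) > 1:
        current = rmsd(msd, selected)
        if current == 0.0:
            break
        worst = max(selected, key=lambda i: msd[i])
        trial = selected - {worst}
        drop = (current - rmsd(msd, trial)) / current
        if drop < min_relative_drop:
            break  # further exclusion no longer pays off
        selected = trial
    return sorted(selected)


# Invented profile: disordered termini (indices 0 and 5), ordered core.
msd = [9.0, 0.25, 0.2, 0.3, 0.25, 16.0]
print(refine_range(msd))
# → [1, 2, 3, 4]
```

In this toy case the two flexible termini are excluded and the ordered core is kept, because dropping a third residue would lower the RMSD by only a few percent.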
2013-01-01
Background In recent years, various types of cellular networks have penetrated biology and are nowadays used omnipresently for studying eukaryote and prokaryote organisms. Still, the relation and the biological overlap among phenomenological and inferential gene networks, e.g., between the protein interaction network and the gene regulatory network inferred from large-scale transcriptomic data, is largely unexplored. Results We provide in this study an in-depth analysis of the structural, functional and chromosomal relationship between a protein-protein network, a transcriptional regulatory network and an inferred gene regulatory network, for S. cerevisiae and E. coli. Further, we study global and local aspects of these networks and their biological information overlap by comparing, e.g., the functional co-occurrence of Gene Ontology terms by exploiting the available interaction structure among the genes. Conclusions Although the individual networks represent different levels of cellular interactions with global structural and functional dissimilarities, we observe crucial functions of their network interfaces for the assembly of protein complexes, proteolysis, transcription, translation, metabolic and regulatory interactions. Overall, our results shed light on the integrability of these networks and their interfacing biological processes. PMID:23663484
Huang, Wei; Ravikumar, Krishnakumar M; Parisien, Marc; Yang, Sichun
2016-12-01
Structural determination of protein-protein complexes such as multidomain nuclear receptors has been challenging for high-resolution structural techniques. Here, we present a combined use of multiple biophysical methods, termed iSPOT, an integration of shape information from small-angle X-ray scattering (SAXS), protection factors probed by hydroxyl radical footprinting, and a large series of computationally docked conformations from rigid-body or molecular dynamics (MD) simulations. Specifically tested on two model systems, the power of iSPOT is demonstrated to accurately predict the structures of a large protein-protein complex (TGFβ-FKBP12) and a multidomain nuclear receptor homodimer (HNF-4α), based on the structures of individual components of the complexes. Although neither SAXS nor footprinting alone can yield an unambiguous picture for each complex, the combination of both, seamlessly integrated in iSPOT, narrows down the best-fit structures to ones that are about 3.2 Å and 4.2 Å in RMSD from their corresponding crystal structures, respectively. Furthermore, this proof-of-principle study based on the data synthetically derived from available crystal structures shows that iSPOT, using either rigid-body or MD-based flexible docking, is capable of overcoming the shortcomings of standalone computational methods, especially for HNF-4α. By taking advantage of the integration of SAXS-based shape information and footprinting-based protection/accessibility as well as computational docking, this iSPOT platform is set to be a powerful approach towards accurate integrated modeling of many challenging multiprotein complexes. Copyright © 2016 Elsevier Inc. All rights reserved.
Microgravity protein crystallization
McPherson, Alexander; DeLucas, Lawrence James
2015-01-01
Over the past 20 years a variety of technological advances in X-ray crystallography have shortened the time required to determine the structures of large macromolecules (i.e., proteins and nucleic acids) from several years to several weeks or days. However, one of the remaining challenges is the ability to produce diffraction-quality crystals suitable for a detailed structural analysis. Although the development of automated crystallization systems combined with protein engineering (site-directed mutagenesis to enhance protein solubility and crystallization) has improved crystallization success rates, there remain hundreds of proteins that either cannot be crystallized or yield crystals of insufficient quality to support X-ray structure determination. In an attempt to address this bottleneck, an international group of scientists has explored the use of a microgravity environment to crystallize macromolecules. This paper summarizes the history of this international initiative along with a description of some of the flight hardware systems and crystallization results. PMID:28725714
Structure of a group II intron in complex with its reverse transcriptase.
Qu, Guosheng; Kaushal, Prem Singh; Wang, Jia; Shigematsu, Hideki; Piazza, Carol Lyn; Agrawal, Rajendra Kumar; Belfort, Marlene; Wang, Hong-Wei
2016-06-01
Bacterial group II introns are large catalytic RNAs related to nuclear spliceosomal introns and eukaryotic retrotransposons. They self-splice, yielding mature RNA, and integrate into DNA as retroelements. A fully active group II intron forms a ribonucleoprotein complex comprising the intron ribozyme and an intron-encoded protein that performs multiple activities including reverse transcription, in which intron RNA is copied into the DNA target. Here we report cryo-EM structures of an endogenously spliced Lactococcus lactis group IIA intron in its ribonucleoprotein complex form at 3.8-Å resolution and in its protein-depleted form at 4.5-Å resolution, revealing functional coordination of the intron RNA with the protein. Remarkably, the protein structure reveals a close relationship between the reverse transcriptase catalytic domain and telomerase, whereas the active splicing center resembles the spliceosomal Prp8 protein. These extraordinary similarities hint at intricate ancestral relationships and provide new insights into splicing and retromobility.
Interrogating viral capsid assembly with ion mobility-mass spectrometry
NASA Astrophysics Data System (ADS)
Uetrecht, Charlotte; Barbu, Ioana M.; Shoemaker, Glen K.; van Duijn, Esther; Heck, Albert J. R.
2011-02-01
Most proteins fulfil their function as part of large protein complexes. Surprisingly, little is known about the pathways and regulation of protein assembly. Several viral coat proteins can spontaneously assemble into capsids in vitro with morphologies identical to the native virion and are thus ideal model systems for studying protein complex formation. Even for these systems, the mechanism for self-assembly is still poorly understood, although it is generally thought that smaller oligomeric structures form key intermediates. This assembly nucleus and larger viral assembly intermediates are typically of low abundance and difficult to monitor. Here, we characterised small oligomers of Hepatitis B virus (HBV) and norovirus under equilibrium conditions using native ion mobility mass spectrometry. These data, in conjunction with computational modelling, enabled us to elucidate structural features of these oligomers. Instead of more globular shapes, the intermediates exhibit sheet-like structures, suggesting that they are assembly competent. We propose pathways for the formation of both capsids.
Pharmacophore screening of the protein data bank for specific binding site chemistry.
Campagna-Slater, Valérie; Arrowsmith, Andrew G; Zhao, Yong; Schapira, Matthieu
2010-03-22
A simple computational approach was developed to screen the Protein Data Bank (PDB) for putative pockets possessing a specific binding site chemistry and geometry. The method employs two commonly used 3D screening technologies, namely identification of cavities in protein structures and pharmacophore screening of chemical libraries. For each protein structure, a pocket finding algorithm is used to extract potential binding sites containing the correct types of residues, which are then stored in a large SDF-formatted virtual library; pharmacophore filters describing the desired binding site chemistry and geometry are then applied to screen this virtual library and identify pockets matching the specified structural chemistry. As an example, this approach was used to screen all human protein structures in the PDB and identify sites having chemistry similar to that of known methyl-lysine binding domains that recognize chromatin methylation marks. The selected genes include known readers of the histone code as well as novel binding pockets that may be involved in epigenetic signaling. Putative allosteric sites were identified on the structures of TP53BP1, L3MBTL3, CHEK1, KDM4A, and CREBBP.
Cho, Kyung Ho; Bae, Hyoung Eun; Das, Manabendra; Gellman, Samuel H; Chae, Pil Seok
2014-02-01
Membrane proteins are inherently amphipathic and undergo dynamic conformational changes for proper function within native membranes. Maintaining the functional structures of these biomacromolecules in aqueous media is necessary for structural studies but difficult to achieve with currently available tools, thus necessitating the development of novel agents with favorable properties. This study introduces several new glucose-neopentyl glycol (GNG) amphiphiles and reveals some agents that display favorable behaviors for the solubilization and stabilization of a large, multi-subunit membrane protein assembly. Furthermore, a detergent structure-property relationship that could serve as a useful guideline for the design of novel amphiphiles is discussed. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Structure and Function of Non-Collagenous Bone Proteins
NASA Technical Reports Server (NTRS)
Hook, Magnus
1997-01-01
The long-term goal for this program is to determine the structural and functional relationships of bone proteins and proteins that interact with bone. This information will be used to design useful pharmacological compounds that will have a beneficial effect in osteoporotic patients and in the osteoporotic-like effects experienced on long duration space missions. The first phase of this program, funded under a cooperative research agreement with NASA through the Texas Medical Center, aimed to develop powerful recombinant expression systems and purification methods for production of large amounts of target proteins. Proteins expressed in sufficient amount and purity would be characterized by a variety of structural methods, and made available for crystallization studies. In order to increase the likelihood of crystallization and subsequent high resolution solution of structures, we undertook to develop expression of normal and mutant forms of proteins by bacterial and mammalian cells. In addition to the main goals of this program, we would also be able to provide reagents for other related studies, including development of anti-fibrotic and anti-metastatic therapeutics.
Protein Flexibility Facilitates Quaternary Structure Assembly and Evolution
Marsh, Joseph A.; Teichmann, Sarah A.
2014-01-01
The intrinsic flexibility of proteins allows them to undergo large conformational fluctuations in solution or upon interaction with other molecules. Proteins also commonly assemble into complexes with diverse quaternary structure arrangements. Here we investigate how the flexibility of individual protein chains influences the assembly and evolution of protein complexes. We find that flexibility appears to be particularly conducive to the formation of heterologous (i.e., asymmetric) intersubunit interfaces. This leads to a strong association between subunit flexibility and homomeric complexes with cyclic and asymmetric quaternary structure topologies. Similarly, we also observe that the more nonhomologous subunits that assemble together within a complex, the more flexible those subunits tend to be. Importantly, these findings suggest that subunit flexibility should be closely related to the evolutionary history of a complex. We confirm this by showing that evolutionarily more recent subunits are generally more flexible than evolutionarily older subunits. Finally, we investigate the very different explorations of quaternary structure space that have occurred in different evolutionary lineages. In particular, the increased flexibility of eukaryotic proteins appears to enable the assembly of heteromeric complexes with more unique components. PMID:24866000
Schoborg, Todd; Rickels, Ryan; Barrios, Josh
2013-01-01
Chromatin insulators assist in the formation of higher-order chromatin structures by mediating long-range contacts between distant genomic sites. It has been suggested that insulators accomplish this task by forming dense nuclear foci termed insulator bodies that result from the coalescence of multiple protein-bound insulators. However, these structures remain poorly understood, particularly the mechanisms triggering body formation and their role in nuclear function. In this paper, we show that insulator proteins undergo a dramatic and dynamic spatial reorganization into insulator bodies during osmostress and cell death in a high osmolarity glycerol–p38 mitogen-activated protein kinase–independent manner, leading to a large reduction in DNA-bound insulator proteins that rapidly repopulate chromatin as the bodies disassemble upon return to isotonicity. These bodies occupy distinct nuclear territories and contain a defined structural arrangement of insulator proteins. Our findings suggest insulator bodies are novel nuclear stress foci that can be used as a proxy to monitor the chromatin-bound state of insulator proteins and provide new insights into the effects of osmostress on nuclear and genome organization. PMID:23878275
Screening and large-scale expression of membrane proteins in mammalian cells for structural studies.
Goehring, April; Lee, Chia-Hsueh; Wang, Kevin H; Michel, Jennifer Carlisle; Claxton, Derek P; Baconguis, Isabelle; Althoff, Thorsten; Fischer, Suzanne; Garcia, K Christopher; Gouaux, Eric
2014-11-01
Structural, biochemical and biophysical studies of eukaryotic membrane proteins are often hampered by difficulties in overexpression of the candidate molecule. Baculovirus transduction of mammalian cells (BacMam), although a powerful method to heterologously express membrane proteins, can be cumbersome for screening and expression of multiple constructs. We therefore developed plasmid Eric Gouaux (pEG) BacMam, a vector optimized for use in screening assays, as well as for efficient production of baculovirus and robust expression of the target protein. In this protocol, we show how to use small-scale transient transfection and fluorescence-detection size-exclusion chromatography (FSEC) experiments using a GFP-His8-tagged candidate protein to screen for monodispersity and expression level. Once promising candidates are identified, we describe how to generate baculovirus, transduce HEK293S GnTI(-) (N-acetylglucosaminyltransferase I-negative) cells in suspension culture and overexpress the candidate protein. We have used these methods to prepare pure samples of chicken acid-sensing ion channel 1a (cASIC1) and Caenorhabditis elegans glutamate-gated chloride channel (GluCl) for X-ray crystallography, demonstrating how to rapidly and efficiently screen hundreds of constructs and accomplish large-scale expression in 4-6 weeks.
Screening and large-scale expression of membrane proteins in mammalian cells for structural studies
Goehring, April; Lee, Chia-Hsueh; Wang, Kevin H.; Michel, Jennifer Carlisle; Claxton, Derek P.; Baconguis, Isabelle; Althoff, Thorsten; Fischer, Suzanne; Garcia, K. Christopher; Gouaux, Eric
2014-01-01
Structural, biochemical and biophysical studies of eukaryotic membrane proteins are often hampered by difficulties in over-expression of the candidate molecule. Baculovirus transduction of mammalian cells (BacMam), although a powerful method to heterologously express membrane proteins, can be cumbersome for screening and expression of multiple constructs. We therefore developed plasmid Eric Gouaux (pEG) BacMam, a vector optimized for use in screening assays, as well as for efficient production of baculovirus and robust expression of the target protein. In this protocol we show how to use small-scale transient transfection and fluorescence-detection, size-exclusion chromatography (FSEC) experiments using a GFP-His8 tagged candidate protein to screen for monodispersity and expression level. Once promising candidates are identified, we describe how to generate baculovirus, transduce HEK293S GnTI− (N-acetylglucosaminyltransferase I-negative) cells in suspension culture, and over-express the candidate protein. We have used these methods to prepare pure samples of chicken acid-sensing ion channel 1a (cASIC1) and Caenorhabditis elegans glutamate-gated chloride channel (GluCl), for X-ray crystallography, demonstrating how to rapidly and efficiently screen hundreds of constructs and accomplish large-scale expression in 4-6 weeks. PMID:25299155
Wood, Christopher W; Bruning, Marc; Ibarra, Amaurys Á; Bartlett, Gail J; Thomson, Andrew R; Sessions, Richard B; Brady, R Leo; Woolfson, Derek N
2014-11-01
The ability to accurately model protein structures at the atomistic level underpins efforts to understand protein folding, to engineer natural proteins predictably and to design proteins de novo. Homology-based methods are well established and produce impressive results. However, these are limited to structures presented by and resolved for natural proteins. Addressing this problem more widely and deriving truly ab initio models requires mathematical descriptions for protein folds; the means to decorate these with natural, engineered or de novo sequences; and methods to score the resulting models. We present CCBuilder, a web-based application that tackles the problem for a defined but large class of protein structure, the α-helical coiled coils. CCBuilder generates coiled-coil backbones, builds side chains onto these frameworks and provides a range of metrics to measure the quality of the models. Its straightforward graphical user interface provides broad functionality that allows users to build and assess models, in which helix geometry, coiled-coil architecture and topology and protein sequence can be varied rapidly. We demonstrate the utility of CCBuilder by assembling models for 653 coiled-coil structures from the PDB, which cover >96% of the known coiled-coil types, and by generating models for rarer and de novo coiled-coil structures. CCBuilder is freely available, without registration, at http://coiledcoils.chm.bris.ac.uk/app/cc_builder/. © The Author 2014. Published by Oxford University Press.
Covalent Bonding of Chlorogenic Acid Induces Structural Modifications on Sunflower Proteins.
Karefyllakis, Dimitris; Salakou, Stavroula; Bitter, J Harry; van der Goot, Atze J; Nikiforidis, Constantinos V
2018-02-19
Proteins and phenols coexist in the confined space of plant cells, leading to reactions between them that result in new covalently bonded complex molecules. Reactions of this kind have been widely observed during storage and processing of plant materials. However, the nature of the new complex molecules and their physicochemical properties are largely unknown. Therefore, we investigated the structural characteristics of covalently bonded complexes between sunflower protein isolate (SFPI, protein content 85 wt %) and the dominant phenol in the confined space of a sunflower seed cell (chlorogenic acid, CGA). It was shown that the efficiency of bond formation goes through a maximum as a function of the SFPI:CGA ratio. Moreover, the bonding of CGA with proteins resulted in changes in the secondary and tertiary structure of the protein. It was also shown that the phenol bound strongly to the protein, which resulted in new crosslinks between the polypeptide chains. As a result, secondary structures like α-helices and β-sheets diminished, which in turn resulted in more disordered domains and a subsequent modification of the tertiary structure of the proteins. These findings are relevant for establishing future protocols for extraction of high-quality proteins and phenols when utilizing plant material and offer insight into the impact of processing that these ingredients endure. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Cross-Link Guided Molecular Modeling with ROSETTA
Leitner, Alexander; Rosenberger, George; Aebersold, Ruedi; Malmström, Lars
2013-01-01
Chemical cross-links identified by mass spectrometry generate distance restraints that reveal low-resolution structural information on proteins and protein complexes. The technology to reliably generate such data has become mature and robust enough to shift the focus to the question of how these distance restraints can be best integrated into molecular modeling calculations. Here, we introduce three workflows for incorporating distance restraints generated by chemical cross-linking and mass spectrometry into ROSETTA protocols for comparative and de novo modeling and protein-protein docking. We demonstrate that the cross-link validation and visualization software Xwalk facilitates successful cross-link data integration. Besides the protocols we introduce XLdb, a database of chemical cross-links from 14 different publications with 506 intra-protein and 62 inter-protein cross-links, where each cross-link can be mapped on an experimental structure from the Protein Data Bank. Finally, we demonstrate on a protein-protein docking reference data set the impact of virtual cross-links on protein docking calculations and show that an inter-protein cross-link can reduce on average the RMSD of a docking prediction by 5.0 Å. The methods and results presented here provide guidelines for the effective integration of chemical cross-link data in molecular modeling calculations and should advance the structural analysis of particularly large and transient protein complexes via hybrid structural biology methods. PMID:24069194
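As a rough illustration of how a cross-link can act as a distance restraint on a model, the sketch below keeps only those cross-links whose two residues lie within a cutoff distance of each other. The 30 Å cutoff, the toy coordinates and the function names are assumptions made for illustration; they are not taken from the paper and do not reflect the ROSETTA or Xwalk interfaces.

```python
import math

# Hypothetical upper bound (in Å) on the Cα-Cα distance a cross-linker can span.
# The real value depends on the linker chemistry; 30 Å is an assumption here.
MAX_CA_DISTANCE = 30.0


def satisfied_crosslinks(ca_coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Return the cross-links whose residue pair lies within the cutoff.

    ca_coords  -- dict mapping residue number -> (x, y, z) of its Cα atom
    crosslinks -- iterable of (residue_i, residue_j) pairs
    """
    return [
        (i, j)
        for i, j in crosslinks
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    ]


# Toy model with three residues (coordinates are made up).
model = {10: (0.0, 0.0, 0.0), 42: (12.0, 5.0, 0.0), 97: (40.0, 0.0, 0.0)}
links = [(10, 42), (10, 97)]

# Residues 10 and 42 are 13 Å apart; residues 10 and 97 are 40 Å apart.
print(satisfied_crosslinks(model, links))  # [(10, 42)]
```

A straight-line Cα-Cα distance is the simplest possible check; a path that runs along the protein surface rather than through it would be a stricter test of whether a linker can physically bridge the two residues.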
Anjos, Liliana; Morgado, Isabel; Guerreiro, Marta; Cardoso, João C R; Melo, Eduardo P; Power, Deborah M
2017-02-01
Cartilage acidic protein1 (CRTAC1) is an extracellular matrix protein of chondrogenic tissue in humans and its presence in bacteria indicates it is of ancient origin. Structural modeling of piscine CRTAC1 reveals it belongs to the large family of beta-propeller proteins that in mammals have been associated with diseases, including amyloid diseases such as Alzheimer's. In order to characterize the structure/function evolution of this new member of the beta-propeller family we exploited the unique characteristics of piscine duplicate genes Crtac1a and Crtac1b and compared their structural and biochemical modifications with human recombinant CRTAC1. We demonstrate that CRTAC1 has a beta-propeller structure that has been conserved during evolution and easily forms high molecular weight thermo-stable aggregates. We reveal for the first time the propensity of CRTAC1 to form amyloid-like structures, and hypothesize that the aggregating property of CRTAC1 may be related to its disease-association. We further contribute to the general understanding of the evolution and function of CRTAC1 and the beta-propeller family. Proteins 2017; 85:242-255. © 2016 Wiley Periodicals, Inc.
Machinery of protein folding and unfolding.
Zhang, Xiaodong; Beuron, Fabienne; Freemont, Paul S
2002-04-01
During the past two years, a large amount of biochemical, biophysical and low- to high-resolution structural data have provided mechanistic insights into the machinery of protein folding and unfolding. It has emerged that dual functionality in terms of folding and unfolding might exist for some systems. The majority of folding/unfolding machines adopt oligomeric ring structures in a cooperative fashion and utilise the conformational changes induced by ATP binding/hydrolysis for their specific functions.
Zhang, Chengxin; Zheng, Wei; Freddolino, Peter L; Zhang, Yang
2018-03-10
Homology-based transferal remains the major approach to computational protein function annotations, but it becomes increasingly unreliable when the sequence identity between query and template decreases below 30%. We propose a novel pipeline, MetaGO, to deduce Gene Ontology attributes of proteins by combining sequence homology-based annotation with low-resolution structure prediction and comparison, and partner's homology-based protein-protein network mapping. The pipeline was tested on a large-scale set of 1000 non-redundant proteins from the CAFA3 experiment. Under the stringent benchmark conditions where templates with >30% sequence identity to the query are excluded, MetaGO achieves average F-measures of 0.487, 0.408, and 0.598, for Molecular Function, Biological Process, and Cellular Component, respectively, which are significantly higher than those achieved by other state-of-the-art function annotations methods. Detailed data analysis shows that the major advantage of the MetaGO lies in the new functional homolog detections from partner's homology-based network mapping and structure-based local and global structure alignments, the confidence scores of which can be optimally combined through logistic regression. These data demonstrate the power of using a hybrid model incorporating protein structure and interaction networks to deduce new functional insights beyond traditional sequence homology-based referrals, especially for proteins that lack homologous function templates. The MetaGO pipeline is available at http://zhanglab.ccmb.med.umich.edu/MetaGO/. Copyright © 2018. Published by Elsevier Ltd.
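The two quantities this abstract relies on, the F-measure and a logistic combination of confidence scores, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, weights and GO term sets are hypothetical, and the benchmark itself may compute the F-measure differently (for example, maximized over score thresholds and averaged over proteins).

```python
import math


def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for two sets of GO terms."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def combined_confidence(scores, weights, bias):
    """Logistic combination of per-source confidence scores.

    The weights and bias would normally be fitted by logistic regression;
    here they are supplied by the caller.
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Toy example: 2 of 3 predicted terms are correct, 2 of 4 true terms are found.
predicted_terms = {"GO:0003677", "GO:0005515", "GO:0016301"}
true_terms = {"GO:0003677", "GO:0005515", "GO:0046872", "GO:0005524"}
print(round(f_measure(predicted_terms, true_terms), 4))  # 0.5714

# Hypothetical scores from sequence, structure and network sources.
print(combined_confidence([0.9, 0.4, 0.7], weights=[2.0, 1.0, 1.5], bias=-2.0))
```

With precision 2/3 and recall 1/2, the F-measure is 4/7, which is the value printed above.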
Minimalist design of water-soluble cross-β architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biancalana, Matthew; Makabe, Koki; Koide, Shohei
Demonstrated successes of protein design and engineering suggest significant potential to produce diverse protein architectures and assemblies beyond those found in nature. Here, we describe a new class of synthetic protein architecture through the successful design and atomic structures of water-soluble cross-β proteins. The cross-β motif is formed from the lamination of successive β-sheet layers, and it is abundantly observed in the core of insoluble amyloid fibrils associated with protein-misfolding diseases. Despite its prominence, cross-β has been designed only in the context of insoluble aggregates of peptides or proteins. Cross-β's recalcitrance to protein engineering and conspicuous absence among the known atomic structures of natural proteins thus makes it a challenging target for design in a water-soluble form. Through comparative analysis of the cross-β structures of fibril-forming peptides, we identified rows of hydrophobic residues ('ladders') running across β-strands of each β-sheet layer as a minimal component of the cross-β motif. Grafting a single ladder of hydrophobic residues designed from the Alzheimer's amyloid-β peptide onto a large β-sheet protein formed a dimeric protein with a cross-β architecture that remained water-soluble, as revealed by solution analysis and x-ray crystal structures. These results demonstrate that the cross-β motif is a stable architecture in water-soluble polypeptides and can be readily designed. Our results provide a new route for accessing the cross-β structure and expanding the scope of protein design.
Minimalist design of water-soluble cross-beta architecture.
Biancalana, Matthew; Makabe, Koki; Koide, Shohei
2010-02-23
Demonstrated successes of protein design and engineering suggest significant potential to produce diverse protein architectures and assemblies beyond those found in nature. Here, we describe a new class of synthetic protein architecture through the successful design and atomic structures of water-soluble cross-beta proteins. The cross-beta motif is formed from the lamination of successive beta-sheet layers, and it is abundantly observed in the core of insoluble amyloid fibrils associated with protein-misfolding diseases. Despite its prominence, cross-beta has been designed only in the context of insoluble aggregates of peptides or proteins. Cross-beta's recalcitrance to protein engineering and conspicuous absence among the known atomic structures of natural proteins thus makes it a challenging target for design in a water-soluble form. Through comparative analysis of the cross-beta structures of fibril-forming peptides, we identified rows of hydrophobic residues ("ladders") running across beta-strands of each beta-sheet layer as a minimal component of the cross-beta motif. Grafting a single ladder of hydrophobic residues designed from the Alzheimer's amyloid-beta peptide onto a large beta-sheet protein formed a dimeric protein with a cross-beta architecture that remained water-soluble, as revealed by solution analysis and x-ray crystal structures. These results demonstrate that the cross-beta motif is a stable architecture in water-soluble polypeptides and can be readily designed. Our results provide a new route for accessing the cross-beta structure and expanding the scope of protein design.
Computational mining for hypothetical patterns of amino acid side chains in protein data bank (PDB)
NASA Astrophysics Data System (ADS)
Ghani, Nur Syatila Ab; Firdaus-Raih, Mohd
2018-04-01
The three-dimensional structure of a protein can provide insights regarding its function. Functional relationships between proteins can be inferred from fold and sequence similarities. In certain cases, sequence or fold comparison fails to establish homology between proteins with similar mechanisms. Since structure is more conserved than sequence, a constellation of functional residues can be similarly arranged among proteins of similar mechanism. Local structural similarity searches are able to detect such constellations of amino acids among distinct proteins, which can be useful for annotating proteins of unknown function. Detecting such patterns of amino acids on a large scale can increase the repertoire of important 3D motifs, since the currently known 3D motifs cannot keep pace with the ever-increasing number of uncharacterized proteins to be annotated. Here, a computational platform for the automated detection of 3D motifs is described. A fuzzy-pattern searching algorithm derived from IMagine an Amino Acid 3D Arrangement search EnGINE (IMAAAGINE) was implemented to develop an automated method for searching for hypothetical patterns of amino acid side chains in the Protein Data Bank (PDB), without the need for prior knowledge of the related sequence or structure of the pattern of interest. We present an example search: the detection of a hypothetical pattern derived from the known C2H2 structural motif of zinc fingers. The conservation of particular patterns of amino acid side chains in unrelated proteins is highlighted. This approach can complement available structure- and sequence-based platforms and may help improve functional association between proteins.
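The idea of a fuzzy 3D pattern search can be sketched as a comparison of inter-residue distances within a tolerance. The code below is a brute-force illustration only, not the IMAAAGINE algorithm: the one-point-per-residue representation, the 0.5 Å tolerance, the coordinates and all names are assumptions.

```python
import itertools
import math


def find_pattern(pattern, structure, tolerance=0.5):
    """Find residue sets in `structure` whose geometry matches `pattern`.

    Both arguments are lists of (residue_type, (x, y, z)) entries, with one
    representative point per side chain. A candidate matches when residue
    types agree and every pairwise distance differs from the corresponding
    pattern distance by at most `tolerance` (in Å).
    """
    n = len(pattern)
    pairs = list(itertools.combinations(range(n), 2))
    hits = set()
    for combo in itertools.permutations(range(len(structure)), n):
        # Residue types must agree position by position.
        if any(structure[c][0] != pattern[p][0] for p, c in enumerate(combo)):
            continue
        # Every inter-residue distance must agree within the tolerance.
        if all(
            abs(
                math.dist(pattern[a][1], pattern[b][1])
                - math.dist(structure[combo[a]][1], structure[combo[b]][1])
            )
            <= tolerance
            for a, b in pairs
        ):
            hits.add(tuple(sorted(combo)))  # collapse symmetric duplicates
    return sorted(hits)


# Hypothetical C2H2-like pattern: two Cys and two His at the corners of a square.
pattern = [
    ("CYS", (0.0, 0.0, 0.0)),
    ("CYS", (4.0, 0.0, 0.0)),
    ("HIS", (0.0, 4.0, 0.0)),
    ("HIS", (4.0, 4.0, 0.0)),
]

# Toy structure: a shifted, slightly distorted copy of the pattern plus decoys.
structure = [
    ("GLY", (0.0, 0.0, 0.0)),
    ("CYS", (10.0, 10.0, 10.0)),
    ("CYS", (14.2, 10.0, 10.0)),
    ("HIS", (10.0, 14.0, 10.3)),
    ("HIS", (14.0, 14.0, 10.0)),
    ("CYS", (30.0, 30.0, 30.0)),
]

print(find_pattern(pattern, structure))  # [(1, 2, 3, 4)]
```

Because only distances are compared, the match is independent of how the structure is positioned and oriented. The exhaustive enumeration is only workable for toy inputs; a search across the whole PDB would need a far more efficient strategy.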
Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A
2005-05-24
Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with Cα rms deviation (rmsd) <6 Å]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-Å Cα rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 Å rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.
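For readers unfamiliar with the rmsd figures quoted above, the sketch below computes the root-mean-square deviation between two sets of paired Cα coordinates. It assumes the two structures have already been optimally superimposed (in practice a superposition step, such as the Kabsch algorithm, is applied first); the function name and coordinates are illustrative.

```python
import math


def ca_rmsd(model, reference):
    """RMSD (in the units of the input, e.g. Å) over paired Cα coordinates.

    Assumes the two coordinate lists are already superimposed and that
    entry k in `model` corresponds to entry k in `reference`.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared_sum / len(model))


# Toy example: every model atom is displaced by (0, 3, 4), i.e. by 5 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x, y + 3.0, z + 4.0) for x, y, z in reference]
print(ca_rmsd(model, reference))  # 5.0
```

As the abstract itself notes, a low rmsd does not guarantee a correct topology, so the number is best read alongside a check of the fold.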
The 15-K neutron structure of saccharide-free concanavalin A.
Blakeley, M P; Kalb, A J; Helliwell, J R; Myles, D A A
2004-11-23
The positions of the ordered hydrogen isotopes of a protein and its bound solvent can be determined by using neutron crystallography. Furthermore, by collecting neutron data at cryo temperatures, the dynamic disorder within a protein crystal is reduced, which may lead to improved definition of the nuclear density. It has proved possible to cryo-cool very large Con A protein crystals (>1.5 mm³) suitable for high-resolution neutron and x-ray structure analysis. We can thereby report the neutron crystal structure of the saccharide-free form of Con A and its bound water, including 167 intact D2O molecules and 60 oxygen atoms at 15 K to 2.5-Å resolution, along with the 1.65-Å x-ray structure of an identical crystal at 100 K. Comparison with the 293-K neutron structure shows that the bound water molecules are better ordered and have lower average B factors than those at room temperature. Overall, twice as many bound waters (as D2O) are identified at 15 K as at 293 K. We note that alteration of bound water orientations occurs between 293 and 15 K; such changes, as illustrated here with this example, could be important more generally in protein crystal structure analysis and ligand design. Methodologically, this successful neutron cryo protein structure refinement opens up categories of neutron protein crystallography, including freeze-trapped structures and cryo to room temperature comparisons.
Parallel Computational Protein Design.
Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang
2017-01-01
Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that is guaranteed to find the global minimum energy solution (GMEC) combines dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computational bottleneck of a large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm to improve the protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide speedups of up to four orders of magnitude in large protein design cases with a small memory overhead compared to the traditional A* search algorithm implementation, while still guaranteeing optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle problems in which the conformation space is too large and the global optimal solution could not previously be computed. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
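To make the A* step concrete, the sketch below searches a tree of rotamer assignments for the lowest-energy combination. It is a serial toy illustration and not the OSPREY or gOSPREY implementation: there is no DEE pruning and no GPU parallelism, the energy tables are invented, and all names are hypothetical. The heuristic is a commonly used lower bound that, for each unassigned position, takes the best rotamer against the already assigned positions plus the minimum possible pair energy with each later position.

```python
import heapq
import itertools


def total_energy(assign, self_e, pair_e):
    """Energy of a (possibly partial) assignment of rotamers to positions 0..k-1."""
    energy = sum(self_e[i][r] for i, r in enumerate(assign))
    for i, j in itertools.combinations(range(len(assign)), 2):
        energy += pair_e[(i, j)][assign[i]][assign[j]]
    return energy


def lower_bound(assign, self_e, pair_e):
    """f = g + h: exact energy of the assigned part plus an optimistic estimate."""
    n, k = len(self_e), len(assign)
    g = total_energy(assign, self_e, pair_e)
    h = 0.0
    for j in range(k, n):  # each unassigned position
        h += min(
            self_e[j][r]
            + sum(pair_e[(i, j)][assign[i]][r] for i in range(k))
            + sum(min(pair_e[(j, m)][r]) for m in range(j + 1, n))
            for r in range(len(self_e[j]))
        )
    return g + h


def astar_gmec(self_e, pair_e):
    """Return (assignment, energy) of the global minimum energy conformation."""
    n = len(self_e)
    tie = itertools.count()  # tie-breaker so the heap never compares assignments
    heap = [(lower_bound((), self_e, pair_e), next(tie), ())]
    while heap:
        f, _, assign = heapq.heappop(heap)
        if len(assign) == n:
            return assign, f  # h is 0 for a full assignment, so f is the exact energy
        pos = len(assign)
        for r in range(len(self_e[pos])):
            child = assign + (r,)
            heapq.heappush(heap, (lower_bound(child, self_e, pair_e), next(tie), child))
    raise ValueError("no positions to assign")


# Invented energies: 3 positions, 2 rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.0, 0.5], [0.5, 0.0]],
    (1, 2): [[0.0, 1.0], [1.5, 0.0]],
}

assignment, energy = astar_gmec(self_e, pair_e)
print(assignment, energy)
```

Because the heuristic never overestimates the remaining energy, the first complete assignment popped from the queue is optimal. The number of nodes expanded can still grow exponentially with the number of positions, which is the worst case the abstract refers to.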
The First Mammalian Aldehyde Oxidase Crystal Structure
Coelho, Catarina; Mahro, Martin; Trincão, José; Carvalho, Alexandra T. P.; Ramos, Maria João; Terao, Mineko; Garattini, Enrico; Leimkühler, Silke; Romão, Maria João
2012-01-01
Aldehyde oxidases (AOXs) are homodimeric proteins belonging to the xanthine oxidase family of molybdenum-containing enzymes. Each 150-kDa monomer contains a FAD redox cofactor, two spectroscopically distinct [2Fe-2S] clusters, and a molybdenum cofactor located within the protein active site. AOXs are characterized by broad range substrate specificity, oxidizing different aldehydes and aromatic N-heterocycles. Despite increasing recognition of its role in the metabolism of drugs and xenobiotics, the physiological function of the protein is still largely unknown. We have crystallized and solved the crystal structure of mouse liver aldehyde oxidase 3 to 2.9 Å. This is the first mammalian AOX whose structure has been solved. The structure provides important insights into the protein active center and further evidence on the catalytic differences characterizing AOX and xanthine oxidoreductase. The mouse liver aldehyde oxidase 3 three-dimensional structure, combined with kinetic and mutagenesis data, molecular docking, and molecular dynamics studies, makes a decisive contribution to understanding the molecular basis of its rather broad substrate specificity. PMID:23019336
Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G
2016-08-25
The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
The use of experimental structures to model protein dynamics.
Katebi, Ataur R; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L
2015-01-01
The number of solved protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high: for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. 
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
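The PCA step outlined in this abstract can be sketched as follows. This is a minimal illustration using NumPy, not the chapter's own programs; it assumes the ensemble has already been superposed onto a common reference frame and that every structure has the same atoms in the same order. The function name `ensemble_pca` is hypothetical.

```python
import numpy as np


def ensemble_pca(coords):
    """PCA of a superposed structural ensemble.

    coords: array of shape (M, N, 3) holding M structures of N atoms each
            (assumed to be pre-aligned; no superposition is done here).
    Returns (variances, pcs, projections):
      variances   - variance captured by each PC, in descending order
      pcs         - rows are unit-length PCs in the 3N-dimensional coordinate space
      projections - coordinates of each structure along each PC
    """
    coords = np.asarray(coords, dtype=float)
    m = coords.shape[0]
    x = coords.reshape(m, -1)          # flatten each structure to a 3N vector
    x = x - x.mean(axis=0)             # subtract the mean structure
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    variances = s ** 2 / (m - 1)       # eigenvalues of the covariance matrix
    projections = u * s                # scores of each structure on each PC
    return variances, vt, projections
```

The fraction of variance captured by the first few PCs (`variances[:k].sum() / variances.sum()`) indicates how well a handful of collective motions describe the conformational variation in the ensemble.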
The Use of Experimental Structures to Model Protein Dynamics
Katebi, Ataur R.; Sankar, Kannan; Jia, Kejue; Jernigan, Robert L.
2014-01-01
Summary: The number of solved protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high – for example, there are over 550 solved structures for HIV-1 protease, one protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods – Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. 
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. PMID:25330965
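To make the "calculate ENM modes" step concrete, here is a minimal sketch of a Gaussian network model, one of the simplest elastic network models. It is not the improved ENM variant the chapter describes. The 7 Å cutoff, the uniform spring constant, and the function names are assumptions made for illustration.

```python
import numpy as np


def gnm_modes(ca_coords, cutoff=7.0):
    """Eigen-decomposition of the GNM Kirchhoff (connectivity) matrix.

    ca_coords: (N, 3) C-alpha coordinates. Residues closer than `cutoff`
    are connected by identical springs (assumed uniform spring constant).
    Returns eigenvalues in ascending order and the matching eigenvectors (columns).
    """
    ca = np.asarray(ca_coords, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    kirchhoff = -(dist < cutoff).astype(float)          # -1 for every contact
    np.fill_diagonal(kirchhoff, 0.0)                    # drop self-contacts
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1)) # diagonal = number of contacts
    return np.linalg.eigh(kirchhoff)


def gnm_fluctuations(evals, evecs, tol=1e-8):
    """Relative mean-square fluctuation per residue from the non-zero modes
    (the diagonal of the pseudo-inverse of the Kirchhoff matrix)."""
    keep = evals > tol                                  # skip the zero (rigid-body) mode
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)
```

The lowest non-zero modes are the slow, collective motions. Comparing ENM modes with principal components from an experimental ensemble is usually done with a three-dimensional model (for example an anisotropic network model) through the absolute dot product between unit mode vectors, since GNM modes carry no directional information.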
Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis
NASA Astrophysics Data System (ADS)
Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei
2014-06-01
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. 
Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
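The idea of predicting flexibility without matrix diagonalization can be sketched as below. This is a minimal illustration, not the authors' code. It assumes a Lorentz-type radial kernel, 1 / (1 + (r/η)^υ), with unit atomic weights, and it assumes that the O(N) behaviour comes from truncating the kernel at a cutoff and binning atoms into cells. The parameter values and function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def lorentz(r, eta, upsilon):
    """Lorentz-type correlation kernel (one possible radial basis function)."""
    return 1.0 / (1.0 + (r / eta) ** upsilon)


def rigidity_direct(coords, eta=3.0, upsilon=3.0, cutoff=math.inf):
    """O(N^2) rigidity index: sum of kernel values over all atom pairs within cutoff."""
    n = len(coords)
    mu = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if r <= cutoff:
                phi = lorentz(r, eta, upsilon)
                mu[i] += phi
                mu[j] += phi
    return mu


def rigidity_cells(coords, eta=3.0, upsilon=3.0, cutoff=12.0):
    """Same truncated sum, but only atoms in the 27 neighbouring cells are visited."""
    cells = defaultdict(list)
    for idx, p in enumerate(coords):
        cells[tuple(math.floor(c / cutoff) for c in p)].append(idx)
    mu = [0.0] * len(coords)
    for cell, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbours = cells.get(tuple(c + o for c, o in zip(cell, offset)), ())
            for i in members:
                for j in neighbours:
                    if i != j:
                        r = math.dist(coords[i], coords[j])
                        if r <= cutoff:
                            mu[i] += lorentz(r, eta, upsilon)
    return mu


def flexibility(mu):
    """Flexibility index as the inverse of the rigidity index."""
    return [1.0 / m if m > 0 else math.inf for m in mu]


def toy_helix(n=60):
    """Hypothetical test geometry: points on an idealised helix (units of angstrom)."""
    return [(2.3 * math.cos(1.745 * k), 2.3 * math.sin(1.745 * k), 1.5 * k)
            for k in range(n)]
```

In practice the flexibility index would be fitted linearly to experimental B-factors. With a fixed cutoff and roughly uniform atomic density, each atom interacts with a bounded number of neighbours, so the cell-based sum scales linearly with the number of atoms.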
Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Opron, Kristopher; Xia, Kelin; Wei, Guo-Wei, E-mail: wei@math.msu.edu
Protein structural fluctuation, typically measured by Debye-Waller factors, or B-factors, is a manifestation of protein flexibility, which strongly correlates to protein function. The flexibility-rigidity index (FRI) is a newly proposed method for the construction of atomic rigidity functions required in the theory of continuum elasticity with atomic rigidity, which is a new multiscale formalism for describing excessively large biomolecular systems. The FRI method analyzes protein rigidity and flexibility and is capable of predicting protein B-factors without resorting to matrix diagonalization. A fundamental assumption used in the FRI is that protein structures are uniquely determined by various internal and external interactions, while the protein functions, such as stability and flexibility, are solely determined by the structure. As such, one can predict protein flexibility without resorting to the protein interaction Hamiltonian. Consequently, bypassing the matrix diagonalization, the original FRI has a computational complexity of O(N^2). This work introduces a fast FRI (fFRI) algorithm for the flexibility analysis of large macromolecules. The proposed fFRI further reduces the computational complexity to O(N). Additionally, we propose anisotropic FRI (aFRI) algorithms for the analysis of protein collective dynamics. The aFRI algorithms permit adaptive Hessian matrices, from a completely global 3N × 3N matrix to completely local 3 × 3 matrices. These 3 × 3 matrices, despite being calculated locally, also contain non-local correlation information. Eigenvectors obtained from the proposed aFRI algorithms are able to demonstrate collective motions. Moreover, we investigate the performance of FRI by employing four families of radial basis correlation functions. Both parameter optimized and parameter-free FRI methods are explored. 
Furthermore, we compare the accuracy and efficiency of FRI with some established approaches to flexibility analysis, namely, normal mode analysis and Gaussian network model (GNM). The accuracy of the FRI method is tested using four sets of proteins, three sets of relatively small-, medium-, and large-sized structures and an extended set of 365 proteins. A fifth set of proteins is used to compare the efficiency of the FRI, fFRI, aFRI, and GNM methods. Intensive validation and comparison indicate that the FRI, particularly the fFRI, is orders of magnitude more efficient and about 10% more accurate overall than some of the most popular methods in the field. The proposed fFRI is able to predict B-factors for α-carbons of the HIV virus capsid (313 236 residues) in less than 30 seconds on a single processor using only one core. Finally, we demonstrate the application of FRI and aFRI to protein domain analysis.
Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies
Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.
2014-01-01
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). 
Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586
The Popeye Domain Containing Genes and Their Function as cAMP Effector Proteins in Striated Muscle.
Brand, Thomas
2018-03-13
The Popeye domain containing (POPDC) genes encode transmembrane proteins, which are abundantly expressed in striated muscle cells. Hallmarks of the POPDC proteins are the presence of three transmembrane domains and the Popeye domain, which makes up a large part of the cytoplasmic portion of the protein and functions as a cAMP-binding domain. Interestingly, despite the prediction of structural similarity between the Popeye domain and other cAMP-binding domains, at the protein sequence level they strongly differ from each other, suggesting an independent evolutionary origin of POPDC proteins. Loss-of-function experiments in zebrafish and mouse established an important role of POPDC proteins for cardiac conduction and heart rate adaptation after stress. Loss-of-function mutations in patients have been associated with limb-girdle muscular dystrophy and AV-block. These data suggest an important role of these proteins in the maintenance of structure and function of striated muscle cells.
Naranjo, Yandi; Pons, Miquel; Konrat, Robert
2012-01-01
The number of existing protein sequences spans a very small fraction of sequence space. Natural proteins have overcome a strong negative selective pressure to avoid the formation of insoluble aggregates. Stably folded globular proteins and intrinsically disordered proteins (IDPs) use alternative solutions to the aggregation problem. While in globular proteins folding minimizes the access to aggregation prone regions, IDPs on average display large exposed contact areas. Here, we introduce the concept of average meta-structure correlation maps to analyze sequence space. Using this novel conceptual view we show that representative ensembles of folded and ID proteins show distinct characteristics and respond differently to sequence randomization. By studying the way evolutionary constraints act on IDPs to disable a negative function (aggregation) we might gain insight into the mechanisms by which function-enabling information is encoded in IDPs.
Cell-free protein synthesis for structure determination by X-ray crystallography.
Watanabe, Miki; Miyazono, Ken-ichi; Tanokura, Masaru; Sawasaki, Tatsuya; Endo, Yaeta; Kobayashi, Ichizo
2010-01-01
Structure determination has been difficult for those proteins that are toxic to the cells and cannot be prepared in a large amount in vivo. These proteins, even when biologically very interesting, tend to be left uncharacterized in the structural genomics projects. Their cell-free synthesis can bypass the toxicity problem. Among the various cell-free systems, the wheat-germ-based system is of special interest due to the following points: (1) Because the gene is placed under a plant translational signal, its toxic expression in a bacterial host is reduced. (2) It has only little codon preference and, especially, little discrimination between methionine and selenomethionine (SeMet), which allows easy preparation of selenomethionylated proteins for crystal structure determination by SAD and MAD methods. (3) Translation is uncoupled from transcription, so that the toxicity of the translation product on DNA and its transcription, if any, can be bypassed. We have shown that the wheat-germ-based cell-free protein synthesis is useful for X-ray crystallography of one of the 4-bp cutter restriction enzymes, which are expected to be very toxic to all forms of cells retaining the genome. Our report on its structure represents the first report of structure determination by X-ray crystallography using protein overexpressed with the wheat-germ-based cell-free protein expression system. This will be a method of choice for cytotoxic proteins when its cost is not a problem. Its use will become popular when the crystal structure determination technology has evolved to require only a tiny amount of protein.
Dias, José; Renault, Louis; Pérez, Javier; Mirande, Marc
2013-01-01
In animal cells, nine aminoacyl-tRNA synthetases are associated with the three auxiliary proteins p18, p38, and p43 to form a stable and conserved large multi-aminoacyl-tRNA synthetase complex (MARS), whose molecular mass has been proposed to be between 1.0 and 1.5 MDa. The complex acts as a molecular hub for coordinating protein synthesis and diverse regulatory signal pathways. Electron microscopy studies defined its low resolution molecular envelope as an overall rather compact, asymmetric triangular shape. Here, we have analyzed the composition and homogeneity of the native mammalian MARS isolated from rabbit liver and characterized its overall internal structure, size, and shape at low resolution by hydrodynamic methods and small-angle x-ray scattering in solution. Our data reveal that the MARS exhibits a much more elongated and multi-armed shape than expected from previous reports. The hydrodynamic and structural features of the MARS are large compared with other supramolecular assemblies involved in translation, including ribosome. The large dimensions and non-compact structural organization of MARS favor a large protein surface accessibility for all its components. This may be essential to allow structural rearrangements between the catalytic and cis-acting tRNA binding domains of the synthetases required for binding the bulky tRNA substrates. This non-compact architecture may also contribute to the spatiotemporal controlled release of some of its components, which participate in non-canonical functions after dissociation from the complex. PMID:23836901
Micsonai, András; Wien, Frank; Bulyáki, Éva; Kun, Judit; Moussong, Éva; Lee, Young-Ho; Goto, Yuji; Réfrégiers, Matthieu; Kardos, József
2018-06-11
Circular dichroism (CD) spectroscopy is a widely used method to study the protein secondary structure. However, for decades, the general opinion was that the correct estimation of β-sheet content is challenging because of the large spectral and structural diversity of β-sheets. Recently, we showed that the orientation and twisting of β-sheets account for the observed spectral diversity, and developed a new method to estimate accurately the secondary structure (PNAS, 112, E3095). BeStSel web server provides the Beta Structure Selection method to analyze the CD spectra recorded by conventional or synchrotron radiation CD equipment. Both normalized and measured data can be uploaded to the server either as a single spectrum or series of spectra. The originality of BeStSel is that it carries out a detailed secondary structure analysis providing information on eight secondary structure components including parallel-β structure and antiparallel β-sheets with three different groups of twist. Based on these, it predicts the protein fold down to the topology/homology level of the CATH protein fold classification. The server also provides a module to analyze the structures deposited in the PDB for BeStSel secondary structure contents in relation to Dictionary of Secondary Structure of Proteins data. The BeStSel server is freely accessible at http://bestsel.elte.hu.
Andersen, Ole Juul; Grouleff, Julie; Needham, Perri; Walker, Ross C; Jensen, Frank
2015-11-19
Current enhanced sampling molecular dynamics methods for studying large conformational changes in proteins suffer from certain limitations. These include, among others, the need for user defined collective variables, the prerequisite of both start and end point structures of the conformational change, and the need for a priori knowledge of the amount by which to boost specific parts of the potential. In this paper, a framework is proposed for a molecular dynamics method for studying ligand-induced conformational changes, in which the nonbonded interactions between the ligand and the protein are used to calculate a biasing force. The method requires only a single input structure, and does not entail the use of collective variables. We provide a proof-of-concept for accelerating conformational changes in three simple test molecules, as well as promising results for two proteins known to undergo domain closure upon ligand binding. For the ribose-binding protein, backbone root-mean-square deviations as low as 0.75 Å compared to the crystal structure of the closed conformation are obtained within 50 ns simulations, whereas no domain closures are observed in unbiased simulations. A skewed closed structure is obtained for the glutamine-binding protein at high bias values, indicating that specific protein-ligand interactions might suppress important protein-protein interactions.
The role of internal duplication in the evolution of multi-domain proteins.
Nacher, J C; Hayashida, M; Akutsu, T
2010-08-01
Many proteins consist of several structural domains. These multi-domain proteins have likely been generated by selective genome growth dynamics during evolution to perform new functions as well as to create structures that fold on a biologically feasible time scale. Domain units frequently evolved through a variety of genetic shuffling mechanisms. Here we examine the protein domain statistics of more than 1000 organisms including eukaryotic, archaeal and bacterial species. The analysis extends earlier findings on asymmetric statistical laws for proteome to a wider variety of species. While proteins are composed of a wide range of domains, displaying a power-law decay, the computation of domain families for each protein reveals an exponential distribution, characterizing a protein universe composed of a thin number of unique families. Structural studies in proteomics have shown that domain repeats, or internal duplicated domains, represent a small but significant fraction of genome. In spite of its importance, this observation has been largely overlooked until recently. We model the evolutionary dynamics of proteome and demonstrate that these distinct distributions are in fact rooted in an internal duplication mechanism. This process generates the contemporary protein structural domain universe, determines its reduced thickness, and tames its growth. These findings have important implications, ranging from protein interaction network modeling to evolutionary studies based on fundamental mechanisms governing genome expansion.
ModeRNA: a tool for comparative modeling of RNA 3D structure
Rother, Magdalena; Rother, Kristian; Puton, Tomasz; Bujnicki, Janusz M.
2011-01-01
RNA is a large group of functionally important biomacromolecules. In striking analogy to proteins, the function of RNA depends on its structure and dynamics, which in turn is encoded in the linear sequence. However, while there are numerous methods for computational prediction of protein three-dimensional (3D) structure from sequence, with comparative modeling being the most reliable approach, there are very few such methods for RNA. Here, we present ModeRNA, a software tool for comparative modeling of RNA 3D structures. As an input, ModeRNA requires a 3D structure of a template RNA molecule, and a sequence alignment between the target to be modeled and the template. It must be emphasized that a good alignment is required for successful modeling, and for large and complex RNA molecules the development of a good alignment usually requires manual adjustments of the input data based on previous expertise of the respective RNA family. ModeRNA can model post-transcriptional modifications, a functionally important feature analogous to post-translational modifications in proteins. ModeRNA can also model DNA structures or use them as templates. It is equipped with many functions for merging fragments of different nucleic acid structures into a single model and analyzing their geometry. Windows and UNIX implementations of ModeRNA with comprehensive documentation and a tutorial are freely available. PMID:21300639
WONKA: objective novel complex analysis for ensembles of protein-ligand structures.
Bradley, A R; Wall, I D; von Delft, F; Green, D V S; Deane, C M; Marsden, B D
2015-10-01
WONKA is a tool for the systematic analysis of an ensemble of protein-ligand structures. It makes the identification of conserved and unusual features within such an ensemble straightforward. WONKA uses an intuitive workflow to process structural co-ordinates. Ligand and protein features are summarised and then presented within an interactive web application. WONKA's power in consolidating and summarising large amounts of data is described through the analysis of three bromodomain datasets. Furthermore, and in contrast to many current methods, WONKA relates analysis to individual ligands, from which we find unusual and erroneous binding modes. Finally the use of WONKA as an annotation tool to share observations about structures is demonstrated. WONKA is freely available to download and install locally or can be used online at http://wonka.sgc.ox.ac.uk.
Internally bridging water molecule in transmembrane alpha-helical kink.
Miyano, Masashi; Ago, Hideo; Saino, Hiromichi; Hori, Tetsuya; Ida, Koh
2010-08-01
There are hundreds of membrane protein atomic coordinates in the Protein Data Bank (PDB), and high-resolution structures of better than 2.5 Å enable the visualization of a sizable number of amphiphiles (lipid and/or detergent) and bound water molecules as essential parts of the structure. Upon scrutinizing these high-resolution structures, water molecules were found to 'wedge' and stabilize large kink angles (30-40 degrees, in a simple cylindrical model) at the transmembrane helical kinks so as to form an inter-helical cavity that accommodates a ligand binding or active site, a crucial structural feature in alpha-helical integral membrane proteins. Furthermore, some of these water molecules are proposed to play a pivotal role in the conformational changes through which these proteins exert their functional regulation.
Dewhurst, Henry M; Choudhury, Shilpa; Torres, Matthew P
2015-08-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)--a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits--conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit-N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. 
© 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Characterizing protein domain associations by small-molecule ligand binding
Li, Qingliang; Cheng, Tiejun; Wang, Yanli; Bryant, Stephen H.
2012-01-01
Background: Protein domains are evolutionarily conserved building blocks for protein structure and function, which are conventionally identified based on protein sequence or structure similarity. Small molecule binding domains are of great importance for the recognition of small molecules in biological systems and drug development. Many small molecules, including drugs, have been increasingly identified to bind to multiple targets, leading to promiscuous interactions with protein domains. Thus, a large scale characterization of the protein domains and their associations with respect to small-molecule binding is of particular interest to systems biology research, drug target identification, as well as drug repurposing. Methods: We compiled a collection of 13,822 physical interactions of small molecules and protein domains derived from the Protein Data Bank (PDB) structures. Based on the chemical similarity of these small molecules, we characterized pairwise associations of the protein domains and further investigated their global associations from a network point of view. Results: We found that protein domains, despite lack of similarity in sequence and structure, were comprehensively associated through binding the same or similar small-molecule ligands. Moreover, we identified modules in the domain network that consisted of closely related protein domains by sharing similar biochemical mechanisms, being involved in relevant biological pathways, or being regulated by the same cognate cofactors. Conclusions: A novel protein domain relationship was identified in the context of small-molecule binding, which is complementary to those identified by traditional sequence-based or structure-based approaches. The protein domain network constructed in the present study provides a novel perspective for chemogenomic study and network pharmacology, as well as target identification for drug repurposing. PMID:23745168
Electrostatics, structure prediction, and the energy landscapes for protein folding and binding.
Tsai, Min-Yeh; Zheng, Weihua; Balamurugan, D; Schafer, Nicholas P; Kim, Bobby L; Cheung, Margaret S; Wolynes, Peter G
2016-01-01
While being long in range and therefore weakly specific, electrostatic interactions are able to modulate the stability and folding landscapes of some proteins. The relevance of electrostatic forces for steering the docking of proteins to each other is widely acknowledged; however, the role of electrostatics in establishing specifically funneled landscapes and their relevance for protein structure prediction are still not clear. By introducing Debye-Hückel potentials that mimic long-range electrostatic forces into the Associative memory, Water mediated, Structure, and Energy Model (AWSEM), a transferable protein model capable of predicting tertiary structures, we assess the effects of electrostatics on the landscapes of thirteen monomeric proteins and four dimers. For the monomers, we find that adding electrostatic interactions does not improve structure prediction. Simulations of ribosomal protein S6 show, however, that folding stability depends monotonically on electrostatic strength. The trend in predicted melting temperatures of the S6 variants agrees with experimental observations. Electrostatic effects can play a range of roles in binding. The binding of the protein complex KIX-pKID is largely assisted by electrostatic interactions, which provide direct charge-charge stabilization of the native state and contribute to the funneling of the binding landscape. In contrast, for several other proteins, including the DNA-binding protein FIS, electrostatics causes frustration in the DNA-binding region, which favors its binding with DNA but not with its protein partner. This study highlights the importance of long-range electrostatics in functional responses to problems where proteins interact with their charged partners, such as DNA, RNA, as well as membranes. © 2015 The Protein Society.
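As a rough illustration of the Debye-Hückel form mentioned in this abstract, the sketch below evaluates a screened Coulomb energy between two point charges. The function name, the dielectric constant of 80 and the 10 Å screening length are illustrative assumptions, not parameters taken from AWSEM or from the paper.

```python
import math

# Coulomb constant in units common to molecular simulation:
# kcal * Angstrom / (mol * e^2).
COULOMB_KCAL_A = 332.06


def debye_huckel_energy(q_i, q_j, r, dielectric=80.0, debye_length=10.0):
    """Screened Coulomb energy (kcal/mol) between charges q_i and q_j (in e).

    r and debye_length are in Angstrom. The default dielectric and
    screening length are assumed, illustrative values only.
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    coulomb = COULOMB_KCAL_A * q_i * q_j / (dielectric * r)
    # Exponential screening by mobile ions damps the interaction at long range.
    return coulomb * math.exp(-r / debye_length)


if __name__ == "__main__":
    for r in (5.0, 10.0, 20.0):
        print(r, debye_huckel_energy(+1.0, -1.0, r))
```

Opposite charges give a negative (stabilizing) energy, and the magnitude falls off faster than 1/r because of the exponential screening term.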
Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G
2012-09-01
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Supramolecular Structures with Blood Plasma Proteins, Sugars and Nanosilica
NASA Astrophysics Data System (ADS)
Turov, V. V.; Gun'ko, V. M.; Galagan, N. P.; Rugal, A. A.; Barvinchenko, V. M.; Gorbyk, P. P.
Supramolecular structures with blood plasma proteins (albumin, immunoglobulin and fibrinogen (HPF)), protein/water/silica and protein/water/silica/sugar (glucose, fructose and saccharose) were studied by NMR, adsorption, IR and UV spectroscopy methods. Hydration parameters, amounts of weakly and strongly bound waters and interfacial energy (γ_S) were determined over a wide range of component concentrations. The γ_S(C_protein, C_silica) graphs were used to estimate the energy of protein-protein, protein-surface and particle-particle interactions. It was shown that the interfacial energy of self-association (γ_as) of protein molecules depends on the type of protein. A large fraction of water bound to proteins can be displaced by sugars, and the effect of disaccharide (saccharose) was greater than that of monosugars. Changes in the structural parameters of cavities in HPF molecules and complexes with HPF/silica nanoparticles filled by bound water were analysed using NMR-cryoporometry showing that interaction of proteins with silica leads to a significant decrease in the amounts of water bound to both protein and silica surfaces. Bionanocomposites with BSA/nanosilica/sugar can be used to influence states of living cells and tissues after cryopreservation or other treatments. It was shown that interaction of proteins with silica leads to a strong decrease in the volume of all types of internal cavities filled by water.
2017-01-01
Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-‘R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment. PMID:29119109
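The inverted Boltzmann principle named in this abstract converts observed frequencies into pseudo-energies, E = -kT ln(f_obs / f_ref). The following is a minimal sketch of that conversion only; the quadruplet labels, the uniform reference frequencies and the pseudocount are hypothetical and do not reproduce the authors' twelve potentials or their Delaunay tessellation step.

```python
import math


def inverse_boltzmann_potential(observed_counts, reference_freqs,
                                kT=1.0, pseudocount=1.0):
    """Return {class: energy} using E = -kT * ln(f_obs / f_ref).

    observed_counts: {class label: count seen in the training structures}
    reference_freqs: {class label: frequency expected by chance}
    The pseudocount (an assumption of this sketch) avoids log(0) for
    classes that were never observed.
    """
    classes = list(reference_freqs)
    total = sum(observed_counts.get(c, 0) for c in classes)
    total += pseudocount * len(classes)
    potential = {}
    for c in classes:
        f_obs = (observed_counts.get(c, 0) + pseudocount) / total
        potential[c] = -kT * math.log(f_obs / reference_freqs[c])
    return potential


if __name__ == "__main__":
    # Hypothetical atom-quadruplet classes and counts, for illustration only.
    counts = {"C-C-N-O": 30, "C-C-C-C": 10}
    reference = {"C-C-N-O": 0.5, "C-C-C-C": 0.5}
    print(inverse_boltzmann_potential(counts, reference, pseudocount=0.0))
```

Classes seen more often than the reference predicts receive negative (favorable) energies, and under-represented classes receive positive ones.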
NMR relaxation studies on the hydrate layer of intrinsically unstructured proteins.
Bokor, Mónika; Csizmók, Veronika; Kovács, Dénes; Bánki, Péter; Friedrich, Peter; Tompa, Peter; Tompa, Kálmán
2005-03-01
Intrinsically unstructured/disordered proteins (IUPs) exist in a disordered and largely solvent-exposed, still functional, structural state under physiological conditions. As their function is often directly linked with structural disorder, understanding their structure-function relationship in detail is a great challenge to structural biology. In particular, their hydration and residual structure, both closely linked with their mechanism of action, require close attention. Here we demonstrate that the hydration of IUPs can be adequately approached by a technique so far unexplored with respect to IUPs, solid-state NMR relaxation measurements. This technique provides quantitative information on various features of hydrate water bound to these proteins. By freezing nonhydrate (bulk) water out, we have been able to measure free induction decays pertaining to protons of bound water from which the amount of hydrate water, its activation energy, and correlation times could be calculated. Thus, for three IUPs, the first inhibitory domain of calpastatin, microtubule-associated protein 2c, and plant dehydrin early responsive to dehydration 10, we demonstrate that they bind a significantly larger amount of water than globular proteins, whereas their suboptimal hydration and relaxation parameters are correlated with their differing modes of function. The theoretical treatment and experimental approach presented in this article may have general utility in characterizing proteins that belong to this novel structural class.
Jaeger, Vance W; Pfaendtner, Jim
2016-12-01
Ionic liquid (IL) containing solvents can change the structure, dynamics, function, and stability of proteins. In order to investigate the mechanisms by which ILs induce structural changes in a large multidomain protein, we study the interactions of human serum albumin (HSA) with two different ILs, 1-butyl-3-methylimidazolium tetrafluoroborate and choline dihydrogen phosphate. Root mean square deviation and fluctuation calculations indicate that high concentrations of ILs in mixtures with water lead to protein structures that remain close to their crystallographic structures on time scales of hundreds of nanoseconds. To overcome potential time scale limitations due to the high viscosity of the solvent, we employed enhanced sampling techniques to estimate the free energy of an experimentally determined important transition within the protein structure. Metadynamics simulations show that the free energy landscape of the unfolding of loop 1 of domain I is different in the presence of ILs than it is in water, consistent with previously published experimental evidence. We then apply essential dynamics coarse graining to systematically predict differences in the dynamics of proteins solvated in IL-water mixtures versus pure water systems. We also demonstrate that the presence of ionic liquids changes the distribution of intermolecular distances among several ligands, indicating that the protein structure swells in the presence of certain ILs, consistent with experimental evidence.
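The root mean square deviation used in this abstract to compare simulated and crystallographic structures can be sketched as follows. This version assumes the two coordinate sets are already superposed and listed in the same atom order; the optimal-superposition step normally applied first is omitted, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the structures are already aligned; no fitting is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    snapshot = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(reference, snapshot))
```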
NASA Astrophysics Data System (ADS)
Sarti, E.; Zamuner, S.; Cossio, P.; Laio, A.; Seno, F.; Trovato, A.
2013-12-01
In protein structure prediction it is of crucial importance, especially at the refinement stage, to efficiently score large sets of models by selecting the ones that are closest to the native state. We here present a new computational tool, BACHSCORE, that allows its users to rank different structural models of the same protein according to their quality, evaluated by using the BACH++ (Bayesian Analysis Conformation Hunt) scoring function. The original BACH statistical potential was already shown to discriminate with very good reliability the protein native state in large sets of misfolded models of the same protein. BACH++ features a novel upgrade in the solvation potential of the scoring function, now computed by adapting the LCPO (Linear Combination of Pairwise Orbitals) algorithm. This change further enhances the already good performance of the scoring function. BACHSCORE can be accessed directly through the web server: bachserver.pd.infn.it.
Catalogue identifier: AEQD_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQD_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU General Public License version 3
No. of lines in distributed program, including test data, etc.: 130159
No. of bytes in distributed program, including test data, etc.: 24 687 455
Distribution format: tar.gz
Programming language: C++.
Computer: Any computer capable of running an executable produced by a g++ compiler (4.6.3 version).
Operating system: Linux, Unix OS-es.
RAM: 1 073 741 824 bytes
Classification: 3.
Nature of problem: Evaluate the quality of a protein structural model, taking into account the possible “a priori” knowledge of a reference primary sequence that may be different from the amino-acid sequence of the model; the native protein structure should be recognized as the best model.
Solution method: The contact potential scores the occurrence of any given type of residue pair in 5 possible contact classes (α-helical contact, parallel β-sheet contact, anti-parallel β-sheet contact, side-chain contact, no contact). The solvation potential scores the occurrence of any residue type in 2 possible environments: buried and solvent exposed. Residue environment is assigned by adapting the LCPO algorithm. Residues present in the reference primary sequence and not present in the model structure contribute to the model score as solvent exposed and as non contacting all other residues.
Restrictions: Input format file according to the Protein Data Bank standard
Additional comments: Parameter values used in the scoring function can be found in the file /folder-to-bachscore/BACH/examples/bach_std.par.
Running time: Roughly one minute to score one hundred structures on a desktop PC, depending on their size.
2013-01-01
Background Human triosephosphate isomerase (HsTIM) deficiency is a genetic disease often caused by the pathogenic mutation E104D. This mutation, located at the side of an abnormally large cluster of water in the inter-subunit interface, reduces the thermostability of the enzyme. Why and how these water molecules are directly related to the excessive thermolability of the mutant have not been investigated in structural biology. Results This work compares the structure of the E104D mutant with its wild type counterparts. It is found that the water topology in the dimer interface of HsTIM is atypical, having a "wet-core-dry-rim" distribution with 16 water molecules tightly packed in a small deep region surrounded by 22 residues including GLU104. These water molecules are co-conserved with their surrounding residues in non-archaeal TIMs (dimers) but not conserved across archaeal TIMs (tetramers), indicating their importance in preserving the overall quaternary structure. As the structural permutation induced by the mutation is not significant, we hypothesize that the excessive thermolability of the E104D mutant is attributed to the easy propagation of atoms' flexibility from the surface into the core via the large cluster of water. It is indeed found that the B factor increment in the wet region is higher than other regions, and, more importantly, the B factor increment in the wet region is maintained in the deeply buried core. Molecular dynamics simulations revealed that for the mutant structure at normal temperature, a clear increase of the root-mean-square deviation is observed for the wet region contacting with the large cluster of interfacial water. Such increase is not observed for other interfacial regions or the whole protein. This clearly suggests that, in the E104D mutant, the large water cluster is responsible for the subunit interface flexibility and overall thermolability, and it ultimately leads to the deficiency of this enzyme.
Conclusions Our study reveals that a large cluster of water buried in protein interfaces is fragile and high-maintenance, closely related to the structure, function and evolution of the whole protein. PMID:24564410
Maintenance of a Protein Structure in the Dynamic Evolution of TIMPs over 600 Million Years
Nicosia, Aldo; Maggio, Teresa; Costa, Salvatore; Salamone, Monica; Tagliavia, Marcello; Mazzola, Salvatore; Gianguzza, Fabrizio; Cuttitta, Angela
2016-01-01
Deciphering the events leading to protein evolution represents a challenge, especially for protein families showing complex evolutionary history. Among them, TIMPs represent an ancient eukaryotic protein family widely distributed in the animal kingdom. They are known to control the turnover of the extracellular matrix and are considered to arise early during metazoan evolution, arguably tuning essential features of tissue and epithelial organization. To probe the structure and molecular evolution of TIMPs within metazoans, we report the mining and structural characterization of a large data set of TIMPs over approximately 600 Myr. The TIMPs repertoire was explored starting from the Cnidaria phylum, coeval with the origins of connective tissue, to great apes and humans. Despite dramatic sequence differences compared with highest metazoans, the ancestral proteins displayed the canonical TIMP fold. Only small structural changes, represented by an α-helix located in the N-domain, have occurred over the course of evolution. Both the occurrence of such secondary structure elements and the relative solvent accessibility of the corresponding residues in the three-dimensional structures raise the possibility that these sites represent unconserved elements prone to accept variations. PMID:26957029
NASA Astrophysics Data System (ADS)
Struts, A. V.; Barmasov, A. V.; Brown, M. F.
2016-02-01
This article continues our review of spectroscopic studies of G-protein-coupled receptors. Magnetic resonance methods including electron paramagnetic resonance (EPR) and nuclear magnetic resonance (NMR) provide specific structural and dynamical data for the protein in conjunction with optical methods (vibrational, electronic spectroscopy) as discussed in the accompanying article. An additional advantage is the opportunity to explore the receptor proteins in the natural membrane lipid environment. Solid-state 2H and 13C NMR methods yield information about both the local structure and dynamics of the cofactor bound to the protein and its light-induced changes. Complementary site-directed spin-labeling studies monitor the structural alterations over larger distances and correspondingly longer time scales. A multiscale reaction mechanism describes how local changes of the retinal cofactor unlock the receptor to initiate large-scale conformational changes of rhodopsin. Activation of the G-protein-coupled receptor involves an ensemble of conformational substates within the rhodopsin manifold that characterize the dynamically active receptor.
The design and characterization of protein based block polymers
NASA Astrophysics Data System (ADS)
Haghpanah, Jennifer Shorah
Over the past decades, protein engineering has provided noteworthy advances in basic science as well as in medicine and industry. Protein engineers are currently focusing their efforts on developing elementary rules to design proteins with a specific structure and function. Proteins derived from natural sources have been used to generate a plethora of materials with remarkable structural and functional properties. In the first chapter, we show how we can fabricate protein polymers comprised of two different self-assembling domains (SADs). From our studies, we discover that SADs in different orientations have a large impact on their overall microscopic and macroscopic features. In the second chapter, we explore the impact of cellulose (Tc) on the diblocks EC and CE. We discover that Tc is able to selectively impact the mechanical properties of CE because CE has smaller particle sizes and more E domain exposed on its surface at RT. In the third chapter, we appended an extra C domain to CE to generate CEC with improved mechanical properties, structure and small molecule recognition.
CoMoDo: identifying dynamic protein domains based on covariances of motion.
Wieninger, Silke A; Ullmann, G Matthias
2015-06-09
Most large proteins are built of several domains, compact units which enable functional protein motions. Different domain assignment approaches exist, which mostly rely on concepts of stability, folding, and evolution. We describe the automatic assignment method CoMoDo, which identifies domains based on protein dynamics. Covariances of atomic fluctuations, here calculated by an Elastic Network Model, are used to group residues into domains of different hierarchical levels. The so-called dynamic domains facilitate the study of functional protein motions involved in biological processes like ligand binding and signal transduction. By applying CoMoDo to a large number of proteins, we demonstrate that dynamic domains exhibit features absent in the commonly assigned structural domains, which can deliver insight into the interactions between domains and between subunits of multimeric proteins. CoMoDo is distributed as free open source software at www.bisb.uni-bayreuth.de/CoMoDo.html .
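To make the idea of grouping residues by covariances of motion concrete, the sketch below normalizes a covariance matrix to correlations and merges residues whose correlation exceeds a threshold. This is a simple single-linkage grouping chosen for illustration, not the CoMoDo algorithm; the function names, the threshold of 0.5 and the toy covariance matrix are all assumptions.

```python
import math


def correlation_from_covariance(cov):
    """Normalize a square covariance matrix (list of lists) to correlations."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]


def group_by_correlation(cov, threshold=0.5):
    """Group residue indices whose motions correlate at or above threshold.

    Uses union-find, so residues are merged transitively (single linkage).
    """
    corr = correlation_from_covariance(cov)
    n = len(corr)
    parent = list(range(n))

    def find(i):
        # Walk to the root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy 4-residue covariance matrix: residues 0-1 and 2-3 move together.
    toy_cov = [[1.0, 0.9, -0.1, 0.0],
               [0.9, 1.0, 0.0, -0.1],
               [-0.1, 0.0, 1.0, 0.8],
               [0.0, -0.1, 0.8, 1.0]]
    print(group_by_correlation(toy_cov))
```

Raising or lowering the threshold yields finer or coarser groupings, which loosely mirrors the notion of domains at different hierarchical levels described in the abstract.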
Combining functional and structural genomics to sample the essential Burkholderia structome.
Baugh, Loren; Gallagher, Larry A; Patrapuvich, Rapatbhorn; Clifton, Matthew C; Gardberg, Anna S; Edwards, Thomas E; Armour, Brianna; Begley, Darren W; Dieterich, Shellie H; Dranow, David M; Abendroth, Jan; Fairman, James W; Fox, David; Staker, Bart L; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W; Stacy, Robin; Myler, Peter J; Stewart, Lance J; Manoil, Colin; Van Voorhis, Wesley C
2013-01-01
The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infectious Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target, i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail.
This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases caused by Burkholderia. All expression clones and proteins created in this study are freely available by request.
MAS NMR of HIV-1 protein assemblies
NASA Astrophysics Data System (ADS)
Suiter, Christopher L.; Quinn, Caitlin M.; Lu, Manman; Hou, Guangjin; Zhang, Huilan; Polenova, Tatyana
2015-04-01
The negative global impact of the AIDS pandemic is well known. In this perspective article, the utility of magic angle spinning (MAS) NMR spectroscopy to answer pressing questions related to the structure and dynamics of HIV-1 protein assemblies is examined. In recent years, MAS NMR has undergone major technological developments enabling studies of large viral assemblies. We discuss some of these evolving methods and technologies and provide a perspective on the current state of MAS NMR as applied to the investigations into structure and dynamics of HIV-1 assemblies of CA capsid protein and of Gag maturation intermediates.
Campanacci, Valérie; Veesler, David; Lichière, Julie; Blangy, Stéphanie; Sciara, Giuliano; Moineau, Sylvain; van Sinderen, Douwe; Bron, Patrick; Cambillau, Christian
2010-10-01
We report here the characterization of several large structural protein complexes forming the baseplates (or part of them) of Siphoviridae phages infecting Lactococcus lactis: TP901-1, Tuc2009 and p2. We revisited a "block cloning" expression strategy and extended this approach to genomic fragments encoding proteins whose interacting partners have not yet been clearly identified. Biophysical characterization of some of these complexes using circular dichroism and size exclusion chromatography, coupled with on-line light scattering and refractometry, demonstrated that the over-produced recombinant proteins interact with each other to form large (up to 1.9 MDa) and stable baseplate assemblies. Some of these complexes were characterized by electron microscopy confirming their structural homogeneity as well as providing a picture of their overall molecular shapes and symmetry. Finally, using these results, we were able to highlight similarities and differences with the well characterized much larger baseplate of the myophage T4.
Weininger, Arthur; Weininger, Susan
2015-01-01
The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. 
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells. PMID:25706124
Prediction of physical protein-protein interactions
NASA Astrophysics Data System (ADS)
Szilágyi, András; Grimm, Vera; Arakaki, Adrián K.; Skolnick, Jeffrey
2005-06-01
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
DNA Nanotubes for NMR Structure Determination of Membrane Proteins
Bellot, Gaëtan; McClintock, Mark A.; Chou, James J; Shih, William M.
2013-01-01
Structure determination of integral membrane proteins by solution NMR represents one of the most important challenges of structural biology. A Residual-Dipolar-Coupling-based refinement approach can be used to solve the structure of membrane proteins up to 40 kDa in size; however, a weak-alignment medium that is detergent-resistant is required. Previously, availability of media suitable for weak alignment of membrane proteins was severely limited. We describe here a protocol for robust, large-scale synthesis of detergent-resistant DNA nanotubes that can be assembled into dilute liquid crystals for application as weak-alignment media in solution NMR structure determination of membrane proteins in detergent micelles. The DNA nanotubes are heterodimers of 400 nm-long six-helix bundles each self-assembled from a M13-based p7308 scaffold strand and >170 short oligonucleotide staple strands. Compatibility with proteins bearing considerable positive charge as well as modulation of molecular alignment, towards collection of linearly independent restraints, can be introduced by reducing the negative charge of DNA nanotubes via counter ions and small DNA binding molecules. This detergent-resistant liquid-crystal medium offers a number of properties conducive to membrane protein alignment, including high-yield production, thermal stability, buffer compatibility, and structural programmability. Production of sufficient nanotubes for 4–5 NMR experiments can be completed in one week by a single individual. PMID:23518667
Zhukov, I.; Jaroszewski, L.; Bierzyński, A.
2000-01-01
Protein molecules can accommodate a large number of mutations without noticeable effects on their stability and folding kinetics. On the other hand, some mutations can have quite strong effects on protein conformational properties. Such mutations either destabilize secondary structures, e.g., alpha-helices, are incompatible with close packing of protein hydrophobic cores, or lead to disruption of some specific interactions such as disulfide cross links, salt bridges, hydrogen bonds, or aromatic-aromatic contacts. The Met8 --> Leu mutation in CMTI-I results in significant destabilization of the protein structure. This effect could hardly be expected since the mutation is highly conservative, and the side chain of residue 8 is situated on the protein surface. We show that the protein destabilization is caused by rearrangement of a hydrophobic cluster formed by side chains of residues 8, Ile6, and Leu17 that leads to partial breaking of a hydrogen bond formed by the amide group of Leu17 with water and to a reduction of a hydrophobic surface buried within the cluster. The mutation also perturbs protein folding. In aerobic conditions the reduced wild-type protein folds effectively into its native structure, whereas more than 75% of the mutant molecules are trapped in various misfolded species. The main conclusion of this work is that conservative mutations of hydrophobic residues can destabilize a protein structure even if these residues are situated on the protein surface and partially accessible to water. Structural rearrangement of small hydrophobic clusters formed by such residues can lead to local changes in protein hydration and, consequently, can considerably affect protein stability and the folding process. PMID:10716179
Zheng, Wenjun
2017-01-10
Dynactin, a large multiprotein complex, binds with the cytoplasmic dynein-1 motor and various adaptor proteins to allow recruitment and transportation of cellular cargoes toward the minus end of microtubules. The structure of the dynactin complex is built around an actin-like minifilament with a defined length, which has been visualized in a high-resolution structure of the dynactin filament determined by cryo-electron microscopy (cryo-EM). To understand the energetic basis of dynactin filament assembly, we used molecular dynamics simulation to probe the intersubunit interactions among the actin-like proteins, various capping proteins, and four extended regions of the dynactin shoulder. Our simulations revealed stronger intersubunit interactions at the barbed and pointed ends of the filament and involving the extended regions (compared with the interactions within the filament), which may energetically drive filament termination by the capping proteins and recruitment of the actin-like proteins by the extended regions, two key features of the dynactin filament assembly process. Next, we modeled the unknown binding configuration among dynactin, dynein tails, and a number of coiled-coil adaptor proteins (including several Bicaudal-D and related proteins and three HOOK proteins), and predicted a key set of charged residues involved in their electrostatic interactions. Our modeling is consistent with previous findings of conserved regions, functional sites, and disease mutations in the adaptor proteins and will provide a structural framework for future functional and mutational studies of these adaptor proteins. In sum, this study yielded rich structural and energetic information about dynactin and associated adaptor proteins that cannot be directly obtained from the cryo-EM structures with limited resolutions.
The limits of protein sequence comparison?
Pearson, William R; Sierk, Michael L
2010-01-01
Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
Oezguen, Numan; Zhou, Bin; Negi, Surendra S.; Ivanciuc, Ovidiu; Schein, Catherine H.; Labesse, Gilles; Braun, Werner
2008-01-01
Similarities in sequences and 3D structures of allergenic proteins provide vital clues to identify clinically relevant IgE cross-reactivities. However, experimental 3D structures are available in the Protein Data Bank for only 5% (45/829) of all allergens catalogued in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP). Here, an automated procedure was used to prepare 3D-models of all allergens where there was no experimentally determined 3D structure or high identity (95%) to another protein of known 3D structure. After a final selection by quality criteria, 433 reliable 3D models were retained and are available from our SDAP Website. The new 3D models extensively enhance our knowledge of allergen structures. As an example of their use, experimentally derived “continuous IgE epitopes” were mapped on 3 experimentally determined structures and 13 of our 3D-models of allergenic proteins. Large portions of these continuous sequences are not entirely on the surface and therefore cannot interact with IgE or other proteins. Only the surface exposed residues are constituents of “conformational IgE epitopes” which are not in all cases continuous in sequence. The surface exposed parts of the experimentally determined continuous IgE epitopes showed a distinct statistical distribution as compared to their presence in typical protein-protein interfaces. The amino acids Ala, Ser, Asn, Gly and particularly Lys have a high propensity to occur in IgE binding sites. The 3D-models will facilitate further analysis of the common properties of IgE binding sites of allergenic proteins. PMID:18621419
Microstructure of Desmanthus illinoensis
NASA Astrophysics Data System (ADS)
Wood, Delilah F.; Orts, William J.; Glenn, Gregory M.
2010-06-01
Structure and histochemistry of mature seeds of Desmanthus illinoensis (Illinois bundle flower) show that the seed has typical legume structure. The seed can be separated into two major fractions including the seed coat/endosperm and the embryo. The seed coat consists of a cuticle, palisade sclereids, hour glass cells and mesophyll. Endosperm is attached to the inner portion of the seed coat and is thicker beneath the pleurogram in the center of the seed. The embryo consists mostly of two large cotyledons, the major storage structures of the seed. The cotyledons are high in protein which occurs in protein bodies. Protein bodies in the cotyledons include those without inclusions, those with phytin inclusions and those with calcium-rich crystals. The phytin inclusions are spherical and have high phosphorus and magnesium contents. The calcium-rich crystals are also included inside protein bodies and are druse-type crystals.
Evolution of the arginase fold and functional diversity
Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.
2009-01-01
The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel 8 stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740
Richard, François D; Kajava, Andrey V
2014-06-01
The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
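The filtering idea described in the abstract above (comparing the composition and order of short strings in adjacent sequence motifs) can be illustrated with a minimal sketch. This is not the authors' program: the window lengths, the k-mer size, the threshold, and all function names are illustrative assumptions chosen only to show the general shape of such a pre-filter.

```python
def kmers(segment, k=2):
    """Return the ordered list of overlapping k-mers in a sequence segment."""
    return [segment[i:i + k] for i in range(len(segment) - k + 1)]


def adjacent_window_similarity(seq, window, k=2):
    """Best fraction of position-matched k-mers between two adjacent windows.

    Slides a pair of back-to-back windows along the sequence and compares
    the k-mers of the left window with those of the right window, position
    by position (so both composition and order matter).
    """
    best = 0.0
    for start in range(len(seq) - 2 * window + 1):
        left = kmers(seq[start:start + window], k)
        right = kmers(seq[start + window:start + 2 * window], k)
        matches = sum(1 for a, b in zip(left, right) if a == b)
        best = max(best, matches / len(left))
    return best


def passes_filter(seq, windows=(4, 6, 8), k=2, threshold=0.6):
    """Keep a sequence if any assumed repeat length gives a high similarity."""
    return any(
        len(seq) >= 2 * w and adjacent_window_similarity(seq, w, k) >= threshold
        for w in windows
    )


if __name__ == "__main__":
    print(passes_filter("ACDEFGACDEFGACDEFG"))    # perfect period-6 repeat
    print(passes_filter("ACDEFGHIKLMNPQRSTVWY"))  # no repeated short strings
```

Sequences that pass such a cheap test would then be handed to a slower, more sensitive repeat detector; sequences that fail are discarded early, which is the enrichment step the abstract describes.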
Improved protein surface comparison and application to low-resolution protein structure data.
Sael, Lee; Kihara, Daisuke
2010-12-14
Recent advancements of experimental techniques for determining protein tertiary structures raise significant challenges for protein bioinformatics. With the number of known structures of unknown function expanding at a rapid pace, an urgent task is to provide reliable clues to their biological function on a large scale. Conventional approaches for structure comparison are not suitable for a real-time database search due to their slow speed. Moreover, a new challenge has arisen from recent techniques such as electron microscopy (EM), which provide low-resolution structure data. Previously, we have introduced a method for protein surface shape representation using the 3D Zernike descriptors (3DZDs). The 3DZD enables fast structure database searches, taking advantage of its rotation invariance and compact representation. The search results for protein surfaces represented with the 3DZD have shown good agreement with the existing structure classifications, but some discrepancies were also observed. Three surface representations are examined: a new representation of backbone atoms, the originally devised all-atom surface representation, and the combination of the all-atom surface with the backbone representation. All representations are encoded with the 3DZD. Also, we have investigated the applicability of the 3DZD for searching protein EM density maps of varying resolutions. The surface representations are evaluated on structure retrieval using two existing classifications, SCOP and the CE-based classification. Overall, the 3DZDs representing backbone atoms show better retrieval performance than the original all-atom surface representation. The performance further improved when the two representations were combined. Moreover, we observed that the 3DZD is also powerful in comparing low-resolution structures obtained by electron microscopy.
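Because a rotation-invariant descriptor reduces each structure to a fixed-length vector, a database search can be a simple nearest-neighbour ranking over precomputed vectors. The sketch below shows only that ranking step. The three-component vectors, the entry names, and the use of Euclidean distance are illustrative assumptions, and computing real 3DZDs from a surface is not shown.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_database(query, database):
    """Return database entry names ordered from most to least similar."""
    return sorted(database, key=lambda name: descriptor_distance(query, database[name]))


if __name__ == "__main__":
    # Toy vectors standing in for precomputed descriptors (hypothetical values).
    database = {
        "entry_A": [1.0, 0.0, 0.0],
        "entry_B": [0.0, 1.0, 0.0],
        "entry_C": [0.9, 0.1, 0.0],
    }
    print(rank_database([1.0, 0.0, 0.0], database))
```

No structural superposition is needed at query time, which is why this kind of comparison scales to real-time searches in a way that alignment-based methods do not.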
Claudins and the Modulation of Tight Junction Permeability
Günzel, Dorothee
2013-01-01
Claudins are tight junction membrane proteins that are expressed in epithelia and endothelia and form paracellular barriers and pores that determine tight junction permeability. This review summarizes our current knowledge of this large protein family and discusses recent advances in our understanding of their structure and physiological functions. PMID:23589827
Sinars, Cindy R.; Cheung-Flynn, Joyce; Rimerman, Ronald A.; Scammell, Jonathan G.; Smith, David F.; Clardy, Jon
2003-01-01
The ability to bind immunosuppressive drugs such as cyclosporin and FK506 defines the immunophilin family of proteins, and the FK506-binding proteins form the FKBP subfamily of immunophilins. Some FKBPs, notably FKBP12 (the 12-kDa FK506-binding protein), have defined roles in regulating ion channels or cell signaling, and well established structures. Other FKBPs, especially the larger ones, participate in important biological processes, but their exact roles and the structural bases for these roles are poorly defined. FKBP51 (the 51-kDa FKBP) associates with heat shock protein 90 (Hsp90) and appears in functionally mature steroid receptor complexes. In New World monkeys, FKBP51 has been implicated in cortisol resistance. We report here the x-ray structures of human FKBP51, to 2.7 Å, and squirrel monkey FKBP51, to 2.8 Å, by using multiwavelength anomalous dispersion phasing. FKBP51 is composed of three domains: two consecutive FKBP domains and a three-unit repeat of the TPR (tetratricopeptide repeat) domain. This structure of a multi-FKBP domain protein clarifies the arrangement of these domains and their possible interactions with other proteins. The two FKBP domains differ by an insertion in the second that affects the formation of the progesterone receptor complex. PMID:12538866
Bridging Enzymatic Structure Function via Mechanics: A Coarse-Grain Approach.
Sacquin-Mora, S
2016-01-01
Flexibility is a central aspect of protein function, and ligand binding in enzymes involves a wide range of structural changes, ranging from large-scale domain movements to small loop or side-chain rearrangements. In order to understand how the mechanical properties of enzymes, and the mechanical variations that are induced by ligand binding, relate to enzymatic activity, we carried out coarse-grain Brownian dynamics simulations on a set of enzymes whose structures in the unbound and ligand-bound forms are available in the Protein Data Bank. Our results show that enzymes are remarkably heterogeneous objects from a mechanical point of view and that the local rigidity of individual residues is tightly connected to their part in the protein's overall structure and function. The systematic comparison of the rigidity of enzymes in their unbound and bound forms highlights the fact that small conformational changes can induce large mechanical effects, leading to either more or less flexibility depending on the enzyme's architecture and the location of its ligand-binding site. These mechanical variations target a limited number of specific residues that occupy key locations for enzymatic activity, and our approach thus offers a means to detect perturbation-sensitive sites in enzymes, where the addition or removal of a few interactions will lead to important changes in the protein's internal dynamics. © 2016 Elsevier Inc. All rights reserved.
A decade and a half of protein intrinsic disorder: Biology still waits for physics
Uversky, Vladimir N
2013-01-01
The abundant existence of proteins and regions that possess specific functions without being uniquely folded into unique 3D structures has become accepted by a significant number of protein scientists. Sequences of these intrinsically disordered proteins (IDPs) and IDP regions (IDPRs) are characterized by a number of specific features, such as low overall hydrophobicity and high net charge which makes these proteins predictable. IDPs/IDPRs possess large hydrodynamic volumes, low contents of ordered secondary structure, and are characterized by high structural heterogeneity. They are very flexible, but some may undergo disorder to order transitions in the presence of natural ligands. The degree of these structural rearrangements varies over a very wide range. IDPs/IDPRs are tightly controlled under the normal conditions and have numerous specific functions that complement functions of ordered proteins and domains. When lacking proper control, they have multiple roles in pathogenesis of various human diseases. Gaining structural and functional information about these proteins is a challenge, since they do not typically “freeze” while their “pictures are taken.” However, despite or perhaps because of the experimental challenges, these fuzzy objects with fuzzy structures and fuzzy functions are among the most interesting targets for modern protein research. This review briefly summarizes some of the recent advances in this exciting field and considers some of the basic lessons learned from the analysis of physics, chemistry, and biology of IDPs. PMID:23553817
i3Drefine software for protein 3D structure refinement and its assessment in CASP10.
Bhattacharya, Debswapna; Cheng, Jianlin
2013-01-01
Protein structure refinement refers to the process of improving the quality of protein structures during structure modeling to bring them closer to their native states. Structure refinement has been drawing increasing attention in the community-wide Critical Assessment of techniques for Protein Structure prediction (CASP) experiments since its addition in the 8th CASP experiment. During the 9th and recently concluded 10th CASP experiments, consistent growth in the number of refinement targets and participating groups has been witnessed. Yet, protein structure refinement remains a largely unsolved problem, with the majority of participating groups in the CASP refinement category failing to consistently improve the quality of structures issued for refinement. To address this need, we developed a completely automated and computationally efficient protein 3D structure refinement method, i3Drefine, based on an iterative and highly convergent energy minimization algorithm with a powerful all-atom composite physics and knowledge-based force field and a hydrogen bonding (HB) network optimization technique. In the recent community-wide blind experiment, CASP10, i3Drefine (as 'MULTICOM-CONSTRUCT') was ranked as the best method in the server section as per the official assessment of the CASP10 experiment. Here we provide the community with free access to the i3Drefine software, systematically analyse the performance of i3Drefine in strict blind mode on the refinement targets issued in the CASP10 refinement category, and compare it with other state-of-the-art refinement methods participating in CASP10. Our analysis demonstrates that i3Drefine is the only fully automated server participating in CASP10 exhibiting consistent improvement over the initial structures in both global and local structural quality metrics. An executable version of i3Drefine is freely available at http://protein.rnet.missouri.edu/i3drefine/.
SONAR Discovers RNA-Binding Proteins from Analysis of Large-Scale Protein-Protein Interactomes.
Brannan, Kristopher W; Jin, Wenhao; Huelga, Stephanie C; Banks, Charles A S; Gilmore, Joshua M; Florens, Laurence; Washburn, Michael P; Van Nostrand, Eric L; Pratt, Gabriel A; Schwinn, Marie K; Daniels, Danette L; Yeo, Gene W
2016-10-20
RNA metabolism is controlled by an expanding, yet incomplete, catalog of RNA-binding proteins (RBPs), many of which lack characterized RNA binding domains. Approaches to expand the RBP repertoire to discover non-canonical RBPs are currently needed. Here, HaloTag fusion pull down of 12 nuclear and cytoplasmic RBPs followed by quantitative mass spectrometry (MS) demonstrates that proteins interacting with multiple RBPs in an RNA-dependent manner are enriched for RBPs. This motivated SONAR, a computational approach that predicts RNA binding activity by analyzing large-scale affinity precipitation-MS protein-protein interactomes. Without relying on sequence or structure information, SONAR identifies 1,923 human, 489 fly, and 745 yeast RBPs, including over 100 human candidate RBPs that contain zinc finger domains. Enhanced CLIP confirms RNA binding activity and identifies transcriptome-wide RNA binding sites for SONAR-predicted RBPs, revealing unexpected RNA binding activity for disease-relevant proteins and DNA binding proteins. Copyright © 2016 Elsevier Inc. All rights reserved.
Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites
NASA Astrophysics Data System (ADS)
Valasatava, Yana; Andreini, Claudia; Rosato, Antonio
2015-03-01
Metalloproteins account for a substantial fraction of all proteins. They incorporate metal atoms, which are required for their structure and/or function. Here we describe a new computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity. These sites are extracted from the MetalPDB database of minimal functional sites (MFSs) in metal-binding biological macromolecules. Structural similarity is measured by the scoring function of the available MetalS2 program. Hierarchical clustering was used to organize MFSs into clusters, for each of which a representative MFS was identified. The comparison of all representative MFSs provided a thorough structure-based classification of the sites analyzed. As examples, the application of the proposed computational protocol to all heme-binding proteins and zinc-binding proteins of known structure highlighted the existence of structural subtypes, validated known evolutionary links and shed new light on the occurrence of similar sites in systems at different evolutionary distances. The present approach thus makes available an innovative viewpoint on metalloproteins, where the functionally crucial metal sites effectively lead the discovery of structural and functional relationships in a largely protein-independent manner.
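The cluster-then-pick-a-representative step described above can be sketched in a few lines. The example assumes that pairwise site dissimilarities are already available as a symmetric table, for example derived from a structural-similarity score. The average-linkage rule, the cut-off threshold, the medoid choice of representative, and the toy site names are illustrative assumptions rather than the settings of the published protocol.

```python
def cluster_sites(dist, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance cut-off.

    `dist` is a symmetric dict-of-dicts with dist[a][a] == 0.
    """
    clusters = [[name] for name in sorted(dist)]

    def linkage(c1, c2):
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def representative(cluster, dist):
    """Medoid: the member with the smallest summed distance to the others."""
    return min(cluster, key=lambda a: sum(dist[a][b] for b in cluster))


if __name__ == "__main__":
    # Hypothetical sites placed on a line so distances are easy to check.
    coords = {"s1": 0.0, "s2": 0.1, "s3": 0.3, "s4": 5.0, "s5": 5.2}
    dist = {a: {b: abs(coords[a] - coords[b]) for b in coords} for a in coords}
    for cluster in cluster_sites(dist, threshold=1.0):
        print(cluster, "->", representative(cluster, dist))
```

Comparing only the representatives afterwards, rather than every site against every other, is what keeps the final classification step tractable.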
Complementary uses of small angle X-ray scattering and X-ray crystallography.
Pillon, Monica C; Guarné, Alba
2017-11-01
Most proteins function within networks and, therefore, protein interactions are central to protein function. Although stable macromolecular machines have been extensively studied, dynamic protein interactions remain poorly understood. Small-angle X-ray scattering probes the size, shape and dynamics of proteins in solution at low resolution and can be used to study samples in a large range of molecular weights. Therefore, it has emerged as a powerful technique to study the structure and dynamics of biomolecular systems and bridge fragmented information obtained using high-resolution techniques. Here we review how small-angle X-ray scattering can be combined with other structural biology techniques to study protein dynamics. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman. Copyright © 2017 Elsevier B.V. All rights reserved.
Quantifying side-chain conformational variations in protein structure
Miao, Zhichao; Cao, Yang
2016-01-01
Protein side-chain conformation is closely related to biological function. Side-chain prediction is a key step in protein design, protein docking and structure optimization. However, side-chain polymorphism is widespread in proteins, takes various forms, and has long been overlooked by side-chain prediction. Such conformational variations have not been quantitatively studied, and the correlations between these variations and residue features are vague. Here, we performed statistical analyses on large-scale data sets and found that side-chain conformational flexibility is closely related to solvent exposure, degree of freedom and hydrophilicity. These analyses allowed us to quantify different types of side-chain variabilities in the PDB. The results underscore that protein side-chain conformation prediction is not a single-answer problem, leading us to reconsider the assessment approaches of side-chain prediction programs. PMID:27845406
Lieutaud, Philippe; Uversky, Alexey V.; Uversky, Vladimir N.; Longhi, Sonia
2016-01-01
In the last 2 decades it has become increasingly evident that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins lack a stable 3D structure, are ubiquitous and fulfill essential biological functions. Their conformational heterogeneity is encoded in their amino acid sequences, thereby allowing intrinsically disordered proteins or regions to be recognized based on properties of these sequences. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental for delineating boundaries of protein domains amenable to structural determination with X-ray crystallization. This article discusses a comprehensive selection of databases and methods currently employed to disseminate experimental and putative annotations of disorder, predict disorder and identify regions involved in induced folding. It also provides a set of detailed instructions that should be followed to perform computational analysis of disorder. PMID:28232901
The role of stabilization centers in protein thermal stability
DOE Office of Scientific and Technical Information (OSTI.GOV)
Magyar, Csaba; Gromiha, M. Michael; Sávoly, Zoltán
2016-02-26
The definition of stabilization centers was introduced almost two decades ago. They are centers of noncovalent long range interaction clusters, believed to have a role in maintaining the three-dimensional structure of proteins by preventing their decay due to their cooperative long range interactions. Here, this hypothesis is investigated from the viewpoint of thermal stability for the first time, using a large protein thermodynamics database. The positions of amino acids belonging to stabilization centers are correlated with available experimental thermodynamic data on protein thermal stability. Our analysis suggests that stabilization centers, especially solvent exposed ones, do contribute to the thermal stabilization of proteins. Highlights: • Stabilization centers contribute to thermal stabilization of protein structures. • Stabilization center content correlates with melting temperature of proteins. • Exposed stabilization center content correlates with stability even in hyperthermophiles. • Stability changing mutations are frequently found at stabilization centers.
Evolutionary Strategies for Protein Folding
NASA Astrophysics Data System (ADS)
Murthy Gopal, Srinivasa; Wenzel, Wolfgang
2006-03-01
The free-energy approach to predicting protein tertiary structure describes the native state of a protein as the global minimum of an appropriate free-energy force field. The low-energy region of the free-energy landscape of a protein is extremely rugged. Efficient optimization methods must therefore speed up the search for the global optimum by avoiding high-energy transition states, adapting large-scale moves or accepting unphysical intermediates. Here we investigate an evolutionary strategy (ES) for optimizing a protein conformation in our all-atom free-energy force field ([1],[2]). A set of random conformations is evolved using an ES to obtain a diverse population containing low-energy structures. The ES is shown to balance energy improvement while maintaining diversity in structures. The ES is implemented as a master-client model for distributed computing. Starting from random structures and using this optimization technique, we were able to fold a 20 amino-acid helical protein and a 16 amino-acid beta hairpin [3]. We compare the ES to the basin hopping method. [1] T. Herges and W. Wenzel, Biophys. J. 87, 3100 (2004) [2] A. Verma and W. Wenzel, Stabilization and folding of beta-sheet and alpha-helical proteins in an all-atom free energy model (submitted) (2005) [3] S. M. Gopal and W. Wenzel, Evolutionary Strategies for Protein Folding (in preparation)
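The general mechanics of an evolutionary strategy on a rugged landscape can be shown with a toy example. The sketch below is a plain elitist (mu + lambda) scheme over a handful of dihedral-like angles. The energy function is an invented stand-in, not the all-atom free-energy force field of the abstract, and the population sizes, mutation width, and function names are assumptions. It also omits the diversity-preserving selection and the master-client distribution that the abstract mentions.

```python
import math
import random


def toy_energy(angles):
    """Rugged toy landscape over angles in radians; NOT a real force field."""
    return sum(
        1.0 - math.cos(a) + 0.3 * (1.0 - math.cos(5.0 * a)) for a in angles
    )


def evolve(n_angles=6, mu=5, lam=20, generations=50, sigma=0.3, seed=0):
    """Elitist (mu + lambda) evolutionary strategy minimizing toy_energy."""
    rng = random.Random(seed)
    # Start from random conformations.
    population = [
        [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
        for _ in range(mu)
    ]
    history = [min(toy_energy(p) for p in population)]
    for _ in range(generations):
        # Each offspring is a Gaussian perturbation of a random parent.
        offspring = []
        for _ in range(lam):
            parent = rng.choice(population)
            offspring.append([a + rng.gauss(0.0, sigma) for a in parent])
        # Parents compete with offspring; keep the mu lowest-energy members.
        population = sorted(population + offspring, key=toy_energy)[:mu]
        history.append(toy_energy(population[0]))
    return population, history


if __name__ == "__main__":
    population, history = evolve()
    print("initial best energy:", history[0])
    print("final best energy:  ", history[-1])
```

Because parents survive alongside offspring, the best energy can never get worse from one generation to the next. A purely elitist rule like this tends to collapse the population onto one basin, which is why a diversity criterion of the kind the abstract describes matters in practice.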
Protein structure shapes immunodominance in the CD4 T cell response to yellow fever vaccination.
Koblischke, Maximilian; Mackroth, Maria S; Schwaiger, Julia; Fae, Ingrid; Fischer, Gottfried; Stiasny, Karin; Heinz, Franz X; Aberle, Judith H
2017-08-21
The live attenuated yellow fever (YF) vaccine is a highly effective human vaccine and induces long-term protective neutralizing antibodies directed against the viral envelope protein E. The generation of such antibodies requires the help of CD4 T cells which recognize peptides derived from proteins in virus particles internalized and processed by E-specific B cells. The CD4 T helper cell response is restricted to few immunodominant epitopes, but the mechanisms of their selection are largely unknown. Here, we report that CD4 T cell responses elicited by the YF-17D vaccine are focused to hotspots of two helices of the viral capsid protein and to exposed strands and loops of E. We found that the locations of immunodominant epitopes within three-dimensional protein structures exhibit a high degree of overlap between YF virus and the structurally homologous flavivirus tick-borne encephalitis virus, although amino acid sequence identity of the epitope regions is only 15-45%. The restriction of epitopes to exposed E protein surfaces and their strikingly similar positioning within proteins of distantly related flaviviruses are consistent with a strong influence of protein structure that shapes CD4 T cell responses and provide leads for a rational design of immunogens for vaccination.
Ahlstrom, Logan S.; Vorontsov, Ivan I.; Shi, Jun; Miyashita, Osamu
2017-01-01
Side chains in protein crystal structures are essential for understanding biochemical processes such as catalysis and molecular recognition. However, crystal packing could influence side-chain conformation and dynamics, thus complicating functional interpretations of available experimental structures. Here we investigate the effect of crystal packing on side-chain conformational dynamics with crystal and solution molecular dynamics simulations using Cyanovirin-N as a model system. Side-chain ensembles for solvent-exposed residues obtained from simulation largely reflect the conformations observed in the X-ray structure. This agreement is most striking for crystal-contacting residues during crystal simulation. Given the high level of correspondence between our simulations and the X-ray data, we compare side-chain ensembles in solution and crystal simulations. We observe large decreases in conformational entropy in the crystal for several long, polar and contacting residues on the protein surface. Such cases agree well with the average loss in conformational entropy per residue upon protein folding and are accompanied by a change in side-chain conformation. This finding supports the application of surface engineering to facilitate crystallization. Our simulation-based approach demonstrated here with Cyanovirin-N establishes a framework for quantitatively comparing side-chain ensembles in solution and in the crystal across a larger set of proteins to elucidate the effect of the crystal environment on protein conformations. PMID:28107510
Naitow, Hisashi; Matsuura, Yoshinori; Tono, Kensuke; Joti, Yasumasa; Kameshima, Takashi; Hatsui, Takaki; Yabashi, Makina; Tanaka, Rie; Tanaka, Tomoyuki; Sugahara, Michihiro; Kobayashi, Jun; Nango, Eriko; Iwata, So; Kunishima, Naoki
2017-08-01
Serial femtosecond crystallography (SFX) with an X-ray free-electron laser is used for the structural determination of proteins from a large number of microcrystals at room temperature. To examine the feasibility of pharmaceutical applications of SFX, a ligand-soaking experiment using thermolysin microcrystals has been performed using SFX. The results were compared with those from a conventional experiment with synchrotron radiation (SR) at 100 K. A protein-ligand complex structure was successfully obtained from an SFX experiment using microcrystals soaked with a small-molecule ligand; both oil-based and water-based crystal carriers gave essentially the same results. In a comparison of the SFX and SR structures, clear differences were observed in the unit-cell parameters, in the alternate conformation of side chains, in the degree of water coordination and in the ligand-binding mode.
Predicting protein complex geometries with a neural network.
Chae, Myong-Ho; Krull, Florian; Lorenzen, Stephan; Knapp, Ernst-Walter
2010-03-01
A major challenge of the protein docking problem is to define scoring functions that can distinguish near-native protein complex geometries from a large number of non-native geometries (decoys) generated with noncomplexed protein structures (unbound docking). In this study, we have constructed a neural network that employs the information from atom-pair distance distributions of a large number of decoys to predict protein complex geometries. We found that docking prediction can be significantly improved using two different types of polar hydrogen atoms. To train the neural network, 2000 near-native decoys of even distance distribution were used for each of the 185 considered protein complexes. The neural network normalizes the information from different protein complexes using an additional protein complex identity input neuron for each complex. The parameters of the neural network were determined such that they mimic a scoring funnel in the neighborhood of the native complex structure. The neural network approach avoids the reference state problem, which occurs in deriving knowledge-based energy functions for scoring. We show that a distance-dependent atom pair potential performs much better than a simple atom-pair contact potential. We have compared the performance of our scoring function with other empirical and knowledge-based scoring functions such as ZDOCK 3.0, ZRANK, ITScore-PP, EMPIRE, and RosettaDock. In spite of the simplicity of the method and its functional form, our neural network-based scoring function achieves a reasonable performance in rigid-body unbound docking of proteins. Proteins 2010. (c) 2009 Wiley-Liss, Inc.
Minciacchi, Valentina R.; You, Sungyong; Spinelli, Cristiana; Morley, Samantha; Zandian, Mandana; Aspuria, Paul-Joseph; Cavallini, Lorenzo; Ciardiello, Chiara; Sobreiro, Mariana Reis; Morello, Matteo; Kharmate, Geetanjali; Jang, Su Chul; Kim, Dae-Kyum; Hosseini-Beheshti, Elham; Guns, Emma Tomlinson; Gleave, Martin; Gho, Yong Song; Mathivanan, Suresh; Yang, Wei; Freeman, Michael R.; Di Vizio, Dolores
2015-01-01
Large oncosomes (LO) are atypically large (1-10μm diameter) cancer-derived extracellular vesicles (EVs), originating from the shedding of membrane blebs and associated with advanced disease. We report that 25% of the proteins, identified by a quantitative proteomics analysis, are differentially represented in large and nano-sized EVs from prostate cancer cells. Proteins enriched in large EVs included enzymes involved in glucose, glutamine and amino acid metabolism, all metabolic processes relevant to cancer. Glutamine metabolism was altered in cancer cells exposed to large EVs, an effect that was not observed upon treatment with exosomes. Large EVs exhibited discrete buoyant densities in iodixanol (OptiPrep™) gradients. Fluorescent microscopy of large EVs revealed an appearance consistent with LO morphology, indicating that these structures can be categorized as LO. Among the proteins enriched in LO, cytokeratin 18 (CK18) was one of the most abundant (within the top 5th percentile) and was used to develop an assay to detect LO in the circulation and tissues of mice and patients with prostate cancer. These observations indicate that LO represent a discrete EV type that may play a distinct role in tumor progression and that may be a source of cancer-specific markers. PMID:25857301
De Novo Protein Structure Prediction
NASA Astrophysics Data System (ADS)
Hung, Ling-Hong; Ngan, Shing-Chung; Samudrala, Ram
An unparalleled amount of sequence data is being made available from large-scale genome sequencing efforts. The data provide a shortcut to the determination of the function of a gene of interest, as long as there is an existing sequenced gene with similar sequence and of known function. This has spurred structural genomic initiatives with the goal of determining as many protein folds as possible (Brenner and Levitt, 2000; Burley, 2000; Brenner, 2001; Heinemann et al., 2001). The purpose of this is twofold: First, the structure of a gene product can often lead to direct inference of its function. Second, since the function of a protein is dependent on its structure, direct comparison of the structures of gene products can be more sensitive than the comparison of sequences of genes for detecting homology. Presently, structural determination by crystallography and NMR techniques is still slow and expensive in terms of manpower and resources, despite attempts to automate the processes. Computer structure prediction algorithms, while not providing the accuracy of the traditional techniques, are extremely quick and inexpensive and can provide useful low-resolution data for structure comparisons (Bonneau and Baker, 2001). Given the immense number of structures which the structural genomic projects are attempting to solve, there would be a considerable gain even if the computer structure prediction approach were applicable to a subset of proteins.
Developing protein documentaries and other multimedia presentations for molecular biology.
Quinn, G; Wang, H P; Martinez, D; Bourne, P E
1999-01-01
Computer-based multimedia technology for distance learning and research has come of age--the price point is acceptable, domain experts using off-the-shelf software can prepare compelling materials, and the material can be efficiently delivered via the Internet to a large audience. While not presenting any new scientific results, this paper outlines experiences with a variety of commercial and free software tools and the associated protocols we have used to prepare protein documentaries and other multimedia presentations relevant to molecular biology. A protein documentary is defined here as a description of the relationship between structure and function in a single protein or in a related family of proteins, using text and images and further enhanced by sound and interactive graphics. Examples of documentaries prepared to describe cAMP-dependent protein kinase, the founding structural member of the protein kinase family, for which there are now over 40 structures, can be found at http://franklin.burnham-inst.org/rcsb. A variety of other prototype multimedia presentations for molecular biology described in this paper can be found at http://fraklin.burnham-inst.org.
Núñez-Vivanco, Gabriel; Valdés-Jiménez, Alejandro; Besoaín, Felipe; Reyes-Parada, Miguel
2016-01-01
Since the structure of proteins is more conserved than the sequence, the identification of conserved three-dimensional (3D) patterns among a set of proteins, can be important for protein function prediction, protein clustering, drug discovery and the establishment of evolutionary relationships. Thus, several computational applications to identify, describe and compare 3D patterns (or motifs) have been developed. Often, these tools consider a 3D pattern as that described by the residues surrounding co-crystallized/docked ligands available from X-ray crystal structures or homology models. Nevertheless, many of the protein structures stored in public databases do not provide information about the location and characteristics of ligand binding sites and/or other important 3D patterns such as allosteric sites, enzyme-cofactor interaction motifs, etc. This makes necessary the development of new ligand-independent methods to search and compare 3D patterns in all available protein structures. Here we introduce Geomfinder, an intuitive, flexible, alignment-free and ligand-independent web server for detailed estimation of similarities between all pairs of 3D patterns detected in any two given protein structures. We used around 1100 protein structures to form pairs of proteins which were assessed with Geomfinder. In these analyses each protein was considered in only one pair (e.g. in a subset of 100 different proteins, 50 pairs of proteins can be defined). 
Thus: (a) Geomfinder detected identical pairs of 3D patterns in a series of monoamine oxidase-B structures, which corresponded to the effectively similar ligand binding sites in these proteins; (b) we identified structural similarities among pairs of protein structures which are targets of compounds such as acarbose, benzamidine, adenosine triphosphate and pyridoxal phosphate; these similar 3D patterns are not detected using sequence-based methods; (c) the detailed evaluation of three specific cases showed the versatility of Geomfinder, which was able to discriminate between similar and different 3D patterns related to binding sites of common substrates in a range of diverse proteins. Geomfinder allows the detection of similar 3D patterns between any pair of protein structures, regardless of the divergence between their amino acid sequences. Although the software is not intended for simultaneous multiple comparisons in a large number of proteins, it can be particularly useful in cases such as the structure-based design of multitarget drugs, where a detailed analysis of 3D pattern similarities between a few selected protein targets is essential.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Michael C.; Cascio, Duilio; Yeates, Todd O.
Real macromolecular crystals can be non-ideal in a myriad of ways. This often creates challenges for structure determination, while also offering opportunities for greater insight into the crystalline state and the dynamic behavior of macromolecules. To evaluate whether different parts of a single crystal of a dynamic protein, EutL, might be informative about crystal and protein polymorphism, a microfocus X-ray synchrotron beam was used to collect a series of 18 separate data sets from non-overlapping regions of the same crystal specimen. A principal component analysis (PCA) approach was employed to compare the structure factors and unit cells across the data sets, and it was found that the 18 data sets separated into two distinct groups, with large R values (in the 40% range) and significant unit-cell variations between the members of the two groups. This categorization mapped the different data-set types to distinct regions of the crystal specimen. Atomic models of EutL were then refined against two different data sets obtained by separately merging data from the two distinct groups. A comparison of the two resulting models revealed minor but discernable differences in certain segments of the protein structure, and regions of higher deviation were found to correlate with regions where larger dynamic motions were predicted to occur by normal-mode molecular-dynamics simulations. The findings emphasize that large spatially dependent variations may be present across individual macromolecular crystals. This information can be uncovered by simultaneous analysis of multiple partial data sets and can be exploited to reveal new insights about protein dynamics, while also improving the accuracy of the structure-factor data ultimately obtained in X-ray diffraction experiments.
Ramakrishnan, Gayatri; Ochoa-Montaño, Bernardo; Raghavender, Upadhyayula S; Mudgal, Richa; Joshi, Adwait G; Chandra, Nagasuma R; Sowdhamini, Ramanathan; Blundell, Tom L; Srinivasan, Narayanaswamy
2015-01-01
The availability of the genome sequence of Mycobacterium tuberculosis H37Rv has encouraged determination of large numbers of protein structures and detailed definition of the biological information encoded therein; yet, the functions of many proteins in M. tuberculosis remain unknown. The emergence of multidrug resistant strains makes it a priority to exploit recent advances in homology recognition and structure prediction to re-analyse its gene products. Here we report the structural and functional characterization of gene products encoded in the M. tuberculosis genome, with the help of sensitive profile-based remote homology search and fold recognition algorithms resulting in an enhanced annotation of the proteome where 95% of the M. tuberculosis proteins were identified wholly or partly with information on structure or function. New information includes association of 244 proteins with 205 domain families and a separate set of new association of folds to 64 proteins. Extending structural information across uncharacterized protein families represented in the M. tuberculosis proteome, by determining superfamily relationships between families of known and unknown structures, has contributed to an enhancement in the knowledge of structural content. In retrospect, such superfamily relationships have facilitated recognition of probable structure and/or function for several uncharacterized protein families, eventually aiding recognition of probable functions for homologous proteins corresponding to such families. There are 183 gene products unique to mycobacteria for which no functions could be identified. Of these, 18 were determined to be M. tuberculosis specific. Such pathogen-specific proteins are speculated to harbour virulence factors required for pathogenesis. A re-annotated proteome of M. tuberculosis, with greater completeness of annotated proteins and domain assigned regions, provides a valuable basis for experimental endeavours designed to obtain a better understanding of pathogenesis and to accelerate the process of drug target discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.
Ritchie, Andrew W; Webb, Lauren J
2015-11-05
Biological function emerges in large part from the interactions of biomacromolecules in the complex and dynamic environment of the living cell. For this reason, macromolecular interactions in biological systems are now a major focus of interest throughout the biochemical and biophysical communities. The affinity and specificity of macromolecular interactions are the result of both structural and electrostatic factors. Significant advances have been made in characterizing structural features of stable protein-protein interfaces through the techniques of modern structural biology, but much less is understood about how electrostatic factors promote and stabilize specific functional macromolecular interactions over all possible choices presented to a given molecule in a crowded environment. In this Feature Article, we describe how vibrational Stark effect (VSE) spectroscopy is being applied to measure electrostatic fields at protein-protein interfaces, focusing on measurements of guanosine triphosphate (GTP)-binding proteins of the Ras superfamily binding with structurally related but functionally distinct downstream effector proteins. In VSE spectroscopy, spectral shifts of a probe oscillator's energy are related directly to that probe's local electrostatic environment. By performing this experiment repeatedly throughout a protein-protein interface, an experimental map of measured electrostatic fields generated at that interface is determined. These data can be used to rationalize selective binding of similarly structured proteins in both in vitro and in vivo environments. Furthermore, these data can be used to compare to computational predictions of electrostatic fields to explore the level of simulation detail that is necessary to accurately predict our experimental findings.
Subhadarshanee, Biswamaitree; Mohanty, Abhinav; Jagdev, Manas Kumar; Vasudevan, Dileep; Behera, Rabindra K
2017-10-01
Preparation of modified and hybrid ferritin provides a great opportunity to understand the mechanisms of iron loading/unloading, protein self-assembly, size constrained nanomaterial synthesis and targeted drug delivery. However, the large size (M.W.=490kDa) has been limiting the separation of different modified and/or hybrid ferritin nanocages from each other in their intact assembled form and further characterization. Native polyacrylamide gel electrophoresis (PAGE) separates proteins on the basis of both charge and mass, while maintaining their overall native structure and activity. Altering surface charge distribution by substitution of amino acid residues located at the external surface of ferritin (K104E & D40A) affected the migration rate in native PAGE while internal modification had little effect. Crystal structures confirmed that ferritin nanocages made up of subunits with single amino acid substitutions retain the overall structure of ferritin nanocage. Taking advantage of K104E migration behavior, formation of hybrid ferritins with subunits of wild type (WT) and K104E were confirmed and separated in native PAGE. Cage integrity and iron loading ability (ferritin activity) were also tested. The migration pattern of hybrid ferritins in native PAGE depends on the subunit ratio (WT: K104E) in the ferritin cage. Our work shows that native PAGE can be exploited in nanobiotechnology, by analyzing modifications of large proteins like ferritin. Native PAGE, a simple, straight-forward technique, can be used to analyze small modification (by altering external surface charge) in large proteins like ferritin, without disintegrating its self-assembled nanocage structure. In doing so, native PAGE can complement the information obtained from mass spectrometry. 
The confirmation and separation of modified and hybrid ferritin protein nanocages in native PAGE, opens up various prospects of bio-conjugation, which can be useful in targeted drug delivery, nanobiotechnology and in understanding nature's idea of synthesizing hybrid ferritins in different human tissues. Copyright © 2017 Elsevier B.V. All rights reserved.
The dipole moment of membrane proteins: potassium channel protein and beta-subunit.
Takashima, S
2001-12-25
The mechanism of ion channel opening is one of the most fascinating problems in membrane biology. Based on phenomenological studies, early researchers suggested that the elementary process of ion channel opening may be the intramembrane charge movement or the orientation of dipolar proteins in the channel. In spite of the far-reaching significance of these hypotheses, it has not been possible to formulate a comprehensive molecular theory for the mechanism of channel opening. This is because of the lack of detailed knowledge of the structure of channel proteins. In recent years, however, research on the structure of channel proteins has made marked advances and, at present, we are beginning to have sufficient information on the structure of some of the channel proteins, e.g. potassium-channel protein and beta-subunits. With this new information, we are now ready to have another look at the old hypothesis, in particular, the dipole moment of channel proteins being the voltage sensor for the opening and closing of ion channels. In this paper, the dipole moments of potassium channel protein and beta-subunit are calculated using X-ray diffraction data. A large dipole moment was found for beta-subunits while the dipole moment of K-channel protein was found to be considerably smaller than that of beta-subunits. These calculations were conducted as a preliminary study of the comprehensive research on the dipolar structure of channel proteins in excitable membranes, above all, sodium channel proteins.
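A dipole moment can be computed from atomic coordinates once a charge is assigned to each atom, using mu = sum_i q_i (r_i - r_0). The sketch below is an illustration only: the abstract does not say how charges were assigned, so the per-atom charges, the `dipole_moment` helper and the two-charge test system are all assumptions. Charges are in units of the elementary charge, coordinates in Å, and 1 e·Å is taken as about 4.8032 debye.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom expressed in debye


def dipole_moment(atoms, origin=None):
    """Return (vector in e*Angstrom, magnitude in debye).

    atoms: list of (charge_e, x, y, z) tuples with coordinates in Angstrom.
    origin: reference point; defaults to the geometric center. For a
    molecule with non-zero net charge the result depends on this choice.
    """
    if origin is None:
        n = len(atoms)
        origin = tuple(sum(atom[i] for atom in atoms) / n for i in (1, 2, 3))
    mu = [0.0, 0.0, 0.0]
    for q, x, y, z in atoms:
        for k, coord in enumerate((x, y, z)):
            mu[k] += q * (coord - origin[k])
    magnitude = math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE
    return tuple(mu), magnitude


# Hypothetical test system: +1 e and -1 e separated by 1 Angstrom along x.
pair = [(1.0, 0.5, 0.0, 0.0), (-1.0, -0.5, 0.0, 0.0)]
vector, debye = dipole_moment(pair)
print(vector, debye)
```

For a neutral set of charges the result does not depend on the reference point; for a protein with a net charge the choice of origin should be reported along with the value.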
Zivanovic, Yvan; Confalonieri, Fabrice; Ponchon, Luc; Lurz, Rudi; Chami, Mohamed; Flayhan, Ali; Renouard, Madalena; Huet, Alexis; Decottignies, Paulette; Davidson, Alan R.; Breyton, Cécile
2014-01-01
Bacteriophage T5 represents a large family of lytic Siphoviridae infecting Gram-negative bacteria. The low-resolution structure of T5 showed the T=13 geometry of the capsid and the unusual trimeric organization of the tail tube, and the assembly pathway of the capsid was established. Although major structural proteins of T5 have been identified in these studies, most of the genes encoding the morphogenesis proteins remained to be identified. Here, we combine a proteomic analysis of T5 particles with a bioinformatic study and electron microscopic immunolocalization to assign function to the genes encoding the structural proteins, the packaging proteins, and other nonstructural components required for T5 assembly. A head maturation protease that likely accounts for the cleavage of the different capsid proteins is identified. Two other proteins involved in capsid maturation add originality to the T5 capsid assembly mechanism: the single head-to-tail joining protein, which closes the T5 capsid after DNA packaging, and the nicking endonuclease responsible for the single-strand interruptions in the T5 genome. We localize most of the tail proteins that were hitherto uncharacterized and provide a detailed description of the tail tip composition. Our findings highlight novel variations of viral assembly strategies and of virion particle architecture. They further recommend T5 for exploring phage structure and assembly and for deciphering conformational rearrangements that accompany DNA transfer from the capsid to the host cytoplasm. PMID:24198424
Recognition of functional sites in protein structures.
Shulman-Peleg, Alexandra; Nussinov, Ruth; Wolfson, Haim J
2004-06-04
Recognition of regions on the surface of one protein that are similar to a binding site of another is crucial for the prediction of molecular interactions and for functional classifications. We first describe a novel method, SiteEngine, that assumes no sequence or fold similarities and is able to recognize proteins that have similar binding sites and may perform similar functions. We achieve high efficiency and speed by introducing a low-resolution surface representation via chemically important surface points, by hashing triangles of physico-chemical properties and by application of hierarchical scoring schemes for a thorough exploration of global and local similarities. We proceed to rigorously apply this method to functional site recognition in three possible ways: first, we search a given functional site on a large set of complete protein structures. Second, a potential functional site on a protein of interest is compared with known binding sites, to recognize similar features. Third, a complete protein structure is searched for the presence of an a priori unknown functional site, similar to known sites. Our method is robust and efficient enough to allow computationally demanding applications such as the first and the third. From the biological standpoint, the first application may identify secondary binding sites of drugs that may lead to side-effects. The third application finds new potential sites on the protein that may provide targets for drug design. Each of the three applications may aid in assigning a function and in classification of binding patterns. We highlight the advantages and disadvantages of each type of search, provide examples of large-scale searches of the entire Protein Data Bank and make functional predictions.
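The idea of hashing triangles of labelled surface points can be sketched as follows. This is a simplified illustration, not SiteEngine's actual algorithm: the key used here (sorted property labels plus sorted, binned side lengths), the function names and the example sites are all assumptions chosen so that the key is unchanged by rotation, translation and point order.

```python
import math
from collections import defaultdict
from itertools import combinations


def triangle_key(triangle, bin_width=1.0):
    """Build a hashable key for three (label, (x, y, z)) points.

    Simplification: labels and side lengths are sorted independently, so
    the key does not record which label sits opposite which side.
    """
    labels = tuple(sorted(label for label, _ in triangle))
    sides = sorted(
        math.dist(p, q) for (_, p), (_, q) in combinations(triangle, 2)
    )
    return labels, tuple(int(side // bin_width) for side in sides)


def build_table(site, bin_width=1.0):
    """Index every triangle of a site by its key."""
    table = defaultdict(list)
    for triangle in combinations(site, 3):
        table[triangle_key(triangle, bin_width)].append(triangle)
    return table


def shared_triangles(site_a, site_b, bin_width=1.0):
    """Count triangles of site_b whose key also occurs in site_a."""
    table = build_table(site_a, bin_width)
    return sum(
        1
        for triangle in combinations(site_b, 3)
        if triangle_key(triangle, bin_width) in table
    )


# Hypothetical sites: site_b is site_a translated; site_c has other labels.
site_a = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)),
          ("aromatic", (0.0, 4.5, 0.0))]
site_b = [("donor", (10.0, -3.0, 2.0)), ("acceptor", (13.5, -3.0, 2.0)),
          ("aromatic", (10.0, 1.5, 2.0))]
site_c = [("donor", (0.0, 0.0, 0.0)), ("donor", (3.5, 0.0, 0.0)),
          ("donor", (0.0, 4.5, 0.0))]
print(shared_triangles(site_a, site_b), shared_triangles(site_a, site_c))
```

Looking up a key in the table takes roughly constant time, which is what makes this kind of indexing attractive for scanning many structures; side lengths that fall close to a bin boundary can land in different bins, so a practical scheme would also need to check neighbouring bins.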
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
2012-01-01
Background: Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results: The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Conclusions: Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.
Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl
2012-07-13
Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
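The interdependency between two aligned sites can be illustrated with a normalized mutual information score between alignment columns. This is a sketch under assumptions: the abstract does not state the exact normalization, so mutual information is divided here by the joint entropy, and the four-sequence alignment and the `normalized_mi` helper are hypothetical.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy in bits of a Counter over n observations."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def normalized_mi(col_a, col_b):
    """Mutual information of two alignment columns divided by joint entropy.

    Returns a value in [0, 1]; defined as 0.0 when both columns are constant.
    """
    n = len(col_a)
    h_a = entropy(Counter(col_a), n)
    h_b = entropy(Counter(col_b), n)
    h_ab = entropy(Counter(zip(col_a, col_b)), n)
    if h_ab == 0:
        return 0.0
    return (h_a + h_b - h_ab) / h_ab


# Hypothetical toy alignment: one sequence per row, one site per column.
alignment = ["MKA", "MKA", "LRA", "LRA"]
columns = ["".join(col) for col in zip(*alignment)]

print(normalized_mi(columns[0], columns[1]))  # sites 0 and 1 co-vary
print(normalized_mi(columns[0], columns[2]))  # site 2 is fully conserved
```

A clustering step such as k-modes would then group sites whose pairwise scores are high; a fully conserved column carries no information about any other column, so it scores zero against all of them.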
Serial Femtosecond Crystallography of G Protein-Coupled Receptors
Liu, Wei; Wacker, Daniel; Gati, Cornelius; Han, Gye Won; James, Daniel; Wang, Dingjie; Nelson, Garrett; Weierstall, Uwe; Katritch, Vsevolod; Barty, Anton; Zatsepin, Nadia A.; Li, Dianfan; Messerschmidt, Marc; Boutet, Sébastien; Williams, Garth J.; Koglin, Jason E.; Seibert, M. Marvin; Wang, Chong; Shah, Syed T.A.; Basu, Shibom; Fromme, Raimund; Kupitz, Christopher; Rendek, Kimberley N.; Grotjohann, Ingo; Fromme, Petra; Kirian, Richard A.; Beyerlein, Kenneth R.; White, Thomas A.; Chapman, Henry N.; Caffrey, Martin; Spence, John C.H.; Stevens, Raymond C.; Cherezov, Vadim
2014-01-01
X-ray crystallography of G protein-coupled receptors and other membrane proteins is hampered by difficulties associated with growing sufficiently large crystals that withstand radiation damage and yield high-resolution data at synchrotron sources. Here we used an x-ray free-electron laser (XFEL) with individual 50-fs duration x-ray pulses to minimize radiation damage and obtained a high-resolution room temperature structure of a human serotonin receptor using sub-10 µm microcrystals grown in a membrane mimetic matrix known as lipidic cubic phase. Compared to the structure solved by traditional microcrystallography from cryo-cooled crystals of about two orders of magnitude larger volume, the room temperature XFEL structure displays a distinct distribution of thermal motions and conformations of residues that likely more accurately represent the receptor structure and dynamics in a cellular environment. PMID:24357322
A large iris-like expansion of a mechanosensitive channel protein induced by membrane tension
NASA Technical Reports Server (NTRS)
Betanzos, Monica; Chiang, Chien-Sung; Guy, H. Robert; Sukharev, Sergei
2002-01-01
MscL, a bacterial mechanosensitive channel of large conductance, is the first structurally characterized mechanosensor protein. Molecular models of its gating mechanisms are tested here. Disulfide crosslinking shows that M1 transmembrane alpha-helices in MscL of resting Escherichia coli are arranged similarly to those in the crystal structure of MscL from Mycobacterium tuberculosis. An expanded conformation was trapped in osmotically shocked cells by the specific bridging between Cys 20 and Cys 36 of adjacent M1 helices. These bridges stabilized the open channel. Disulfide bonds engineered between the M1 and M2 helices of adjacent subunits (Cys 32-Cys 81) do not prevent channel gating. These findings support gating models in which interactions between M1 and M2 of adjacent subunits remain unaltered while their tilts simultaneously increase. The MscL barrel, therefore, undergoes a large concerted iris-like expansion and flattening when perturbed by membrane tension.
2013-01-01
Background: Many proteins tune their biological function by transitioning between different functional states, effectively acting as dynamic molecular machines. Detailed structural characterization of transition trajectories is central to understanding the relationship between protein dynamics and function. Computational approaches that build on the Molecular Dynamics framework are in principle able to model transition trajectories in great detail but also at considerable computational cost. Methods that delay consideration of dynamics and focus instead on elucidating energetically-credible conformational paths connecting two functionally-relevant structures provide a complementary approach. Effective sampling-based path planning methods originating in robotics have been recently proposed to produce conformational paths. These methods largely model short peptides or address large proteins by simplifying conformational space. Methods: We propose a robotics-inspired method that connects two given structures of a protein by sampling conformational paths. The method focuses on small- to medium-size proteins, efficiently modeling structural deformations through the use of the molecular fragment replacement technique. In particular, the method grows a tree in conformational space rooted at the start structure, steering the tree to a goal region defined around the goal structure. We investigate various bias schemes over a progress coordinate for balance between coverage of conformational space and progress towards the goal. A geometric projection layer promotes path diversity. A reactive temperature scheme allows sampling of rare paths that cross energy barriers. Results and conclusions: Experiments are conducted on small- to medium-size proteins of length up to 214 amino acids and with multiple known functionally-relevant states, some of which are more than 13 Å apart from each other.
Analysis reveals that the method effectively obtains conformational paths connecting structural states that are significantly different. A detailed analysis on the depth and breadth of the tree suggests that a soft global bias over the progress coordinate enhances sampling and results in higher path diversity. The explicit geometric projection layer that biases the exploration away from over-sampled regions further increases coverage, often improving proximity to the goal by forcing the exploration to find new paths. The reactive temperature scheme is shown effective in increasing path diversity, particularly in difficult structural transitions with known high-energy barriers. PMID:24565158
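The tree-growing idea described in this abstract can be sketched in a heavily simplified form. The code below is not the authors' method: conformations are stand-in coordinate tuples, a random single-coordinate perturbation replaces molecular fragment replacement, Euclidean distance to the goal plays the role of the progress coordinate, and the energy function, geometric projection layer, and reactive temperature scheme are omitted. All names and parameters (grow_tree, bias_scale, goal_radius, and so on) are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance, used here as a stand-in progress coordinate."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grow_tree(start, goal, n_iter=500, step=0.5, bias_scale=2.0,
              goal_radius=0.5, seed=0):
    """Grow a tree rooted at `start`, softly biased toward `goal`."""
    rng = random.Random(seed)
    nodes = [tuple(start)]
    parents = [None]  # parents[i] is the index of node i's parent
    for _ in range(n_iter):
        # Soft global bias: nodes nearer the goal are expanded more often,
        # but every node keeps a non-zero chance of being selected.
        weights = [math.exp(-distance(n, goal) / bias_scale) for n in nodes]
        i = rng.choices(range(len(nodes)), weights=weights, k=1)[0]
        # Toy move: perturb one coordinate of the selected node.
        child = list(nodes[i])
        k = rng.randrange(len(child))
        child[k] += rng.uniform(-step, step)
        nodes.append(tuple(child))
        parents.append(i)
        if distance(child, goal) <= goal_radius:
            break  # reached the goal region
    best = min(range(len(nodes)), key=lambda j: distance(nodes[j], goal))
    return nodes, parents, best


def path_to(nodes, parents, index):
    """Walk parent links back to the root and return the root-to-node path."""
    path = []
    while index is not None:
        path.append(nodes[index])
        index = parents[index]
    return path[::-1]


start, goal = (0.0, 0.0, 0.0), (3.0, 3.0, 3.0)
nodes, parents, best = grow_tree(start, goal)
path = path_to(nodes, parents, best)
```

Raising bias_scale flattens the selection weights and favors coverage of the space; lowering it favors progress toward the goal, which is the trade-off the abstract describes for its bias schemes.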
Protein-based materials, toward a new level of structural control.
van Hest, J C; Tirrell, D A
2001-10-07
Through billions of years of evolution, nature has created and refined structural proteins for a wide variety of specific purposes. Amino acid sequences and their associated folding patterns combine to create elastic, rigid or tough materials. In many respects, nature's intricately designed products provide challenging examples for materials scientists, but translation of natural structural concepts into bio-inspired materials requires a level of control of macromolecular architecture far higher than that afforded by conventional polymerization processes. An increasingly important approach to this problem has been to use biological systems for production of materials. Through protein engineering, artificial genes can be developed that encode protein-based materials with desired features. Structural elements found in nature, such as beta-sheets and alpha-helices, can be combined with great flexibility, and can be outfitted with functional elements such as cell binding sites or enzymatic domains. The possibility of incorporating non-natural amino acids increases the versatility of protein engineering still further. It is expected that such methods will have a large impact in the field of materials science, and especially in biomedical materials science, in the future.
Functional Dynamics of PDZ Binding Domains: A Normal-Mode Analysis
De Los Rios, Paolo; Cecconi, Fabio; Pretre, Anna; Dietler, Giovanni; Michielin, Olivier; Piazza, Francesco; Juanico, Brice
2005-01-01
Postsynaptic density-95/disks large/zonula occludens-1 (PDZ) domains are relatively small (80–120 residues) protein binding modules central in the organization of receptor clusters and in the association of cellular proteins. Their main function is to bind the C-termini of selected proteins that are recognized through specific amino acids in their carboxyl end. Binding is associated with a deformation of the PDZ native structure and is responsible for dynamical changes in regions not in direct contact with the target. We investigate how this deformation is related to the harmonic dynamics of the PDZ structure and show that one low-frequency collective normal mode, characterized by the concerted movements of different secondary structures, is involved in the binding process. Our results suggest that even minimal structural changes are responsible for communication between distant regions of the protein, in agreement with recent NMR experiments. Thus, PDZ domains are a very clear example of how collective normal modes are able to characterize the relation between function and dynamics of proteins, and to provide indications on the precursors of binding/unbinding events. PMID:15821164
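A normal-mode analysis starts from the Hessian of a harmonic potential around the native structure. The abstract does not specify the potential used, so the sketch below assumes a simple anisotropic elastic network in which residues within a cutoff distance are joined by identical springs. The function names, the cutoff and spring constant, and the four-point toy structure are hypothetical. The code only builds the Hessian and checks two of its basic properties; it does not diagonalize it.

```python
def anm_hessian(coords, cutoff=7.0, gamma=1.0):
    """Build the 3N x 3N Hessian of an elastic network (assumed model).

    coords: list of (x, y, z) tuples, one per residue; distinct points only.
    Pairs closer than `cutoff` are joined by a spring of stiffness `gamma`.
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            r2 = sum(x * x for x in d)
            if r2 > cutoff ** 2:
                continue  # no spring between distant residues
            for a in range(3):
                for b in range(3):
                    v = gamma * d[a] * d[b] / r2
                    # Off-diagonal blocks couple residues i and j.
                    h[3 * i + a][3 * j + b] -= v
                    h[3 * j + a][3 * i + b] -= v
                    # Diagonal blocks accumulate the negative of the row sum.
                    h[3 * i + a][3 * i + b] += v
                    h[3 * j + a][3 * j + b] += v
    return h


def matvec(h, v):
    """Multiply the Hessian by a displacement vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in h]


# Hypothetical four-residue toy structure (coordinates in angstroms).
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 0.0, 3.8)]
hessian = anm_hessian(toy_coords)

# A rigid translation along x stretches no spring, so it feels no restoring force.
translation_x = [1.0, 0.0, 0.0] * len(toy_coords)
restoring_force = matvec(hessian, translation_x)
```

The eigenvectors of such a Hessian are the normal modes, and a symmetric eigensolver such as numpy.linalg.eigh could be used to obtain them. The lowest non-zero eigenvalues correspond to the low-frequency collective motions of the kind the abstract relates to binding.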
1995-01-01
The role of the latent TGF-beta binding protein (LTBP) is unclear. In cultures of fetal rat calvarial cells, which form mineralized bonelike nodules, both LTBP and the TGF-beta 1 precursor localized to large fibrillar structures in the extracellular matrix. The appearance of these fibrillar structures preceded the appearance of type I collagen fibers. Plasmin treatment abolished the fibrillar staining pattern for LTBP and released a complex containing both LTBP and TGF-beta. Antibodies and antisense oligonucleotides against LTBP inhibited the formation of mineralized bonelike nodules in long-term fetal rat calvarial cultures. Immunohistochemistry of fetal and adult rat bone confirmed a fibrillar staining pattern for LTBP in vivo. These findings, together with the known homology of LTBP to the fibrillin family of proteins, suggest a novel function for LTBP, in addition to its role in matrix storage of latent TGF-beta, as a structural matrix protein that may play a role in bone formation. PMID:7593177